Splunk® IT Service Intelligence

Event Analytics Manual

Configure the Rules Engine to handle indexer cluster rolling restarts and upgrades

Configure the Rules Engine to automatically adjust for indexer cluster rolling restarts so you can avoid duplicate processing of events and unexpected breaks in episodes. During an indexer cluster rolling restart, search results are incomplete, and real-time searches restart each time another indexer completes its restart process. The ITSI Rules Engine must run searches to rebuild its in-memory state every time it restarts. When those searches return incomplete or inconsistent results, the Rules Engine processes events more than once and breaks episodes unnecessarily.

Because the Rules Engine can't reliably detect a rolling restart or upgrade of the indexer cluster on its own, you must manually configure the cluster masters and search heads so that the Rules Engine can query the cluster masters for the status of the rolling restart or upgrade.

After you perform these setup steps once, the following events take place anytime you initiate an indexer rolling restart or upgrade, and right before every periodic backfill operation:

  1. When the restart or upgrade is initiated, the Rules Engine stops.
  2. Upon startup, the Rules Engine immediately queries the cluster masters to get the status of the rolling restart or upgrade.
  3. If the query returns "true", meaning a rolling restart is in progress, the Rules Engine restarts again and repeats the indexer cluster health status check.
  4. The Rules Engine repeats steps 2 and 3 indefinitely until the cluster masters return a healthy status.
  5. Once the cluster masters return a healthy status, the Rules Engine proceeds with rebuilding its in-memory state.
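
The sequence above amounts to a polling loop. The following Python sketch is only an illustration of that control flow, not the Rules Engine's actual implementation. The function name, the restart_in_progress callable, and the delay_seconds parameter are hypothetical. The optional max_retries argument models the retry limit described later in this topic, and None models the default of retrying indefinitely.

```python
import time
from typing import Callable, Optional


def wait_for_healthy_cluster(
    restart_in_progress: Callable[[], bool],
    max_retries: Optional[int] = None,
    delay_seconds: float = 0.0,
) -> bool:
    """Poll a status check until it stops reporting a rolling restart.

    restart_in_progress is a hypothetical stand-in for the query to the
    cluster masters: it returns True while a rolling restart is running.
    Returns True once the cluster looks healthy, or False if max_retries
    checks all reported a restart in progress.
    """
    attempts = 0
    while max_retries is None or attempts < max_retries:
        attempts += 1
        if not restart_in_progress():
            return True  # healthy: safe to rebuild the in-memory state
        time.sleep(delay_seconds)  # wait before checking again
    return False  # retry limit reached without a healthy status


# Fake status source: two "in progress" answers, then healthy.
answers = iter([True, True, False])
print(wait_for_healthy_cluster(lambda: next(answers)))  # True
```

Passing the status check in as a callable keeps the loop independent of how the status is actually fetched, which this topic does not describe in detail.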

If the cluster is not configured properly on the search head, or if the credentials on the search head are missing or wrong, the Rules Engine cannot check the status, so it treats the cluster as healthy and proceeds with its startup even while a rolling restart or upgrade is in progress. Set up the credentials carefully to avoid this kind of configuration error.

Configure cluster masters and search heads

Perform the following steps to enable the Rules Engine to query the cluster masters and get the status of the indexer cluster rolling restart or upgrade.

Step 1: Configure the indexer cluster masters

In order for the Rules Engine to access the cluster master status REST endpoint, it needs an authenticated user with the correct authorization capability. Create a service account with at least the list_indexer_cluster capability.

  1. Create a new role called sa_user_cluster_status with the list_indexer_cluster capability.
  2. Create a new user and assign it the sa_user_cluster_status role.
  3. Note the hostname of the cluster master, which you'll use as the realm parameter when adding the credentials on the search head.
  4. Repeat steps 1-3 on all cluster masters that have been added to the search heads as searchable clusters.
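
If you prefer to script steps 1 and 2 rather than use Splunk Web, the following Python sketch builds the two form-encoded POST requests involved. It only constructs the requests and does not send them. The endpoint paths authorization/roles and authentication/users and their parameter names are assumptions based on the general Splunk REST API and are not stated in this topic, so verify them against the REST API reference for your version. The function name and the example user name and password are hypothetical.

```python
from urllib.parse import urlencode


def build_service_account_requests(base_url, username, password,
                                   role="sa_user_cluster_status"):
    """Return (url, form_body) pairs: create the role, then the user.

    Assumed endpoints (not confirmed by this topic):
      POST /services/authorization/roles
      POST /services/authentication/users
    """
    base = base_url.rstrip("/")
    role_body = urlencode({"name": role,
                           "capabilities": "list_indexer_cluster"})
    user_body = urlencode({"name": username,
                           "password": password,
                           "roles": role})
    return [
        (base + "/services/authorization/roles", role_body),
        (base + "/services/authentication/users", user_body),
    ]


# Hypothetical cluster master host and service account.
for url, body in build_service_account_requests(
        "https://master1:8089", "svc_cluster_status", "example-password"):
    print("POST", url)
    print("     ", body)
```

Each pair can then be sent with an HTTP client of your choice, authenticated as an administrator on the cluster master.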

Step 2: Configure the search heads

The Rules Engine needs the plaintext password of the service account to access the cluster master REST endpoint.

On one of the search heads, add the username, password, and realm of the service account to the search head password storage. You can add the information through the storage/passwords REST endpoint. For example:

curl -k -u admin:Chang3d! https://localhost:8089/servicesNS/nobody/SA-ITOA/storage/passwords -d name=<username> -d password=<password> -d realm=<cluster_master_hostname>

The realm is the hostname of the cluster master. Make sure the realm matches the host part of the master_uri field returned from the services/cluster/config endpoint on the search head. For example, if https://localhost:8089/services/cluster/config returns "master_uri": "https://master1:8089", the realm is master1.
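
To make the "host part" concrete, the following Python sketch extracts the realm from a master_uri value using only the standard library. The function name is hypothetical. Note that urlsplit lowercases the hostname, so compare the result with the value you actually store if your master_uri contains uppercase characters.

```python
from urllib.parse import urlsplit


def realm_from_master_uri(master_uri: str) -> str:
    """Return the host part of a master_uri, dropping scheme and port."""
    host = urlsplit(master_uri).hostname  # lowercased by urlsplit
    if not host:
        raise ValueError("no host found in master_uri: %r" % master_uri)
    return host


print(realm_from_master_uri("https://master1:8089"))  # master1
```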

Limit the number of Rules Engine retries

By default, the Rules Engine indefinitely queries the cluster masters until they all return a healthy state. You can limit the number of retries if you want the Rules Engine to only attempt a specified number of status checks. After the specified number of attempts, the Rules Engine posts a message in Splunk Web and continues with the startup process of restoring active groups, backfilling events, and processing events.

Prerequisites

Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make changes to the files in the local directory.

Steps

  1. Open or create a local itsi_rules_engine.properties file at $SPLUNK_HOME/etc/apps/SA-ITOA/local.
  2. Add the following setting and specify the number of retries:
    max_cluster_rolling_restart_retry_count = <integer>
  3. Restart your Splunk software.
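
If you manage this setting with a script, the following Python sketch shows one way to add or update the line in the text of the local itsi_rules_engine.properties file. It works on a string and does not read or write any file. The function name is hypothetical, and the sketch assumes a flat list of key = value lines as shown in step 2. It does not account for any other structure your file might contain.

```python
KEY = "max_cluster_rolling_restart_retry_count"


def set_retry_count(properties_text: str, retries: int) -> str:
    """Return properties_text with the retry-count setting added or updated."""
    new_line = "%s = %d" % (KEY, int(retries))
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Match only an active "key = value" line, not a commented-out one.
        if line.split("=", 1)[0].strip() == KEY:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


print(set_retry_count("", 5), end="")
# max_cluster_rolling_restart_retry_count = 5
```

Remember that the change takes effect only after you restart your Splunk software, as described in step 3.
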
Last modified on 28 April, 2023

This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.11.0, 4.11.1, 4.11.2, 4.11.3, 4.11.4, 4.11.5, 4.11.6, 4.12.0 Cloud only, 4.12.1 Cloud only, 4.12.2 Cloud only, 4.13.0, 4.13.1, 4.13.2, 4.13.3, 4.14.0 Cloud only, 4.14.1 Cloud only, 4.14.2 Cloud only, 4.15.0, 4.15.1, 4.15.2, 4.15.3, 4.16.0 Cloud only, 4.17.0, 4.17.1, 4.18.0, 4.18.1, 4.19.0, 4.19.1, 4.19.2

