Configure the Rules Engine to handle indexer cluster rolling restarts and upgrades
Configure the Rules Engine to automatically adjust for indexer cluster rolling restarts so you can avoid duplicate processing of events and unexpected breaks in episodes. During an indexer cluster rolling restart, search results are incomplete and real-time searches are restarted every time a new indexer completes its restart process. The ITSI Rules Engine must run searches to rebuild its in-memory state every time it restarts. When those searches return incomplete or inconsistent results, it leads to duplicate event processing and unnecessary breaks in episodes.
Because the Rules Engine can't reliably detect a rolling restart or upgrade of the indexer cluster on its own, you must manually configure the cluster masters and search heads so that the Rules Engine can query the cluster masters for the status of the rolling restart or upgrade.
After you perform these setup steps once, the following events take place anytime you initiate an indexer rolling restart or upgrade, and right before every periodic backfill operation:
- When the restart or upgrade is initiated, the Rules Engine stops.
- Upon startup, the Rules Engine immediately queries the cluster masters to get the status of the rolling restart or upgrade.
- If the query returns "true", meaning a rolling restart is in progress, the Rules Engine restarts again and repeats the indexer cluster health status check.
- The Rules Engine repeats these steps indefinitely until the cluster masters return a healthy status.
- Once the cluster masters return a healthy status, the Rules Engine proceeds with rebuilding its in-memory state.
If the cluster is not configured properly on the search head, or if the credentials are missing or wrong on the search head, the Rules Engine treats the cluster as healthy and moves forward with its startup without waiting for the rolling restart or upgrade to finish. Make sure you set up the credentials properly to avoid these configuration errors.
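The sequence above can be sketched as a small shell loop. This is only an illustration of the documented behavior, not the Rules Engine's actual implementation. The function names, the `CHECK_INTERVAL_SECONDS` variable, and the 5-second default are assumptions made for the example, and `rolling_restart_in_progress` is a hypothetical stand-in for the authenticated status query to the cluster masters.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is not ITSI code.

# Hypothetical stand-in for the status query to the cluster masters.
# Returns 0 while a rolling restart or upgrade is in progress.
# Missing or wrong credentials count as "healthy" (non-zero), matching
# the fallback behavior described in the text.
rolling_restart_in_progress() {
  return 1
}

# Keeps checking until the cluster reports healthy, then continues startup.
wait_for_healthy_cluster() {
  local restarts=0
  while rolling_restart_in_progress; do
    restarts=$((restarts + 1))                # the Rules Engine restarts again
    sleep "${CHECK_INTERVAL_SECONDS:-5}"      # assumed pause between checks
  done
  echo "healthy after ${restarts} restart(s); rebuilding in-memory state"
}
```

With the default stand-in, `wait_for_healthy_cluster` returns immediately, which mirrors what happens when the cluster is healthy or the credentials are not usable.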
Configure cluster masters and search heads
Perform the following steps to enable the Rules Engine to query the cluster masters and get the status of the indexer cluster rolling restart or upgrade.
Step 1: Configure the indexer cluster masters
For the Rules Engine to access the cluster master status REST endpoint, it needs an authenticated user with the correct authorization capability. Create a service account with at least the list_indexer_cluster capability.
- Create a new role called sa_user_cluster_status with the list_indexer_cluster capability.
- Create a new user and assign it the sa_user_cluster_status role.
- Note the hostname of the cluster master, which you'll use as the realm parameter when adding the credentials on the search head.
- Repeat the previous three steps on every cluster master that has been added to the search heads as a searchable cluster.
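If you prefer to script the role and user creation rather than use Splunk Web, the following sketch shows one possible way to do it with curl against the cluster master's management port. It assumes the standard `authorization/roles` and `authentication/users` REST endpoints. The function names and the `CURL` override variable are hypothetical helpers introduced for this example, and every URI, username, and password passed in is a placeholder you supply.

```shell
#!/usr/bin/env bash
# Sketch only. The functions below are hypothetical helpers, not Splunk tools.
# CURL can be overridden (for example, CURL=echo) to preview the request.
CURL="${CURL:-curl}"

# create_status_role <cluster_master_uri> <admin_user:admin_password>
create_status_role() {
  "$CURL" -k -u "$2" "$1/services/authorization/roles" \
    -d name=sa_user_cluster_status \
    -d capabilities=list_indexer_cluster
}

# create_status_user <cluster_master_uri> <admin_user:admin_password> <username> <password>
create_status_user() {
  "$CURL" -k -u "$2" "$1/services/authentication/users" \
    -d name="$3" \
    --data-urlencode password="$4" \
    -d roles=sa_user_cluster_status
}

# Example calls (placeholders, run once per cluster master):
#   create_status_role "https://<cluster_master_hostname>:8089" "admin:<admin_password>"
#   create_status_user "https://<cluster_master_hostname>:8089" "admin:<admin_password>" "<username>" "<password>"
```

The `-k` flag skips TLS certificate verification, as in the curl example later in this topic. Drop it if your management port uses a trusted certificate.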
Step 2: Configure the search heads
The Rules Engine needs the plaintext password of the service account user to access the cluster master REST endpoint.
On one of the search heads, add the username, password, and realm of the service account to the search head password storage. You can add the information through the storage/passwords REST endpoint. For example:
curl -k -u admin:Chang3d! https://localhost:8089/servicesNS/nobody/SA-ITOA/storage/passwords -d name=<username> -d password=<password> -d realm=<cluster_master_hostname>
The realm is the host name of the cluster master. Make sure the realm matches the host part of the master_uri field returned from the services/cluster/config search head endpoint. For example, if https://localhost:8089/services/cluster/config returns "master_uri": "https://master1:8089", the realm is master1.
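To avoid typos when deriving the realm, you could extract the host part from the master_uri value with a small helper such as the one below. `realm_from_master_uri` is a hypothetical function written for this example. It only handles plain host names and IPv4 addresses, not bracketed IPv6 literals.

```shell
#!/usr/bin/env bash
# Sketch only: realm_from_master_uri is a hypothetical helper.
# Prints the host part of a URI such as https://master1:8089.
realm_from_master_uri() {
  local uri="$1"
  uri="${uri#*://}"   # strip the scheme, for example "https://"
  uri="${uri%%/*}"    # strip any trailing path
  uri="${uri%%:*}"    # strip the port
  printf '%s\n' "$uri"
}

realm_from_master_uri "https://master1:8089"   # prints: master1
```

Pass the printed value as the `realm` parameter in the storage/passwords request.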
Limit the number of Rules Engine retries
By default, the Rules Engine indefinitely queries the cluster masters until they all return a healthy state. You can limit the number of retries if you want the Rules Engine to only attempt a specified number of status checks. After the specified number of attempts, the Rules Engine posts a message in Splunk Web and continues with the startup process of restoring active groups, backfilling events, and processing events.
Prerequisites
- Only users with file system access, such as system administrators, can limit the number of cluster master checks using configuration files.
- Review the steps in How to edit a configuration file in the Splunk Enterprise Admin Manual.
- You can have configuration files with the same name in your default, local, and app directories. Read Where you can place (or find) your modified configuration files in the Splunk Enterprise Admin Manual.
Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make changes to the files in the local directory.
Steps
- Open or create a local itsi_rules_engine.properties file at $SPLUNK_HOME/etc/apps/SA-ITOA/local.
- Add the following setting and specify the number of retries:
max_cluster_rolling_restart_retry_count = <integer>
- Restart your Splunk software.
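If you manage configuration with scripts, the steps above could be automated along these lines. `set_retry_count` is a hypothetical helper, not a Splunk command. It assumes the setting sits on its own `key = value` line in the local file, and its integer validation is an assumption about acceptable values rather than documented behavior.

```shell
#!/usr/bin/env bash
# Sketch only: set_retry_count is a hypothetical helper, not a Splunk tool.
# Usage: set_retry_count <path to local itsi_rules_engine.properties> <integer>
set_retry_count() {
  local file="$1" count="$2" tmp
  case "$count" in
    ''|*[!0-9]*)
      echo "retry count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  mkdir -p "$(dirname "$file")"
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing value for this setting...
  grep -v '^[[:space:]]*max_cluster_rolling_restart_retry_count[[:space:]]*=' "$file" > "$tmp" || true
  # ...then append the new value.
  printf 'max_cluster_rolling_restart_retry_count = %s\n' "$count" >> "$tmp"
  # Copy the contents back so the original file's ownership and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example call (the value 10 is arbitrary):
#   set_retry_count "$SPLUNK_HOME/etc/apps/SA-ITOA/local/itsi_rules_engine.properties" 10
```

The helper only edits the file in the local directory. You still need to restart your Splunk software for the change to take effect.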
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.11.0, 4.11.1, 4.11.2, 4.11.3, 4.11.4, 4.11.5, 4.11.6, 4.12.0 Cloud only, 4.12.1 Cloud only, 4.12.2 Cloud only, 4.13.0, 4.13.1, 4.13.2, 4.13.3, 4.14.0 Cloud only, 4.14.1 Cloud only, 4.14.2 Cloud only, 4.15.0, 4.15.1, 4.15.2, 4.15.3, 4.16.0 Cloud only, 4.17.0, 4.17.1, 4.18.0, 4.18.1, 4.19.0, 4.19.1