Configure Rules Engine periodic backfill in ITSI
The IT Service Intelligence (ITSI) Rules Engine backfills missing events if the data on indexers is temporarily unavailable due to network issues. The backfill process runs every twelve minutes to check for missed events. This process helps the Rules Engine stabilize episode generation and action execution under unstable conditions.
Periodic backfill only recovers events that are missed while the Rules Engine real-time rules search is running. It doesn't backfill events generated when the search isn't running, or when indexer nodes are shut down or restarted intentionally, such as during an indexer upgrade or a Splunk Enterprise upgrade.
The Rules Engine uses the following searches to backfill missing events:
grouping_missed_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype
backfill_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype | sort 0 _time
Update index in backfill searches for custom indexes
If you use a custom index, add these searches to a local version of the Rules Engine properties file at $SPLUNK_HOME/etc/apps/SA-ITOA/local/itsi_rules_engine.properties. In the following example, the index name in the searches is changed to itsi_grouped_alerts_prod.
grouping_missed_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts_prod")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype
backfill_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts_prod")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype | sort 0 _time
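The index substitution shown above can also be scripted when you maintain several environments. The following Python snippet is a minimal sketch, not part of ITSI: the helper name `retarget_search` and the constant `DEFAULT_INDEX` are hypothetical, and it assumes the index appears in the search only as the quoted literal index="itsi_grouped_alerts".

```python
import re

# Default grouped-alerts index name, as it appears in the shipped searches.
DEFAULT_INDEX = "itsi_grouped_alerts"


def retarget_search(search: str, custom_index: str) -> str:
    """Return the search with the default index literal replaced by custom_index.

    Hypothetical helper for illustration only. It matches the full quoted
    literal, so macro names and already-renamed indexes are left untouched.
    """
    pattern = re.compile(r'index="' + re.escape(DEFAULT_INDEX) + r'"')
    # A function replacement avoids backslash interpretation in custom_index.
    return pattern.sub(lambda _match: f'index="{custom_index}"', search)


default_fragment = 'count(eval(index="itsi_grouped_alerts")) AS c_grouped by event_id'
print(retarget_search(default_fragment, "itsi_grouped_alerts_prod"))
```

Because the pattern includes the closing quote, running the helper twice on the same search doesn't change an index name that was already replaced.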
For more information about the ITSI Rules Engine and the Rules Engine search, see Overview of the ITSI Rules Engine.
Tune periodic backfill frequency and time windows
Periodic backfill is controlled by the following parameters in $SPLUNK_HOME/etc/apps/SA-ITOA/default/itsi_rules_engine.properties:
periodic_backfill_frequency = 720
periodic_backfill_time_window = 3600
periodic_backfill_to_realtime_gap = 720
periodic_backfill_search_job_check_limit = 15
The default values are optimized for most cases, but you can tune them based on your environment. For more information about editing itsi_rules_engine.properties, see Rules Engine properties reference in ITSI.
Configure backfill frequency
The periodic_backfill_frequency setting controls how often, in seconds, the Rules Engine reprocesses events that were not grouped. By default, ungrouped events are reprocessed every 12 minutes (720 seconds). Consider increasing this setting if you increase the periodic backfill time window.
This setting must be at least 120 seconds higher than event_cache_expiry_time, which defaults to 180 seconds. With the default expiry time, the minimum periodic backfill frequency is therefore 300 seconds. With a lower value, duplicate events might be grouped.
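The minimum frequency rule can be expressed as a quick check before you edit the properties file. This is an illustrative sketch only: the function names are hypothetical, and it assumes the 120-second margin also applies if you change event_cache_expiry_time from its default.

```python
# Values taken from the text above; all durations are in seconds.
DEFAULT_EVENT_CACHE_EXPIRY_TIME = 180
REQUIRED_MARGIN = 120


def min_backfill_frequency(event_cache_expiry_time: int = DEFAULT_EVENT_CACHE_EXPIRY_TIME) -> int:
    """Smallest allowed periodic_backfill_frequency for a given cache expiry time."""
    return event_cache_expiry_time + REQUIRED_MARGIN


def is_valid_frequency(frequency: int,
                       event_cache_expiry_time: int = DEFAULT_EVENT_CACHE_EXPIRY_TIME) -> bool:
    """True if the frequency is at least 120 seconds above the cache expiry time."""
    return frequency >= min_backfill_frequency(event_cache_expiry_time)


print(min_backfill_frequency())   # 300
print(is_valid_frequency(720))    # True (the default frequency)
print(is_valid_frequency(240))    # False (too close to the cache expiry time)
```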
Configure the backfill time window
The periodic_backfill_time_window setting defines a sliding time window, in seconds, that the Rules Engine uses to pick up events that were not grouped. By default, the Rules Engine checks for missing events during a 1-hour window (3600 seconds). Therefore, with the default frequency of 720 seconds, each event has five chances (3600 / 720) to be picked up. Once an event is backfilled or processed by the normal Rules Engine pipeline, it isn't reprocessed.
You can save resources for other Rules Engine processing tasks by tuning down the backfill window size. However, the value of periodic_backfill_time_window must be higher than the value of periodic_backfill_frequency.
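The relationship between the time window and the frequency can be checked with a short calculation. The following is a sketch under stated assumptions: the function name is hypothetical, and rounding down when the window isn't an exact multiple of the frequency is an assumption rather than documented behavior.

```python
def backfill_attempts(time_window: int, frequency: int) -> int:
    """Number of backfill runs that cover an event while it is inside the window.

    Both arguments are in seconds. Raises ValueError when the window is not
    higher than the frequency, which the settings don't allow.
    """
    if time_window <= frequency:
        raise ValueError(
            "periodic_backfill_time_window must be higher than periodic_backfill_frequency"
        )
    # Assumption: partial intervals don't count as an extra attempt.
    return time_window // frequency


print(backfill_attempts(3600, 720))  # 5, matching the default settings
```

For example, halving the window to 1800 seconds while keeping the default frequency leaves each event with two attempts instead of five.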
Configure the backfill to real-time gap
When periodic backfill begins, the Rules Engine doesn't backfill the events with the most recent timestamps because the generated events might need some time to be indexed. The periodic_backfill_to_realtime_gap setting, which is 12 minutes (720 seconds) by default, determines the time gap before beginning to backfill events. The wait time guarantees that each event has a chance to be indexed before it's considered for backfill.
Example
In the following example, the periodic backfill search is scheduled to run at 9:12 AM. The periodic_backfill_frequency is 12 minutes, so the next backfill runs 12 minutes later at 9:24 AM, and so on every 12 minutes after that.
The periodic_backfill_to_realtime_gap is also 12 minutes, so the ending backfill boundary is 9:00 AM. The periodic_backfill_time_window is set to 60 minutes, so the starting backfill boundary is 8:00 AM.
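The boundaries in this example can be reproduced with simple date arithmetic. The snippet below is a minimal sketch: the function name is hypothetical, the calendar date is arbitrary, and the parameter defaults are the default values listed earlier in this topic.

```python
from datetime import datetime, timedelta


def backfill_boundaries(run_time: datetime,
                        time_window: int = 3600,
                        realtime_gap: int = 720) -> tuple[datetime, datetime]:
    """Return (start, end) of the span covered by a backfill run at run_time.

    time_window and realtime_gap are in seconds.
    """
    end = run_time - timedelta(seconds=realtime_gap)   # skip the most recent events
    start = end - timedelta(seconds=time_window)       # look back one full window
    return start, end


run_time = datetime(2024, 1, 1, 9, 12)                 # arbitrary date, 9:12 AM
start, end = backfill_boundaries(run_time)
next_run = run_time + timedelta(seconds=720)           # periodic_backfill_frequency

print(start.strftime("%H:%M"), end.strftime("%H:%M"), next_run.strftime("%H:%M"))
# 08:00 09:00 09:24
```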
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.11.0, 4.11.1, 4.11.2, 4.11.3, 4.11.4, 4.11.5, 4.11.6, 4.12.0 Cloud only, 4.12.1 Cloud only, 4.12.2 Cloud only, 4.13.0, 4.13.1, 4.13.2, 4.13.3, 4.14.0 Cloud only, 4.14.1 Cloud only, 4.14.2 Cloud only, 4.15.0, 4.15.1, 4.15.2, 4.15.3, 4.16.0 Cloud only, 4.17.0, 4.17.1, 4.18.0, 4.18.1, 4.19.0, 4.19.1