Splunk® IT Service Intelligence

Event Analytics Manual

Configure Rules Engine periodic backfill in ITSI

The IT Service Intelligence (ITSI) Rules Engine backfills missing events if the data on indexers is temporarily unavailable due to network issues. By default, the backfill process runs every 12 minutes to check for missed events. This process helps the Rules Engine stabilize episode generation and action execution under unstable conditions.

Periodic backfill only works for events that are missed while the Rules Engine real-time rules search is running. It doesn't backfill events generated while the search isn't running, or while indexer nodes are intentionally shut down or restarted, such as during an indexer or Splunk Enterprise upgrade.

The Rules Engine uses the following searches to backfill missing events:

grouping_missed_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype

backfill_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype | sort 0 _time

Update index in backfill searches for custom indexes

If you use a custom index, add these searches to a local version of the Rules Engine properties file at $SPLUNK_HOME/etc/apps/SA-ITOA/local/itsi_rules_engine.properties, and replace the index name in each search with the name of your custom index. In the following example, the index name is changed to itsi_grouped_alerts_prod.

grouping_missed_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts_prod")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype

backfill_events_search = search (`itsi_event_management_index_with_close_events` ) OR ( `itsi_event_management_group_index`) NOT orig_sourcetype=snow:incident \
| stats first(_time) AS _time first(_raw) AS _raw first(source) AS source first(sourcetype) AS sourcetype count(eval(index="itsi_grouped_alerts_prod")) AS c_grouped by event_id \
| where c_grouped=0 | fields _time, _raw, source, sourcetype | sort 0 _time
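The index substitution is mechanical, so it can be scripted when you maintain several environments. The following Python sketch shows one possible way to derive the custom-index variant from a default search string. The helper name with_custom_index and the shortened DEFAULT_FRAGMENT string are hypothetical and used only for illustration; they are not part of ITSI.

```python
# Hypothetical helper, not part of ITSI: swap the default grouped-alerts
# index name inside a backfill search string for a custom index name.
DEFAULT_INDEX = "itsi_grouped_alerts"


def with_custom_index(search: str, custom_index: str) -> str:
    """Return the search with index="itsi_grouped_alerts" replaced."""
    old = f'index="{DEFAULT_INDEX}"'
    new = f'index="{custom_index}"'
    if old not in search:
        # Fail loudly rather than silently writing an unchanged search.
        raise ValueError(f"search does not reference {old}")
    return search.replace(old, new)


# Shortened fragment of the default search, used only for demonstration.
DEFAULT_FRAGMENT = 'count(eval(index="itsi_grouped_alerts")) AS c_grouped by event_id'

custom_fragment = with_custom_index(DEFAULT_FRAGMENT, "itsi_grouped_alerts_prod")
print(custom_fragment)
```

The helper raises an error when the default index name is absent, so running it twice on the same string doesn't quietly produce a search that still points at the wrong index.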

For more information about the ITSI Rules Engine and the Rules Engine search, see Overview of the ITSI Rules Engine.

Tune periodic backfill frequency and time windows

Periodic backfill is controlled by the following parameters in $SPLUNK_HOME/etc/apps/SA-ITOA/default/itsi_rules_engine.properties:

periodic_backfill_frequency = 720

periodic_backfill_time_window = 3600

periodic_backfill_to_realtime_gap = 720

periodic_backfill_search_job_check_limit = 15

The default values are optimized for most cases, but you can tune them based on your environment. For more information about editing itsi_rules_engine.properties, see Rules Engine properties reference in ITSI.

Configure backfill frequency

The periodic_backfill_frequency setting controls how often, in seconds, the Rules Engine reprocesses events that were not grouped. By default, ungrouped events are reprocessed every 12 minutes (720 seconds). Consider increasing this setting if you increase the periodic backfill time window.

This setting must be at least 120 seconds higher than the event_cache_expiry_time, which is 180 seconds by default. With the default cache expiry time, the minimum periodic backfill frequency is therefore 300 seconds. If you set a lower value, duplicate events might be grouped.
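The constraint above reduces to simple arithmetic. As a minimal sketch, assuming the default event_cache_expiry_time of 180 seconds and the 120-second margin stated above, the following Python check flags a frequency that is too low. The function and constant names are hypothetical and are not ITSI APIs.

```python
EVENT_CACHE_EXPIRY_TIME = 180  # default, in seconds (from the text above)
REQUIRED_MARGIN = 120          # seconds required above the cache expiry time


def min_backfill_frequency(cache_expiry: int = EVENT_CACHE_EXPIRY_TIME) -> int:
    """Lowest allowed periodic_backfill_frequency for a given cache expiry."""
    return cache_expiry + REQUIRED_MARGIN


def is_valid_frequency(frequency: int,
                       cache_expiry: int = EVENT_CACHE_EXPIRY_TIME) -> bool:
    """True when the frequency leaves the required margin over the cache expiry."""
    return frequency >= min_backfill_frequency(cache_expiry)


print(min_backfill_frequency())  # 300
print(is_valid_frequency(720))   # True: the default frequency
print(is_valid_frequency(240))   # False: below the 300-second minimum
```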

Configure the backfill time window

The periodic_backfill_time_window setting defines a sliding time window, in seconds, that the Rules Engine uses to pick up events that were not grouped. By default, the Rules Engine checks for missing events during a 1-hour window (3600 seconds). With the default frequency, each missed event therefore has five chances (3600 / 720) to be picked up. Once an event is backfilled or processed by the normal Rules Engine pipeline, it isn't reprocessed.

You can save resources for other Rules Engine processing tasks by tuning down the backfill window size. However, the value of periodic_backfill_time_window must be higher than the periodic_backfill_frequency.
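To see how the window and the frequency interact, the following Python sketch computes how many backfill runs are guaranteed to cover a given event and enforces the rule that the window must be higher than the frequency. It is an illustration based on the default values quoted above; the function name is hypothetical.

```python
def backfill_chances(time_window: int, frequency: int) -> int:
    """Guaranteed minimum number of backfill runs whose window covers one event."""
    if time_window <= frequency:
        raise ValueError(
            "periodic_backfill_time_window must be higher than "
            "periodic_backfill_frequency"
        )
    # Floor division gives the minimum; an event can get one extra run
    # when the window is not an exact multiple of the frequency.
    return time_window // frequency


print(backfill_chances(3600, 720))  # 5, matching the default settings
print(backfill_chances(1800, 720))  # 2, after tuning the window down
```

Shrinking the window lowers the number of chances, which is the trade-off to weigh against the resources saved.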

Configure the backfill to real-time gap

When periodic backfill runs, the Rules Engine skips the events with the most recent timestamps because newly generated events might need some time to be indexed. The periodic_backfill_to_realtime_gap setting, which is 12 minutes (720 seconds) by default, determines how far behind the current time the backfill window ends. This delay gives each event a chance to be indexed before it's considered for backfill.
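One way to read the gap and window settings together is as a test on event timestamps. The sketch below assumes, consistent with the example at the end of this topic, that the window ends periodic_backfill_to_realtime_gap seconds before the run time and starts periodic_backfill_time_window seconds before that end. This topic doesn't state whether the boundaries are inclusive, so the half-open interval used here is an assumption, and the function name is hypothetical.

```python
def in_backfill_window(event_time: float, run_time: float,
                       gap: int = 720, time_window: int = 3600) -> bool:
    """True if an event timestamp (epoch seconds) falls in the backfill window."""
    window_end = run_time - gap            # most recent events are skipped
    window_start = window_end - time_window
    # Assumption: start is inclusive, end is exclusive.
    return window_start <= event_time < window_end


run_time = 10_000.0  # arbitrary epoch value for illustration
print(in_backfill_window(run_time - 60, run_time))    # False: inside the gap
print(in_backfill_window(run_time - 1800, run_time))  # True: inside the window
print(in_backfill_window(run_time - 5000, run_time))  # False: before the window
```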

Find events that are periodically backfilled

Track the number of events being processed and backfilled over time to check that the Rules Engine is running. Use this search to check whether events are being periodically backfilled:

index=_internal source=*rules_engine* (Status=EventReceived OR Status=EventBackfilledPeriodically OR Status=EventBackfilled) | timechart span=5m count(eval(searchmatch("Status=EventReceived"))) AS "Event Received" count(eval(searchmatch("Status=EventBackfilledPeriodically"))) AS "Event Periodically Backfilled" count(eval(searchmatch("Status=EventBackfilled"))) AS "Event Backfilled"

Find specific event IDs for periodically backfilled events

Identify the event_id to track the specific events getting backfilled and troubleshoot issues. Use this search to find specific event IDs for periodically backfilled events:

index="_internal" source=*rules_engine* Status=EventBackfilledPeriodically

Search Rules Engine to find backfilled events

Use this search to look up an event in the Rules Engine logs and identify how long the event has existed:

index="_internal" source=*rules_engine* <event_id>

Check if the event has been grouped

To troubleshoot the cause of Rules Engine backfill issues, confirm whether events were processed before they were backfilled. Use this search to confirm whether events were processed and grouped:

FunctionName=handleEventProcessingRequestMessage, Status=EventReceived, EventId=<event_id>

Confirm when an event was grouped in the itsi_grouped_alerts index

Verify the time that an event was grouped to identify any unexpected delays in the backfill time. Use this search to find the time an event was grouped and compare it with the time an event was backfilled:

index="itsi_grouped_alerts" <event_id>

If an event is backfilled after it was already processed, there might be issues with one or more searches. Use this search to find events that were backfilled after being processed:

Status=EventRouted, IsBackfilledEvent=true, EventId=<event_id>

Check the Rules Engine logs for errors

Check the Rules Engine logs for errors. If there are no errors and the event was already processed and grouped, the periodic backfill search is returning incorrect results. Use this search to find the backfill job SIDs so you can troubleshoot further:

index="_internal" source=*rules_engine* handleEventBackfillRequestMessage Status=Processing

Example

In the following example, the periodic backfill search is scheduled to run at 9:12 AM. The periodic_backfill_frequency is 12 minutes, so the next backfill will run 12 minutes later at 9:24 AM, and so on every 12 minutes after that.

The periodic_backfill_to_realtime_gap is also 12 minutes, so the ending backfill boundary is 9:00 AM. The periodic_backfill_time_window is 60 minutes, so the starting backfill boundary is 8:00 AM.
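The boundaries in this example can be reproduced with a few lines of date arithmetic. The following Python sketch uses an arbitrary calendar date, because the example gives only times of day; the variable names are illustrative.

```python
from datetime import datetime, timedelta

frequency = timedelta(seconds=720)     # periodic_backfill_frequency
gap = timedelta(seconds=720)           # periodic_backfill_to_realtime_gap
time_window = timedelta(seconds=3600)  # periodic_backfill_time_window

run_time = datetime(2025, 1, 1, 9, 12)  # arbitrary date, 9:12 AM run

window_end = run_time - gap              # ending backfill boundary
window_start = window_end - time_window  # starting backfill boundary
next_run = run_time + frequency          # next scheduled backfill

print(window_start.strftime("%H:%M"))  # 08:00
print(window_end.strftime("%H:%M"))    # 09:00
print(next_run.strftime("%H:%M"))      # 09:24
```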


[Image: PeriodicBackfill.png, diagram of the periodic backfill example]
Last modified on 09 April, 2025

This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.11.0, 4.11.1, 4.11.2, 4.11.3, 4.11.4, 4.11.5, 4.11.6, 4.12.0 Cloud only, 4.12.1 Cloud only, 4.12.2 Cloud only, 4.13.0, 4.13.1, 4.13.2, 4.13.3, 4.14.0 Cloud only, 4.14.1 Cloud only, 4.14.2 Cloud only, 4.15.0, 4.15.1, 4.15.2, 4.15.3, 4.16.0 Cloud only, 4.17.0, 4.17.1, 4.18.0, 4.18.1, 4.19.0, 4.19.1, 4.19.2, 4.19.3, 4.19.4, 4.20.0

