Splunk® IT Service Intelligence

Event Analytics Manual

This documentation does not apply to the most recent version of Splunk® IT Service Intelligence. For documentation on the most recent version, go to the latest release.

Event Analytics Monitoring dashboard

The Event Analytics Monitoring dashboard provides troubleshooting information for ITSI's Event Analytics functionality.

Rules Engine Statistics

Panel | Description
Rules Engine Information | The Java version being used by the Rules Engine. ITSI requires Java 8 - 11 to run notable event management features.
Rules Engine Real-Time Search Configuration | Configuration information related to the following settings in savedsearches.conf:
  • dispatch.indexedRealtime - Defaults to "1". Do not modify this setting.
  • dispatch.indexedRealtimeOffset - Defaults to "60". Increase this setting if events are not being grouped.
  • dispatch.rt_backfill - Defaults to "0" (false). Do not modify this setting.
  • cron_schedule - Defaults to * * * * *. Do not modify this setting.
Rules Engine Event Processing Volume | The number of events processed by each Rules Engine activity every 10 minutes.
Rules Engine Event Processing Times | The median time taken by each Rules Engine activity to process an event.
Rules Engine Schedulers Health Check | The number of times various Rules Engine schedulers run per 12-minute time frame. These statistics are used internally to ensure the schedulers are running as expected. By default, the Event Periodic Backfill Scheduler runs every 12 minutes, the Policy Rules Check Scheduler runs every 1 minute for each aggregation policy, and the Policy Group Updates KV Store Sync Scheduler runs every 28 seconds for each aggregation policy.
Rules Engine Starts and Stops | The number of times the Rules Engine starts and stops each hour. A Rules Engine restart can kick off multiple backfill processes with the default phased_execution_mode value in limits.conf, which might lead to the creation of duplicate episodes. Knowing when restarts occurred can also help you troubleshoot why certain expected episodes are missing.
Rules Engine Activity | The number of states for each completed Rules Engine instance. You can use the instance ID to search the ITSI logs and troubleshoot issues.
Rules Engine Activity Details | The details of the activities of each completed Rules Engine instance. You can use the instance ID to search the ITSI logs and troubleshoot issues.
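
To read the Rules Engine Schedulers Health Check panel, it can help to work out roughly how many runs the default intervals imply for a 12-minute time frame. The following Python sketch does that arithmetic. The dictionary keys are hypothetical labels, not ITSI identifiers, and the sketch assumes that the per-policy schedulers scale linearly with the number of aggregation policies.

```python
# Default scheduler intervals in seconds, taken from the panel description.
# The key names are hypothetical labels used only for this illustration.
WINDOW_SECONDS = 12 * 60

DEFAULT_INTERVALS = {
    "event_periodic_backfill": 12 * 60,  # runs once per window, not per policy
    "policy_rules_check": 60,            # runs for each aggregation policy
    "policy_group_kv_store_sync": 28,    # runs for each aggregation policy
}

PER_POLICY = {"policy_rules_check", "policy_group_kv_store_sync"}


def expected_runs(policy_count, window_seconds=WINDOW_SECONDS):
    """Return the approximate number of runs per window for each scheduler."""
    runs = {}
    for name, interval in DEFAULT_INTERVALS.items():
        # Assumption: per-policy schedulers scale linearly with policy count.
        multiplier = policy_count if name in PER_POLICY else 1
        runs[name] = multiplier * window_seconds / interval
    return runs


if __name__ == "__main__":
    for name, count in expected_runs(policy_count=5).items():
        print(f"{name}: about {count:.1f} runs per 12 minutes")
```

Counts that fall well below these estimates might indicate that a scheduler is not running as expected. Treat the numbers as approximate, because a run can fall on either side of a window boundary.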

Skipped Events

Panel | Description
Skipped Events Count | A raw count of skipped events (events that are not included in any episodes) over the past 7 days. Under normal conditions, this number should be zero.
Skipped Events Percentage | The percentage of ungrouped events versus grouped events over the past 7 days. Under normal conditions, this percentage should be zero.
Backfilled Events Count | A raw count of backfilled events over the past 7 days. Under normal conditions, this number should be small.
Backfilled Events Percentage | The percentage of backfilled events versus tracked events over the past 7 days. Under normal conditions, this percentage should be less than 1%.
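
The thresholds in this section can be expressed as a small check. The following Python sketch is illustrative only. The function and field names are hypothetical, and the sketch assumes that the skipped percentage is calculated against the total of skipped plus grouped events, which the panel description does not state explicitly.

```python
def percentage(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


def check_skipped_and_backfilled(skipped, grouped, backfilled, tracked):
    """Compare 7-day event counts against the documented normal conditions."""
    # Assumption: skipped events are measured against skipped + grouped events.
    skipped_pct = percentage(skipped, skipped + grouped)
    backfilled_pct = percentage(backfilled, tracked)
    return {
        "skipped_pct": skipped_pct,
        "backfilled_pct": backfilled_pct,
        "skipped_ok": skipped == 0,             # should be zero
        "backfilled_ok": backfilled_pct < 1.0,  # should be less than 1%
    }


if __name__ == "__main__":
    print(check_skipped_and_backfilled(skipped=0, grouped=1000,
                                       backfilled=5, tracked=1000))
```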

Episode Processing Time

Panel | Description
Episode Processing Times | The amount of time it takes to convert tracked alerts (active raw notable events) to grouped alerts (active grouped notable events). Under normal conditions, the processing time should be about 60 seconds.
Event Processing Volume | The number of events tracked in itsi_tracked_alerts, processed by the Rules Engine, and ingested into itsi_grouped_alerts per 10-minute time frame. Use this panel to troubleshoot grouping issues.
Event Processing Times | The median time each Rules Engine component takes to process events. This time does not include the real-time search delay and is calculated from the point at which the event is received by the Rules Engine.
Event Processing Time by Policy | The median amount of time, in seconds, for each of your aggregation policies to process a single event.
Actions Processing Volume | The total number of episode actions created, queued, and processed by the Rules Engine every 10 minutes.
Actions Processing Times | The minimum, median, and maximum amount of time that the Rules Engine takes to process a single action.
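
Several panels report a minimum, median, and maximum, and the Action Processing Times panel also reports a 99th percentile. The following Python sketch shows how these statistics relate for a set of processing times. It is not how ITSI computes the panel values. The sample data is made up, and the inclusive quantile method is chosen here so that the 99th percentile stays within the observed range.

```python
import statistics


def summarize_times(seconds):
    """Return min, median, max, and 99th percentile for a list of durations."""
    ordered = sorted(seconds)
    if len(ordered) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(ordered, n=100, method="inclusive")[98]
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "p99": p99,
    }


if __name__ == "__main__":
    # Hypothetical processing times in seconds, with one slow outlier.
    print(summarize_times([0.2, 0.4, 0.3, 5.0, 0.5]))
```

A median that stays low while the maximum climbs suggests that a small number of slow events or actions are responsible, not a general slowdown.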

Real-time Search Status

Panel | Description
Event Analytics Real-Time Search Status | The current state of real-time searches, including how much disk space they've used so far and how long they've been running. The searches exist in savedsearches.conf.

HEC Tokens

Panel | Description
Event Analytics HEC Tokens | Shows which HEC tokens are available by host. If you create notable events using HEC tokens, this table shows which of your instances to send events to using the 'Auto Generated ITSI Event Management Token'. If any of these tokens is missing, Event Analytics does not work properly.

KV Store Lookups

Panel | Description
Event Analytics KV Store Lookups | Compares the created KV store lookups with the ones that are required for event analytics but not created. If a lookup is not created, you must add it to transforms.conf.

Action Processing

Panel | Description
Actions Processing Volume | The number of actions created, queued, and processed every 10 minutes. If the lines for queued and processed actions overlap, the action queue consumers are running smoothly. If the two lines don't overlap, check the Action Queue Times and Action Execution Times panels for details.
Action Processing Times | The minimum, median, maximum, and 99th percentile processing time for each action. This panel shows the total time between action generation and action execution.
Action Queue Times | The minimum, median, and maximum time that an action spends in the action queue. The three lines should fluctuate within a range of seconds. If one line trends upward, consider adding more action queue consumers. Within ITSI, click Settings > Data Inputs and open IT Service Intelligence Actions Queue Consumer. Three consumers are enabled by default. Enable additional consumers to support more throughput and improve performance. If the trend continues upward, you can clone existing consumers and enable them.
Action Execution Times | The minimum, median, and maximum time spent executing actions. Any latencies shown here are not controlled by the action queue or Rules Engine. If the action execution times are high, check the latency of the KV store (for example, when changing an episode's state) or of the third-party service (for example, ServiceNow or BMC Remedy).
Action Queue Consumer Errors | A count of action queue errors over time. To search for the action queue errors, run the following search:

index=_internal sourcetype="itsi_internal_log" source="*itsi_notable_event_actions_queue_consumer*" log_level=ERROR

Action Failures | Each time an episode action fails, the failure details are listed here. The details are the same as those shown in the Activity tab of the individual episode.
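
The guidance for the Action Queue Times panel depends on telling normal fluctuation apart from a sustained upward trend. One way to make that distinction, assuming you have exported a series of queue times, is to fit a straight line and look at its slope. The following Python sketch does this with a least-squares fit. The function names, the sample values, and the 0.5 seconds-per-sample threshold are hypothetical and are not ITSI settings.

```python
def slope_per_sample(values):
    """Return the least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def is_trending_upward(values, min_slope=0.5):
    """Return True when queue times rise faster than min_slope per sample."""
    # min_slope is an arbitrary illustrative threshold, in seconds per sample.
    return slope_per_sample(values) > min_slope


if __name__ == "__main__":
    steady = [2.0, 3.0, 2.5, 3.0, 2.0, 2.5]      # fluctuates within seconds
    growing = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]   # climbs with every sample
    print(is_trending_upward(steady), is_trending_upward(growing))
```

A series that fluctuates around a constant value produces a slope near zero, while a queue that keeps backing up produces a clearly positive slope, which is the case where adding consumers is worth considering.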

Event Size Check

Panel | Description
Notable Event Size Check | Notable event sizes over time. The maximum allowable event size is 10000 bytes. If your events exceed this limit, increase the TRUNCATE setting in props.conf.
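
If you generate notable events from your own code, you can estimate an event's size before sending it. The following Python sketch compares the UTF-8 byte length of an event against the 10000-byte limit. It assumes the event is serialized as JSON, so the size indexed by Splunk might differ slightly. The function names and sample events are hypothetical.

```python
import json

MAX_EVENT_BYTES = 10000  # limit stated for the Notable Event Size Check panel


def event_size_bytes(event):
    """Return the size in bytes of an event serialized as UTF-8 JSON."""
    # Assumption: the event is sent as JSON; other formats give other sizes.
    return len(json.dumps(event).encode("utf-8"))


def exceeds_limit(event, limit=MAX_EVENT_BYTES):
    """Return True when the serialized event is larger than the limit."""
    return event_size_bytes(event) > limit


if __name__ == "__main__":
    small = {"title": "CPU high", "severity": "4"}
    large = {"title": "CPU high", "description": "x" * 12000}
    print(exceeds_limit(small), exceeds_limit(large))
```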

Correlation Search

Panel | Description
Events By Correlation Searches and Indexes | The number of tracked alerts and grouped alerts per correlation search. Use the dropdown menu to filter by individual correlation searches.

Aggregation Policy

Panel | Description
Events by Aggregation Policy | The number of grouped alerts per aggregation policy. You can filter by one or more aggregation policies to compare the number of events per policy.

Smart Mode

Panel | Description
Smart Mode Policies | Active and seed groups for each aggregation policy with Smart Mode enabled. Seed groups are templates used to group similar events together. When a new event comes in, it's matched to an existing seed group. If the event doesn't match any seed groups, a new active group is created. An active group is promoted to a seed group based on the Smart Mode algorithm, at which point it's persisted in the KV store. If the Rules Engine restarts, all in-memory active groups are lost.
Rules Engine Smart Mode Event Processing Volume | The number of events processed by each Rules Engine activity for Smart Mode policies every 10 minutes.
Last modified on 11 July, 2023

This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.11.0, 4.11.1, 4.11.2, 4.11.3, 4.11.4, 4.11.5, 4.11.6, 4.12.0 Cloud only, 4.12.1 Cloud only, 4.12.2 Cloud only, 4.13.0, 4.13.1, 4.13.2, 4.13.3, 4.14.0 Cloud only, 4.14.1 Cloud only, 4.14.2 Cloud only, 4.15.0, 4.15.1, 4.15.2, 4.15.3, 4.16.0 Cloud only
