Best practices for implementing Event Analytics in ITSI
Consider the following best practices when configuring Event Analytics in IT Service Intelligence (ITSI). For more information about ITSI's Event Analytics functionality, see About ITSI Event Analytics.
Best practices for implementing Event Analytics for ITSI services and KPIs
For best practices around leveraging ITSI's Event Analytics functionality to translate service and KPI health into notable events and episodes, see About the Content Pack for Monitoring and Alerting. The content pack provides a set of preconfigured correlation searches and notable event aggregation policies which, when enabled, produce meaningful and actionable alerts.
Best practices for implementing Event Analytics for other data sources
The following best practices help you successfully ingest and aggregate third-party alerts in ITSI. For more information, see Ingest third-party alerts as ITSI notable events.
To avoid duplicate events, use the same frequency and time range in correlation searches
When configuring a correlation search, consider using the same value for the search frequency and the time range to avoid duplicate events. For example, a search might run every five minutes and also look back five minutes.
If your data arrives with latency and you need to catch events that a shorter window might miss, consider expanding the time range. For example, the search could run every minute but look back five minutes. Keep in mind that overlapping windows can match the same event in more than one run.
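The effect of overlapping windows can be illustrated with a small Python model. It assumes runs fire at fixed multiples of the search frequency and that each run searches the half-open window [run - time range, run). It models scheduling only; it is not ITSI or Splunk code, and `runs_matching` is a hypothetical helper that counts how many runs would match a single event.

```python
def runs_matching(event_ts: int, frequency_s: int, time_range_s: int, horizon_s: int) -> int:
    """Count the scheduled runs whose look-back window [run - time_range, run) contains event_ts."""
    matches = 0
    # Runs fire at frequency_s, 2 * frequency_s, ... up to horizon_s (all in seconds).
    for run in range(frequency_s, horizon_s + 1, frequency_s):
        if run - time_range_s <= event_ts < run:
            matches += 1
    return matches


if __name__ == "__main__":
    event_at = 130  # one event, 130 seconds after the schedule starts
    for freq, window in [(300, 300), (60, 300), (300, 600)]:
        n = runs_matching(event_at, freq, window, horizon_s=3600)
        print(f"run every {freq}s, look back {window}s -> matched by {n} run(s)")
```

With matching frequency and time range the event is matched by 1 run; running every 60 seconds with a 300-second look-back matches it 5 times, and running every 300 seconds with a 600-second look-back matches it twice. That repetition is the cost of expanding the time range to catch late data.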
To reduce load on your system, don't use a time range greater than 5 minutes
A time range longer than 5 minutes can put significant load on your system, especially when event volume is high. To avoid that extra load, keep the time range at 5 minutes or less. One exception is data that arrives sporadically. For example, if your data comes in every 15 minutes, consider using a 15-minute time range.
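The time range guidance reduces to a simple rule, sketched here in Python under the assumption that you know roughly how often each source delivers data. `max_time_range_min` and `time_range_ok` are hypothetical helpers for reviewing search settings; they are not part of ITSI.

```python
DEFAULT_CAP_MIN = 5  # upper bound from the guidance, in minutes


def max_time_range_min(data_interval_min: int, cap_min: int = DEFAULT_CAP_MIN) -> int:
    """Return the longest look-back the guidance suggests for a source.

    Normally this is cap_min. If the source only delivers data every
    data_interval_min minutes and that is longer than the cap, the
    interval itself becomes the limit (for example, 15 for 15-minute data).
    """
    return max(cap_min, data_interval_min)


def time_range_ok(time_range_min: int, data_interval_min: int, cap_min: int = DEFAULT_CAP_MIN) -> bool:
    """True if a correlation search's time range stays within the suggested limit."""
    return time_range_min <= max_time_range_min(data_interval_min, cap_min)


if __name__ == "__main__":
    print(time_range_ok(5, data_interval_min=1))    # True: at the 5-minute cap
    print(time_range_ok(10, data_interval_min=1))   # False: exceeds the cap for frequent data
    print(time_range_ok(15, data_interval_min=15))  # True: sporadic 15-minute data
```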
Normalize all the important fields in your third-party events
When you create correlation searches, don't normalize only the obvious fields that exist in many data sources, such as host, severity, event type, and message. Also normalize the fields that you know are important in your events. For example, when you look at Windows event logs, which fields tell you whether something is good or bad? Normalize those fields as well and use them to build out a common information model.
Perform this normalization process for every data source you have so you can easily identify important fields when creating aggregation policies.
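A minimal Python sketch of per-source normalization follows, assuming events arrive as flat dictionaries. The vendor field names are illustrative only: the Windows names resemble common Windows event log fields, and the SCOM names are placeholders. `FIELD_MAPS` and `normalize` are hypothetical; in ITSI the equivalent renaming happens inside each correlation search.

```python
from typing import Any, Dict

# Hypothetical per-source mappings: vendor field name -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "windows": {
        "ComputerName": "host",
        "Type": "severity",
        "Message": "message",
        "EventCode": "event_code",      # domain-specific field worth normalizing too
        "SourceName": "event_source",
    },
    "scom": {  # placeholder field names for illustration
        "MonitoringObjectPath": "host",
        "Severity": "severity",
        "Description": "message",
        "AlertName": "event_type",
    },
}


def normalize(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename vendor fields to common names; unmapped fields keep their original name."""
    mapping = FIELD_MAPS[source]
    event = {mapping.get(key, key): value for key, value in raw.items()}
    event["data_source"] = source  # record which source the event came from
    return event


if __name__ == "__main__":
    print(normalize("windows", {"ComputerName": "web01", "EventCode": 4625, "Type": "Warning"}))
```

Keeping one mapping per source makes it easy to see which common fields every source provides when you later choose fields for aggregation policies.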
Create one correlation search per data source
For every third-party data source you bring into ITSI, create a single correlation search that normalizes its fields and generates notable events. For example, create one for SCOM, one for SolarWinds, and so on.
Don't create too many aggregation policies
Limit the number of aggregation policies you enable in your environment. Too many aggregation policies create too many groups, which produces an overly granular view of your IT environment. By limiting the number of policies, you gain more end-to-end visibility and avoid siloing collaboration between groups in your organization. Group events according to how they are related, not according to how people work to resolve them.
Select only 5-10 fields for Smart Mode analysis
By default, Smart Mode preselects every field with good event coverage for event similarity analysis. As a best practice, begin by clearing all the checkboxes. Then select 5-10 fields that you've normalized in a correlation search.
Selecting 5-10 fields helps you generate episodes of an appropriate size and number. If you select fewer than five fields, the aggregation policy has only a few things to compare when judging similarity. For example, if two events have somewhat similar messages and similar locations, they might be grouped together. This can lead to a small number of very large groups. The opposite is also true. If you select too many fields, the chance that all of them are similar is very low. This can lead to a large number of groups that each contain only one event.
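The trade-off can be seen in a deliberately simplified Python model. Smart Mode compares field similarity, and its actual algorithm isn't described here; this toy instead groups events only when every selected field matches exactly, which is enough to show how the number of fields drives group size. `validate_field_count`, `group_exact`, and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Event = Dict[str, str]


def validate_field_count(fields: Sequence[str], low: int = 5, high: int = 10) -> None:
    """Raise if the selection falls outside the recommended 5-10 fields."""
    if not low <= len(fields) <= high:
        raise ValueError(f"select {low}-{high} fields, got {len(fields)}")


def group_exact(events: List[Event], fields: Sequence[str]) -> List[List[Event]]:
    """Toy grouping: two events share a group only if every selected field matches exactly."""
    groups: Dict[tuple, List[Event]] = defaultdict(list)
    for event in events:
        key = tuple(event.get(f, "") for f in fields)
        groups[key].append(event)
    return list(groups.values())  # groups in order of first appearance


EVENTS: List[Event] = [
    {"location": "dc1", "event_type": "disk", "host": "h1", "severity": "high", "app": "web"},
    {"location": "dc1", "event_type": "disk", "host": "h2", "severity": "high", "app": "web"},
    {"location": "dc1", "event_type": "cpu", "host": "h3", "severity": "low", "app": "db"},
    {"location": "dc1", "event_type": "cpu", "host": "h3", "severity": "low", "app": "db"},
]

if __name__ == "__main__":
    for fields in (["location"], ["location", "event_type"],
                   ["location", "event_type", "host", "severity", "app"]):
        sizes = [len(g) for g in group_exact(EVENTS, fields)]
        print(len(fields), "field(s) ->", sizes)
```

Grouping on one field merges all four events into a single group ([4]); two fields give [2, 2]; five fields split the same events into [1, 1, 2]. Adding fields makes groups smaller and more numerous.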
Best practices for the Rules Engine
The following best practices help you successfully configure the Rules Engine. For more information about the Rules Engine, see Overview of the ITSI Rules Engine.
Disable the Rules Engine during rolling indexer restarts
During a rolling restart of an indexer cluster, search results are incomplete, and real-time searches restart every time another indexer completes its restart. The ITSI Rules Engine must run searches to rebuild its in-memory state every time it restarts. When those searches return incomplete or inconsistent results, notable events are processed more than once and episodes break unexpectedly. Therefore, it's a best practice to disable the itsi_event_grouping Rules Engine saved search in ITSI for the duration of the restart process.
Shutting down the Rules Engine trades latency for correctness. While it's shut down, the Rules Engine doesn't process any events, so latency increases. However, if you let the Rules Engine run during a long restart, events and episodes can be duplicated and events can be processed out of order. Although shutting it down is recommended, you can instead keep the Rules Engine running and accept potential duplicates in exchange for lower latency.
To disable the Rules Engine, perform the following steps:
- Within Splunk Web, go to Settings > Searches, reports, and alerts.
- In the App dropdown, select All.
- Use the filter to locate the itsi_event_grouping search.
- Click Actions > Disable.
- After the rolling restart finishes, perform the same steps to re-enable it.
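If you prefer to script the disable and re-enable steps, a hedged Python sketch follows. It builds, but does not send, a POST to the Splunk REST API that sets `disabled` on the saved search. Assumptions to verify against the REST API Reference for your version: the search lives in the `SA-ITOA` app namespace with owner `nobody`, the management port is 8089, and you authenticate with a bearer token. `saved_search_toggle_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request


def saved_search_toggle_request(
    base_url: str,
    name: str,
    disabled: bool,
    app: str = "SA-ITOA",   # assumed app namespace; confirm in Settings > Searches, reports, and alerts
    owner: str = "nobody",  # assumed owner for app-level saved searches
    token: str = "",
) -> urllib.request.Request:
    """Build a POST that sets disabled=1 (disable) or disabled=0 (enable) on a saved search."""
    path = "/servicesNS/{}/{}/saved/searches/{}".format(
        urllib.parse.quote(owner, safe=""),
        urllib.parse.quote(app, safe=""),
        urllib.parse.quote(name, safe=""),
    )
    body = urllib.parse.urlencode({"disabled": "1" if disabled else "0"}).encode()
    req = urllib.request.Request(base_url.rstrip("/") + path, data=body, method="POST")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


if __name__ == "__main__":
    before = saved_search_toggle_request("https://localhost:8089", "itsi_event_grouping", disabled=True)
    after = saved_search_toggle_request("https://localhost:8089", "itsi_event_grouping", disabled=False)
    for req in (before, after):
        print(req.get_method(), req.full_url, req.data.decode())
    # To send: urllib.request.urlopen(req, context=...) with an SSL context that trusts your certificate.
```

Send the `disabled=True` request before the rolling restart begins and the `disabled=False` request after it finishes, mirroring the manual steps.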
For more information about rolling restarts, see Perform a rolling restart of an indexer cluster in the Managing Indexers and Clusters of Indexers manual.
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.2.0, 4.2.1, 4.2.2, 4.2.3, 4.3.0, 4.3.1, 4.4.0, 4.4.1, 4.4.2, 4.4.3, 4.4.4, 4.4.5