Monitor KPI data drift in ITSI
Use drift detection to identify changes in KPI behavior that occur slowly over long periods of time, so you can prevent issues before they arise. After configuration changes to code, data, workload, or infrastructure, a KPI that is behaving normally can display an incorrect severity, such as high or critical. For example, a KPI that measures disk usage might increase slowly over several weeks while adaptive thresholds adjust daily to the new values, so the increase goes unnoticed until an issue arises, which can lead to data loss. Drift detection notifies you of these incremental changes so you can remediate the issue proactively.
Drift detection provides additional context about your KPI behavior, which helps you troubleshoot the root cause of issues and avoid inaccurate alerts or missed opportunities for proactive engagement.
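To see why thresholds that re-learn every day can miss slow drift, the following Python sketch simulates the disk usage example. It is a hypothetical illustration, not how ITSI computes adaptive thresholds or drift. It assumes a noiseless disk usage KPI that grows 0.5 percentage points per day, an "adaptive" threshold recomputed daily as the mean of the previous 7 days plus 3 standard deviations, and a drift measure that compares the last 14 days with a fixed baseline taken from the first 14 days. The helper names are made up for this example.

```python
# Hypothetical illustration only: not ITSI's adaptive thresholding or drift algorithm.
from statistics import mean, stdev


def disk_usage_series(days, start=40.0, growth_per_day=0.5):
    """Assumed KPI: disk usage (%) that rises slowly and steadily every day."""
    return [start + growth_per_day * day for day in range(days)]


def adaptive_breaches(values, window=7, k=3.0):
    """Days whose value exceeds an assumed threshold of mean + k * stdev,
    recomputed each day from the previous `window` days only."""
    breaches = []
    for day in range(window, len(values)):
        history = values[day - window:day]
        if values[day] > mean(history) + k * stdev(history):
            breaches.append(day)
    return breaches


def drift_percent(values, baseline_days=14):
    """Percentage change of the most recent `baseline_days` average
    relative to a fixed baseline built from the first `baseline_days`."""
    baseline = mean(values[:baseline_days])
    recent = mean(values[-baseline_days:])
    return (recent - baseline) * 100.0 / baseline


usage = disk_usage_series(90)  # roughly three months of daily values
print("adaptive threshold breaches:", adaptive_breaches(usage))
print(f"drift from fixed baseline: {drift_percent(usage):.1f}%")
```

Under these assumptions the daily check never fires, because each day's value stays inside the band learned from the week before it, yet the comparison against the fixed baseline shows that the KPI has grown by about 87.9% over the period.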
Prerequisites
- To configure drift detection, you must have the itoa_admin or itoa_team_admin role with the read_itsi_services and write_itsi_services capabilities.
- You must install Python for Scientific Computing version 4.2.0 or later to use this feature.
- For drift detection to produce meaningful results, your entities need at least 3 months of backfilled data or must display a historical pattern or trend. Drift detection analyzes KPI changes that occur over longer time periods, whereas adaptive thresholding accounts for more rapid changes. For more information, see When to use adaptive thresholds.
Configure drift detection
- From the ITSI navigation menu, select Configuration, then Service and KPI Management. Alternatively, select a service from the Service Analyzer and go to the KPIs tab.
- From the KPIs tab, find a KPI or select multiple KPIs to apply drift detection settings to.
- For multiple KPIs, select Configure drift detection from the Bulk options dropdown. For an individual KPI, select the three-dot menu next to the KPI name in the list, then select Configure drift detection.
- From the Drift detection configuration page, configure the following settings:
  - Data resolution: The time frame over which data is collected and summarized.
  - Function: The statistical method used to analyze aggregated data. Choose Max, Average, Min, or Sum.
  - Look back period: The time frame over which data is analyzed to evaluate trends and patterns.
  - Drift tolerance %: The percentage by which a KPI can deviate from the baseline and still be considered normal.
- Select Preview on chart to see when drift is detected based on your selected settings. Two types of drift can be detected:
  - Gradual drift: Indicates gradual changes in KPI behavior patterns, marked by a beginning and an end point. Gradual drift suggests potential issues caused by small increases or decreases in the KPI value over several weeks.
  - Rapid drift: Indicates that KPI behavior changed quickly over a short period of time. Rapid drift suggests a sustained change in the KPI over that short period, and that the configuration needs review.
- Select Save configuration.
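The way the drift detection settings interact can be sketched in Python. This is a hypothetical model under stated assumptions, not ITSI's drift detection algorithm: samples are (epoch seconds, value) pairs, Data resolution is a bucket width in seconds, Function summarizes each bucket, Look back period is a number of buckets, the baseline is the first bucket in the look back period, and drift is reported at the first bucket that deviates from the baseline by more than Drift tolerance %. The rapid versus gradual label is an invented heuristic: a crossing is labeled rapid when the single step into that bucket exceeds the tolerance by itself. All function names are made up for this example.

```python
# Hypothetical model of the drift detection settings; not ITSI's implementation.
from collections import defaultdict
from statistics import mean

# Maps the Function setting to a summary function (keys mirror the UI options).
FUNCTIONS = {"Max": max, "Average": mean, "Min": min, "Sum": sum}


def aggregate(samples, resolution_s, function):
    """Group (epoch_seconds, value) samples into fixed-width buckets and summarize each."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // resolution_s].append(value)
    summarize = FUNCTIONS[function]
    return [summarize(buckets[key]) for key in sorted(buckets)]


def detect_drift(samples, resolution_s, function, lookback_buckets, tolerance_pct):
    """Return (bucket_index, drift_pct, kind) for the first bucket outside the
    tolerance, or None if the KPI stays within tolerance of the baseline."""
    series = aggregate(samples, resolution_s, function)[-lookback_buckets:]
    baseline = series[0]  # assumption: first bucket in the look back period
    if baseline == 0:
        raise ValueError("baseline is zero; percentage drift is undefined")
    for i in range(1, len(series)):
        drift_pct = (series[i] - baseline) * 100.0 / abs(baseline)
        if abs(drift_pct) > tolerance_pct:
            # Invented heuristic: one large step means rapid, many small steps mean gradual.
            step_pct = (series[i] - series[i - 1]) * 100.0 / abs(baseline)
            kind = "rapid" if abs(step_pct) > tolerance_pct else "gradual"
            return i, drift_pct, kind
    return None


DAY = 86_400


def daily_samples(daily_values):
    """Four samples per day whose offsets cancel out, so each daily Average equals the input."""
    offsets = [(0, -1.0), (6, 1.0), (12, -0.5), (18, 0.5)]
    return [(d * DAY + h * 3600, v + off)
            for d, v in enumerate(daily_values) for h, off in offsets]


gradual = daily_samples([100 + 2 * d for d in range(30)])
rapid = daily_samples([100] * 15 + [150] * 15)
print(detect_drift(gradual, DAY, "Average", 30, 20))  # (11, 22.0, 'gradual')
print(detect_drift(rapid, DAY, "Average", 30, 20))    # (15, 50.0, 'rapid')
```

With a 20% tolerance, the steadily rising KPI crosses the tolerance on day 11 through small daily steps, while the KPI that jumps from 100 to 150 crosses on day 15 in a single step. Switching Function to Max or Min changes which samples drive each bucket, and a longer look back period moves the baseline further into the past.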
View drifting KPIs in a service
Follow these steps to view drifting KPIs from the Service Analyzer:
- Select a service from the Service Analyzer page.
- On the KPIs & Episodes tab, a symbol next to the KPI name indicates that drift has been detected. For example, if drift is detected in a 4xx Errors Count KPI, the symbol appears next to that KPI's name.
- Hover over the KPI name to view the percentage by which the KPI has drifted from the original baseline value configured in your drift settings.
- Select the Drift Review tab to see a list of episodes associated with a specific drift alert.
View drifting KPIs in the Configuration assistant
Follow these steps to view all KPIs with drift detected from the Configuration assistant:
- From the navigation menu, select Configuration, then ITSI Configuration Assistant.
- On the Configuration assistant page, find the KPIs with drift detected in the Configuration issues detected section.
- In the sidebar panel, select Get recommendations to generate thresholding recommendations for KPIs with drift detected.
- If the detected drift isn't accurate, select Reconfigure drift to set up new drift detection settings.
KPI drift notable event aggregation policy
The KPI drift policy is a notable event aggregation policy that automatically groups alerts for drifting KPIs into episodes. You can find drift episodes by filtering on this policy or by viewing the details of individual episodes.
To view drift episodes, follow these steps:
- Go to the Alerts and Episodes page.
- Select an episode. On the Impact tab, the episode details section alerts you if drift was detected and shows the impacted services and KPIs.
- (Optional) Based on your KPI's expected behavior, provide feedback about whether the detected drift is accurate. Your feedback helps refine the drift detection algorithm.
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.20.0