Add a KPI to a service in ITSI
IT Service Intelligence uses KPI searches to monitor the performance of your IT services. You must add at least one KPI search to a service to use ITSI. For information on how the number of KPIs can impact performance, see Performance considerations in the ITSI Installation and Upgrade manual.
This topic walks you through the basic KPI search creation modal. These instructions assume that you have already created a service. If you have not, see Overview of creating services in ITSI in this manual.
To design a KPI search, you need to know the following information:
- The source search expression, including selection criteria.
- The specific field in the data that you want to monitor.
- The time span and frequency for the KPI to update.
- How to summarize the data over the time span (count, last, sum, average, and so on).
- Whether you want to split the KPI result values by entities (for example, by host).
Step 1: Add a new KPI
- Click Configure > Services from the ITSI top menu bar.
- Select an existing service.
- Click New in the KPI tab and choose one of the following options:
- Select Generic KPI.
- Select a KPI template. For example, Application Server: CPU and Memory > Memory Used. KPI templates provide pre-configured KPI source searches, including ad hoc searches and base searches, based on ITSI modules. KPI templates are tailored for specific service monitoring use cases, such as operating systems, databases, web servers, load balancers, virtual machines, and so on.
- For this procedure, select Generic KPI.
- In Step 1 of the KPI creation modal, enter the KPI title and optional description. Click Next.
Step 2: Define a source search
When you create a KPI, you must define a source search on which to build the KPI. You can choose from four source search types: data model search, metrics search, ad hoc search, and base search.
Note: Before you define your source search, consider the performance implications for your particular deployment. While data models are suitable for smaller test environments, base searches generally provide best performance in larger production settings. See Create KPI base searches in ITSI.
Define a source search from a data model
- Configure your data model search.
|Field||Description|
|KPI source||Data Model|
|Data Model||The data model object, child, and attribute fields. For example, Host Operating System>|
When you create a KPI search from a data model, the data model object field becomes the threshold field. When you create a KPI search from an ad hoc search, you must manually enter the threshold field.
|Filters (optional)||Click Add Filter to add data model filter conditions. Data model filters let you include or exclude search result data based on the filter conditions. For example, the filter condition `host Equals ipaddress` filters out all values for the data model search field `host`, except for values that equal `ipaddress`. Data model filtering can help improve the speed and accuracy of your searches by excluding extraneous data from search results.|
- Click Generated Search to preview your KPI search string.
Use the Generated Search box to view changes that ITSI makes to your search string as you build your KPI. Click anywhere on the Generated Search itself to run the search.
- Click Next.
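The effect of an Equals filter condition, as described in the Filters row above, can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not how ITSI builds the generated search. The function name, the sample rows, and the `mem_used_percent` field are hypothetical.

```python
# Hypothetical rows standing in for data model search results.
rows = [
    {"host": "10.0.0.5", "mem_used_percent": 71.0},
    {"host": "10.0.0.6", "mem_used_percent": 42.0},
    {"host": "10.0.0.5", "mem_used_percent": 68.5},
]


def apply_equals_filter(rows, field, value):
    """Keep only the rows whose `field` equals `value`."""
    return [row for row in rows if row.get(field) == value]


# Only rows for host 10.0.0.5 remain; every other host value is filtered out.
filtered = apply_equals_filter(rows, "host", "10.0.0.5")
print(filtered)
```

Because the other hosts never reach the aggregation step, the KPI processes less data, which is the speed benefit that the Filters description refers to.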
Define a source search from a metrics search
- Configure your metrics search.
|Field||Description|
|KPI source||Metrics Search|
If there are no metrics indexes configured in your Splunk deployment, you will receive the message: "No metrics found." For more information about metrics, see Get started with Metrics in the Splunk Enterprise Metrics Manual.
|Metrics Search||Select the metrics index from which to choose a metric.|
|Metric||Select the metric to use for the KPI. For example, |
- Click Generated Search to preview your KPI search string. Metrics searches begin with the
- Click Next.
Define a source search from an ad hoc search
- Configure your ad hoc search.
|Field||Description|
|KPI source||Ad hoc Search|
|Search||The ad hoc search string that you create. This is the event gathering search for the KPI.|
Note: The use of transforming commands, the `gettime` macro, or time modifiers in your KPI search is not recommended because it can cause issues with KPI backfill, with the KPI threshold preview, and with the display of raw data on ITSI views, such as glass tables and deep dives, that let you run KPI searches against raw data.
|Threshold Field||The field in your data that the KPI aggregates and monitors. For pure counts use |
- Click Generated Search to preview your KPI search string.
- Click Next.
Define a source search from a base search
- Configure your base search.
|Field||Description|
|KPI source||Base Search|
|Base Search||The base search that you want to associate with the KPI. For example, DA-ITSI-OS: Performance.Memory. Base searches provide pre-configured KPI templates built on ITSI modules.|
|Metric||The metric that you want to associate with the KPI. For example, |
- (Optional) Click Generated Search to preview your KPI search string.
- Click Next.
Note: Most fields in the next window (steps 3 through 6) are pre-populated for the base search by the KPI template. For more information on how to create and configure KPI base searches, see Create KPI base searches.
Step 3: Filter entities
Filter entities to have more granular control of your KPI at the entity level.
Split by Entity
The Split by Entity option lets you maintain a breakdown of KPI values at the entity level. Use Split by Entity to enable monitoring of KPI values for each individual entity against which a KPI is running.
You must split KPIs by entity to use the following ITSI features:
- Per-entity thresholds. See Set per entity threshold values in this manual.
- Entity overlays. See Add overlays to a deep dive in ITSI in the ITSI User Manual.
- Maximum severity view in the Service Analyzer. See Aggregate versus maximum severity KPI values in ITSI in the ITSI User Manual.
- Cohesive anomaly detection. See Apply anomaly detection to a KPI in ITSI in this manual.
Configure the following fields:
|Split by Entity||Enable/disable a breakdown of KPI values at the entity level. The KPI must be running against two or more entities.|
|Entity Split Field||Specify the field in your data to use to look up the corresponding split by entities. The default lookup field for data model searches and ad hoc searches is |
When filtering a KPI down to entities, you can split by a field other than the field you are using for filtering the entities (specified in the Entity Filter Field). This allows you to filter to the hosts that affect your service, but split out your data by a different field. For example, you might want to filter down to all of your database hosts but split the metric by the processes running on the hosts.
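The difference between the filter field and the split field can be sketched in Python. The example follows the database scenario above: it filters events to the hosts that belong to the service, then splits the metric by process. The host names, process names, `cpu` field, and function name are hypothetical, and the averaging step stands in for whatever calculation the KPI uses.

```python
from collections import defaultdict

# Hypothetical set of database hosts assigned to the service.
service_hosts = {"db01", "db02"}

# Hypothetical events returned by the KPI source search.
events = [
    {"host": "db01", "process": "mysqld", "cpu": 40.0},
    {"host": "db01", "process": "backup", "cpu": 10.0},
    {"host": "db02", "process": "mysqld", "cpu": 60.0},
    {"host": "web01", "process": "nginx", "cpu": 90.0},
]


def filter_then_split(events, allowed, filter_field, split_field, value_field):
    """Filter on one field, then average the metric per value of another field."""
    groups = defaultdict(list)
    for event in events:
        if event[filter_field] in allowed:
            groups[event[split_field]].append(event[value_field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Filter by host, split by process. The web01 event is excluded by the filter.
per_process = filter_then_split(events, service_hosts, "host", "process", "cpu")
print(per_process)  # {'mysqld': 50.0, 'backup': 10.0}
```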
Entity filtering lets you specify the entities against which a KPI search will run. Provide an entity filter field to reduce collection of extraneous data.
For example, if you enable entity filtering for a KPI in the Online Sales service, only entities assigned to that service are used to calculate the KPI search metrics.
|Filter to Entities in Service||Enable/disable entity filtering.|
|Entity Filter Field||Specify the field in your data to use to look up the corresponding entities by which to filter the KPI. For metrics searches, select a dimension for the metric. The default field for data model searches, ad hoc searches, and metrics searches is |
With the removal of entity alias filtering in version 4.2.x, only the Entity Filter Field determines the entity aliases to use for filtering. ITSI now strictly matches entities against KPI search results using both the alias key and value, whereas before it only used the alias value. The strict entity alias matching also occurs when generating notable events through correlation searches. The entity lookup field must be an actual entity alias field for the match to occur. For more information, see Removed features in Splunk IT Service Intelligence in the ITSI Release Notes.
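The following Python sketch contrasts value-only matching with strict key-and-value matching. It illustrates the behavior change described above and is not ITSI's implementation. The entity records, alias names, and function names are hypothetical.

```python
# Hypothetical entities, each with a dictionary of alias key/value pairs.
entities = [
    {"title": "web-01", "aliases": {"host": "web-01", "ip": "10.0.0.5"}},
    {"title": "db-01", "aliases": {"host": "db-01", "vm_name": "web-01"}},
]


def match_by_value_only(entities, value):
    """Earlier behavior: any alias with a matching value counts as a match."""
    return [e["title"] for e in entities if value in e["aliases"].values()]


def match_strict(entities, key, value):
    """Strict behavior: the alias key and the alias value must both match."""
    return [e["title"] for e in entities if e["aliases"].get(key) == value]


# A search result with host=web-01 matches two entities by value alone,
# but only one entity when the alias key must also be "host".
print(match_by_value_only(entities, "web-01"))   # ['web-01', 'db-01']
print(match_strict(entities, "host", "web-01"))  # ['web-01']
```

The strict form also shows why the lookup field must be an actual alias field: a key that no entity defines as an alias matches nothing.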
Step 4: Add monitoring calculations
Configure the following KPI monitoring calculations:
|KPI Search Schedule||Determines the frequency of the KPI search. |
Avoid scheduling searches at one-minute intervals. Running multiple concurrent KPI searches at short intervals can produce lengthy search queues and is not necessary to monitor most KPIs.
|Service/Aggregate Calculation||The statistical operation that ITSI performs on KPI search results. The correct aggregate calculation to use depends on the type of KPI search. For example, if your search returns results for CPU Load percentage, you could use |
|Calculation Window||The time period over which the calculation applies. For example, |
|Fill Data Gaps with||How to treat gaps in your data. This affects how KPI data gaps are displayed in service analyzers, deep dive KPI lanes, glass table KPI widgets, and other dashboards in ITSI populated by the summary index.|
Filled gap values are not used in the calculations performed for Anomaly Detection and Adaptive Thresholding.
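To see how the aggregate calculation and the calculation window interact, consider the following Python sketch. It applies a named statistical operation to the samples that fall inside the window and returns no value when the window is empty, which corresponds to a data gap. The function name, the calculation names, and the sample data are hypothetical, and the exact window boundaries that ITSI uses are an assumption.

```python
from statistics import mean

# Hypothetical mapping of calculation names to statistical operations.
CALCULATIONS = {"avg": mean, "max": max, "min": min, "sum": sum, "count": len}


def aggregate(samples, now, window_seconds, calculation):
    """Apply `calculation` to the sample values inside the calculation window.

    `samples` is a list of (timestamp_seconds, value) pairs. Returns None when
    no samples fall inside the window, that is, when there is a data gap.
    """
    in_window = [v for t, v in samples if now - window_seconds < t <= now]
    if not in_window:
        return None
    return CALCULATIONS[calculation](in_window)


# Hypothetical CPU load samples: (seconds, percent).
samples = [(0, 10.0), (100, 20.0), (250, 30.0), (290, 40.0)]

print(aggregate(samples, now=300, window_seconds=300, calculation="avg"))   # 30.0
print(aggregate(samples, now=300, window_seconds=300, calculation="max"))   # 40.0
print(aggregate(samples, now=1000, window_seconds=300, calculation="avg"))  # None
```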
How filling data gaps with last reported value works
Each time the saved search runs for a KPI with the Fill Data Gaps with option set to Last available value, the alert value for the KPI is cached in a KV store collection called itsi_kpi_summary_cache. ITSI uses a lookup named itsi_kpi_alert_value_cache in the KPI saved search to fill entity-level and service-aggregate gaps for the KPI using the cached alert value.
To prevent bloating of the collection with entity/service-aggregate KPI results, a retention policy runs on the itsi_kpi_summary_cache collection using a Splunk modular input. The modular input runs every 15 minutes and removes the entries from cache that have not been updated for more than 30 minutes. 15 minutes is the default frequency and 30 minutes is the default retention time for entries in cache. You can change the frequency and retention time in the
[itsi_age_kpi_alert_value_cache://age_kpi_alert_value_cache] stanza of the
Filling data gaps with the last reported value occurs for at most 45 minutes, in accordance with the modular input interval and retention time (15 minutes + 30 minutes by default). If data gaps for a KPI continue to occur for more than 30 to 45 minutes, the KPI will stop getting filled with the last reported value and data gaps will start displaying as N/A values.
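The 30 to 45 minute range follows from the interaction of the cleanup interval and the retention time. The following Python sketch works through the arithmetic with the default values. It assumes that cleanup runs on a fixed 15-minute schedule and that gap filling stops as soon as the cached entry is removed. The function names are hypothetical.

```python
def eviction_minute(last_update, first_cleanup, interval=15, retention=30):
    """Return the minute of the first cleanup run that removes the entry.

    A run removes the entry only when the entry has gone without an update
    for more than `retention` minutes.
    """
    run = first_cleanup
    while run - last_update <= retention:
        run += interval
    return run


def fill_duration(last_update, first_cleanup, interval=15, retention=30):
    """Minutes for which the cached value stays available to fill gaps."""
    return eviction_minute(last_update, first_cleanup, interval, retention) - last_update


# Entry last updated exactly when a cleanup runs: it survives the runs at
# minutes 15 and 30 and is removed at minute 45.
print(fill_duration(last_update=0, first_cleanup=0))    # 45

# Entry last updated one minute before a cleanup run: removed after 31 minutes.
print(fill_duration(last_update=14, first_cleanup=15))  # 31
```

With the defaults, the fill duration is always more than 30 minutes and at most 45 minutes, depending on where the last update falls relative to the cleanup schedule.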
Step 5: Unit and Monitoring Lag
Configure the following optional settings:
|Unit||The unit of measurement that you want to appear in KPI visualizations. For example, GB, Mbps, secs, and so on.|
|Monitoring lag||The monitoring lag time (in seconds) to offset the indexing lag. Monitoring lag is an estimate of the number of seconds it takes for new events to move from the source to the index. When indexing large quantities of data, an indexing lag can occur, which can cause performance issues. Delay the search time window to ensure that events are actually in the index before running the search. In most cases, don't set this value below 30.|
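The effect of monitoring lag on the search time window can be sketched as follows. The sketch assumes that the lag shifts both ends of the window back by the same number of seconds, so that the search only covers events that have had time to reach the index. The function name and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def search_window(run_time, window_minutes, lag_seconds):
    """Return (earliest, latest) for a KPI search delayed by the monitoring lag."""
    latest = run_time - timedelta(seconds=lag_seconds)
    earliest = latest - timedelta(minutes=window_minutes)
    return earliest, latest


# A search that runs at 12:00:00 with a 5-minute window and a 30-second lag
# covers 11:54:30 through 11:59:30 instead of 11:55:00 through 12:00:00.
earliest, latest = search_window(datetime(2024, 1, 1, 12, 0, 0), 5, 30)
print(earliest.time(), latest.time())  # 11:54:30 11:59:30
```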
Step 6: Enable backfill
Enable backfill to fill the summary index (itsi_summary) with historical raw KPI data. Backfill runs a search in the background that populates the summary index with historical KPI data for a given time range (backfill period) as it would have been populated at a regularly scheduled time by KPI saved searches. In other words, even though the summary index only started collecting data at the start of this week when the KPI was created, if necessary you can use the backfill option to fill the summary index with data from the past month.
Backfill is a one-time operation. Once started, it cannot be redone or undone. For example, if you backfill 60 days of data and then later decide that you want 120 days, you cannot go back and change the backfill period. Think carefully about how many days of data you want to backfill before saving the service.
The backfill option requires you to have indexed adequate raw data for the backfill period you select.
When you enable backfill, you must indicate how many days of data to backfill. You can choose a predefined time range like last 7 days, or select a custom date prior to the current date. If you choose a specific date, the dropdown dynamically updates with the number of days you're backfilling to.
The backfill period is the time range of data that is available after backfill is complete. For example, if you select last 30 days, ITSI fills the summary index with data from the past 30 days. In other words, you now have 30 days of KPI data available.
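The relationship between a custom backfill date and the number of days of data can be sketched with simple date arithmetic. The function names are hypothetical, and the sketch only mirrors the requirement that the custom date precede the current date. It does not model how ITSI schedules the backfill search.

```python
from datetime import date, timedelta


def backfill_days_from_date(start, today):
    """Number of days of data covered when backfilling from a custom date."""
    if start >= today:
        raise ValueError("The backfill start date must be before the current date.")
    return (today - start).days


def backfill_start(today, days):
    """Start date for a predefined range such as last 30 days."""
    return today - timedelta(days=days)


today = date(2024, 3, 31)
print(backfill_days_from_date(date(2024, 3, 1), today))  # 30
print(backfill_start(today, 30))                         # 2024-03-01
```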
If you backfill a KPI that uses Last available value to fill data gaps, data gaps are backfilled with filled-in alert values (using the last reported value for the KPI) instead of N/A alert values. If you backfill a KPI that uses a Custom value to fill data gaps, data gaps are backfilled with filled-in alert values (using the custom value provided) instead of N/A alert values. See Step 4: Add monitoring calculations.
You must save the service to initiate the backfill. A message appears in Splunk Web that informs you when the backfill is complete.
ITSI supports a maximum of 60 days of data in the summary index. Therefore, after you configure backfill, you will see one of the following messages:
- "Backfill is not available" - More than 60 days of summary index data already exists.
- "Backfill has been configured for last <#> days of data" - The backfill job is configured but hasn't run yet. This might be because the service has not been saved yet.
- "Backfill completed for last <#> days" - Backfill has completed successfully. This message only shows up until a total of 60 days of data is in the summary index, then it changes to "Backfill is not available".
Step 7: Set thresholds
Severity-level thresholds determine the current status of your KPI. When KPI values meet threshold conditions, the KPI status changes, for example, from high (yellow) to critical (red). The current status of the KPI is reflected in all views across the product, including service analyzers, glass tables, and deep dives.
You can manually add threshold values for your KPIs one at a time using the threshold preview window. Alternatively, you can apply threshold time policies, which automatically adapt threshold values based on day and time. See Create KPI threshold time policies in ITSI.
ITSI supports two types of KPI severity-level thresholds: aggregate thresholds and per-entity thresholds. Adaptive thresholds can be used with aggregate thresholds but not per-entity thresholds.
Set aggregate threshold values
Aggregate thresholds are useful for monitoring the status of aggregated KPI values. For example, you might apply aggregate thresholds to monitor the status of KPIs that return the total number of service requests or service errors, based on a calculation that uses the stats count function.
- Click Aggregate Thresholds.
- Click Add threshold to add a range of severity-level thresholds to the threshold preview graph.
- Click Finish.
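The way a range of severity-level thresholds maps an aggregate KPI value to a status can be sketched in Python. The sketch assumes that each threshold is a lower bound and that the highest bound the value reaches determines the severity, with a base severity that applies below all bounds. The threshold values, severity names, and function name are hypothetical.

```python
def severity_for(value, base_severity, thresholds):
    """Return the severity of the highest threshold that `value` reaches.

    `thresholds` is a list of (lower_bound, severity) pairs. Values below
    every bound keep `base_severity`.
    """
    severity = base_severity
    for bound, name in sorted(thresholds):
        if value >= bound:
            severity = name
    return severity


# Hypothetical thresholds for a KPI that counts service errors.
error_thresholds = [(70, "medium"), (80, "high"), (90, "critical")]

print(severity_for(50, "normal", error_thresholds))  # normal
print(severity_for(85, "normal", error_thresholds))  # high
print(severity_for(95, "normal", error_thresholds))  # critical
```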
Set per-entity threshold values
Per-entity thresholds are useful for monitoring multiple separate entities against which a single KPI is running. For example, you might have a KPI, such as Free Memory %, that is running against three separate servers. Using per-entity thresholds, you can monitor the status of Free Memory % on each individual server.
Adaptive thresholding cannot be used on a per-entity basis.
To use per-entity thresholds, a KPI must be split by entity. See "Step 3: Filter entities" above.
- Click Per Entity Thresholds.
- Click Add threshold to add a range of severity-level thresholds to the threshold preview graph. Optionally, if you want to use the same values as the aggregate thresholds, click Apply values to copy those threshold values over.
The threshold preview shows a separate search results graph for each entity that the KPI is running against.
- Click Finish.
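Per-entity evaluation can be sketched by applying one set of thresholds to each entity's value separately. The example uses the Free Memory % scenario above with three servers. Because low free memory is the problem condition, the base severity is the worst one and higher values reach better severities. The server names, values, thresholds, and function names are hypothetical.

```python
def severity_for(value, base_severity, thresholds):
    """Return the severity of the highest lower bound that `value` reaches."""
    severity = base_severity
    for bound, name in sorted(thresholds):
        if value >= bound:
            severity = name
    return severity


def per_entity_status(values_by_entity, base_severity, thresholds):
    """Evaluate the same thresholds separately for every entity."""
    return {
        entity: severity_for(value, base_severity, thresholds)
        for entity, value in values_by_entity.items()
    }


# Hypothetical Free Memory % values for three servers.
free_memory = {"server-a": 55.0, "server-b": 15.0, "server-c": 5.0}

# Below 10% is critical, 10% and above is high, 20% is medium, 40% is normal.
free_memory_thresholds = [(10, "high"), (20, "medium"), (40, "normal")]

print(per_entity_status(free_memory, "critical", free_memory_thresholds))
# {'server-a': 'normal', 'server-b': 'high', 'server-c': 'critical'}
```

An aggregate threshold applied to the average of these values (25%) would report a single medium status and hide the critical server, which is why per-entity thresholds require the KPI to be split by entity.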
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.2.0