Splunk® IT Service Intelligence

Administer Splunk IT Service Intelligence

Add a KPI to a service in ITSI

A Key Performance Indicator (KPI) is a recurring saved search that returns the value of an IT performance metric, such as CPU load percentage, memory used percentage, response time, and so on. Use KPI searches to monitor the performance of your IT services. You must add at least one KPI search to a service to use ITSI. For information on how the number of KPIs can impact performance, see Performance considerations in the ITSI Installation and Upgrade manual.

This topic walks you through the basic KPI creation modal. These instructions assume that you have already created a service. If you have not, see Overview of creating services in ITSI in this manual.

Prerequisites

  • You must create a service before you can add KPIs to it. For instructions, see Overview of creating services in ITSI.
  • To design a KPI search, you need to know the following information:
    • The source search expression, including selection criteria.
    • The specific field in the data that you want to monitor.
    • The time span and frequency for the KPI to update.
    • How to summarize the data over the time span (count, last, sum, average, and so on).
    • Whether you want to split the KPI result values by entities (for example, by host).

Step 1: Add a new KPI

  1. Click Configure > Services from the ITSI top menu bar.
  2. Select an existing service.
  3. Click New in the KPI tab and choose one of the following options:
    • Select Generic KPI.
    • Select a KPI template. For example, Application Server: CPU and Memory > Memory Used. KPI templates provide pre-configured KPI source searches, including ad hoc searches and base searches, based on ITSI modules. KPI templates are tailored for specific service monitoring use cases, such as operating systems, databases, web servers, load balancers, virtual machines, and so on.
  4. In Step 1 of the KPI creation modal, enter the KPI title and optional description. Click Next.

Step 2: Define a source search

When you create a KPI, you must define a source search on which to build the KPI. You can choose from four source search types: data model search, metrics search, ad hoc search, and base search.

Note: Before you define your source search, consider the performance implications for your particular deployment. While data models are suitable for smaller test environments, base searches generally provide best performance in larger production settings. See Create KPI base searches in ITSI.

Define a source search from a data model

  1. Configure your data model search.
    • KPI Source: Data Model
    • Data Model: The data model object, child, and attribute fields. For example, Host Operating System > Memory > mem_used_percent. When you create a KPI search from a data model, the data model object field becomes the threshold field. When you create a KPI search from an ad hoc search, you must manually enter the threshold field.
    • Filters (optional): Click Add Filter to add data model filter conditions. Data model filters let you include/exclude search result data based on the filter conditions. For example, the filter condition host Equals ipaddress filters out all values for the data model search field host, except for values that equal ipaddress. Data model filtering can help improve the speed and accuracy of your searches by excluding extraneous data from search results.
  2. Click Generated Search to preview your KPI search string.
    Use the Generated Search box to view changes that ITSI makes to your search string as you build your KPI. Click anywhere on the Generated Search itself to run the search.
  3. Click Next.

Define a source search from a metrics search

  1. Configure your metrics search.
    • KPI Source: Metrics Search. If there are no metrics indexes configured in your Splunk deployment, you'll see the message "No metrics found". For more information about metrics, see Get started with Metrics in the Splunk Enterprise Metrics Manual.
    • Metrics Index: Select the metrics index from which to choose a metric. The list only populates with indexes defined locally on the search head you are accessing. To use an index defined only on an indexer, enter it manually.
    • Metric Name: Select the metric to use for the KPI. For example, memory.used.
  2. Click Generated Search to preview your KPI search string. Metrics searches begin with the mstats command.
  3. Click Next.

Define a source search from an ad hoc search

  1. Configure your ad hoc search.
    • KPI Source: Ad hoc Search
    • Search: The ad hoc search string that you create. This is the event gathering search for the KPI.
      Note: Avoid transforming commands, the mstats command, the `gettime` macro, and time modifiers in your KPI search. They can cause issues with KPI backfill, with the KPI threshold preview, and with the display of raw data on ITSI views, such as glass tables and deep dives, that let you run KPI searches against raw data.
    • Threshold Field: The field in your data that the KPI aggregates and monitors. For pure counts, use _time.
  2. Click Generated Search to preview your KPI search string.
  3. Click Next.
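
The note about discouraged constructs in ad hoc searches can be turned into a rough pre-check. The following Python sketch is not part of ITSI. The function name, the short list of transforming commands, and the sample search strings are assumptions made for illustration, and the pattern matching is naive (for example, it does not account for pipe characters inside quoted strings).

```python
import re

# Illustrative subset of Splunk transforming commands; not an exhaustive list.
TRANSFORMING_COMMANDS = {"stats", "chart", "timechart", "top", "rare"}


def discouraged_constructs(search: str) -> list:
    """Return the reasons a search string might be a poor KPI ad hoc search."""
    findings = []
    # Treat the first word after each pipe character as a command name.
    commands = {m.lower() for m in re.findall(r"\|\s*([A-Za-z_]+)", search)}
    for cmd in sorted(commands & TRANSFORMING_COMMANDS):
        findings.append(f"transforming command: {cmd}")
    if "mstats" in commands:
        findings.append("mstats command")
    if "`gettime`" in search:
        findings.append("gettime macro")
    # earliest= and latest= are time modifiers.
    if re.search(r"\b(earliest|latest)\s*=", search):
        findings.append("time modifier")
    return findings


# Hypothetical searches: the first only gathers events, the second does not.
print(discouraged_constructs("index=os sourcetype=vmstat"))
print(discouraged_constructs("index=os earliest=-5m | stats avg(cpu) by host"))
```

An empty list means that the check found none of the constructs named in the note. It does not guarantee that the search is suitable as a KPI source.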

Define a source search from a base search

  1. Configure your base search.
    • KPI Source: Base Search
    • Base Search: The base search that you want to associate with the KPI. For example, DA-ITSI-OS:Performance.Memory. Base searches provide preconfigured KPI templates built on ITSI modules.
    • Metric: The metric that you want to associate with the KPI. For example, mem_free_percent.
  2. (Optional) Click Generated Search to preview your KPI search string.
  3. Click Next.

Note: Most fields in the next window (steps 3 through 6) are pre-populated for the base search by the KPI template. For more information on how to create and configure KPI base searches, see Create KPI base searches.

Step 3: Filter entities

Filter entities to have more granular control of your KPI at the entity level.

Split by Entity

The Split by Entity option lets you maintain a breakdown of KPI values at the entity level. Split KPI results by a specific entity to monitor each individual entity against which a KPI is running.

You must split KPIs by entity to use certain ITSI features, such as per-entity thresholds.

Configure the following fields:

  • Split by Entity: Enable a breakdown of KPI values at the entity level. The KPI must be running against two or more entities.
  • Entity Split Field: The field in your data to use to look up the entities to split by. The default lookup field for data model searches and ad hoc searches is host. For metrics searches, select a dimension associated with the metric. This field is case sensitive.

When filtering a KPI down to entities, you can split by a field other than the field you are using for filtering the entities (specified in the Entity Filter Field). This allows you to filter to the hosts that affect your service, but split out your data by a different field. For example, you might want to filter down to all of your database hosts but split the metric by the processes running on the hosts.
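
The difference between the filter field and the split field can be illustrated outside of ITSI. The following Python sketch uses made-up events, host names, and a made-up helper function. It filters on host and splits on process, as in the database example above, and it averages the values per group only to have something to display.

```python
from collections import defaultdict

# Hypothetical events: each has a host, a process name, and a CPU reading.
events = [
    {"host": "db01", "process": "mysqld", "cpu": 60.0},
    {"host": "db01", "process": "backup", "cpu": 20.0},
    {"host": "db02", "process": "mysqld", "cpu": 80.0},
    {"host": "web01", "process": "nginx", "cpu": 35.0},
]

# Hypothetical set of hosts that belong to the service.
service_hosts = {"db01", "db02"}


def filter_and_split(events, filter_field, allowed, split_field, value_field):
    """Keep events whose filter_field is allowed, then average per split_field."""
    groups = defaultdict(list)
    for event in events:
        if event[filter_field] in allowed:
            groups[event[split_field]].append(event[value_field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Filter to the database hosts, but break the result down by process.
result = filter_and_split(events, "host", service_hosts, "process", "cpu")
print(result)
```

The web01 event is dropped by the filter, and the remaining events are grouped by process rather than by host.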

Entity filtering

Entity filtering lets you specify the entities against which a KPI search will run. Provide an entity filter field to reduce collection of extraneous data.

For example, if you enable entity filtering for a KPI in the Online Sales service, only entities assigned to that service are used to calculate the KPI search metrics.

  • Filter to Entities in Service: Enable or disable entity filtering.
  • Entity Filter Field: Specify the field in your data to use to look up the corresponding entities by which to filter the KPI. For metrics searches, select a dimension for the metric. The default field for data model searches, ad hoc searches, and metrics searches is host. This field can be different from the field used for the Entity Split Field.

With the removal of entity alias filtering in version 4.2.x, only the Entity Filter Field determines the entity aliases to use for filtering. ITSI now strictly matches entities against KPI search results using both the alias key and value, whereas before it only used the alias value. The strict entity alias matching also occurs when generating notable events through correlation searches. The entity lookup field must be an actual entity alias field for the match to occur. For more information, see Removed features in Splunk IT Service Intelligence in the ITSI Release Notes.
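
To see why matching on both the alias key and value is stricter than matching on the value alone, consider the following Python sketch. The entity records and function names are hypothetical and only model the idea. They do not reflect how ITSI stores entities.

```python
# Hypothetical entities with alias fields (key -> value).
entities = [
    {"title": "db01", "aliases": {"host": "db01", "ip": "10.0.0.5"}},
    {"title": "router", "aliases": {"serial": "db01"}},
]


def match_value_only(entities, value):
    """Older behavior as described above: any alias value can match."""
    return [e["title"] for e in entities if value in e["aliases"].values()]


def match_key_and_value(entities, field, value):
    """Strict behavior: the alias key must equal the lookup field as well."""
    return [e["title"] for e in entities if e["aliases"].get(field) == value]


# A search result row with host=db01.
print(match_value_only(entities, "db01"))
print(match_key_and_value(entities, "host", "db01"))
```

With value-only matching, the router entity matches because one of its aliases happens to hold the same value. With strict matching, only the entity that has a host alias equal to db01 matches.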

Step 4: Add monitoring calculations

Configure the following KPI monitoring calculations:

  • KPI Search Schedule: Determines the frequency of the KPI search. Avoid scheduling searches at one-minute intervals. Running multiple concurrent KPI searches at short intervals can produce lengthy search queues and is not necessary to monitor most KPIs.
  • Entity Calculation: The method for calculating aggregate search results on the entity level. Each entity has its own alert value based on this calculation type. For example, Average or Maximum. These entity values are then aggregated to create the overall value, which is the value displayed for the KPI. This setting is only applicable if Split by Entity is set to Yes.
  • Service/Aggregate Calculation: The statistical operation that ITSI performs on KPI search results. The correct aggregate calculation to use depends on the type of KPI search. For example, if your search returns results for CPU Load percentage, use Average. If you want a total count of all errors from individual entities, use Sum.
  • Calculation Window: The time period over which the calculation applies. For example, Last 5 Minutes.
  • Fill Data Gaps with: How to treat gaps in your data. This setting affects how aggregate KPI data gaps are displayed in service analyzers, deep dive KPI lanes, glass table KPI widgets, and other dashboards in ITSI populated by the summary index.
    • Select Null values to fill gaps in data with N/A values. Also select the severity level to use for Null values.
    • Select Last available value to use the last reported value in the ITSI summary index. Aggregate KPI data gaps are filled with the last reported aggregate KPI value. Entity-level data gaps are filled with the corresponding entity's last available value.
    • Select Custom value to specify a specific value to use when there is a gap in data. Enter a positive integer.

Filled gap values are not used in the calculations performed for Anomaly Detection and Adaptive Thresholding.
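
The relationship between the entity calculation and the service/aggregate calculation can be sketched in a few lines of Python. The sample values, entity names, and the small set of supported calculations are assumptions for illustration. ITSI performs these calculations in the KPI search itself.

```python
from statistics import mean

# Hypothetical values seen in one calculation window, keyed by entity.
window = {"web01": [2, 4], "web02": [10, 20], "web03": [6]}

# Small illustrative subset of calculation types.
CALCS = {"avg": mean, "max": max, "sum": sum}


def kpi_values(window, entity_calc, aggregate_calc):
    """Compute one alert value per entity, then aggregate the entity values."""
    per_entity = {name: CALCS[entity_calc](vals) for name, vals in window.items()}
    aggregate = CALCS[aggregate_calc](list(per_entity.values()))
    return per_entity, aggregate


# Average per entity, then average across entities (for example, CPU load).
print(kpi_values(window, "avg", "avg"))

# Sum per entity, then sum across entities (for example, a total error count).
print(kpi_values(window, "sum", "sum"))
```

The choice matters: with the same data, averaging the entity values gives a typical value, while summing them gives a service-wide total.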

Click Next.

Change the stateful KPIs caching period

Each time the saved search runs for a KPI with Fill Data Gaps with set to Last available value, ITSI caches the alert value for the KPI in the itsi_kpi_summary_cache KV store collection. ITSI uses a lookup named itsi_kpi_alert_value_cache in the KPI saved search to fill entity-level and service-aggregate gaps for the KPI using the cached alert value.
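
As a rough model of the three gap-filling options, the following Python sketch fills missing values in a series of KPI alert values. The function name and mode strings are hypothetical, a local variable stands in for the cached alert value, and the sketch ignores the cache retention behavior described in this section.

```python
def fill_gaps(series, mode, custom_value=None):
    """Replace None entries (data gaps) in a list of KPI alert values.

    mode is one of "null", "last_available", or "custom".
    """
    filled = []
    last_seen = None  # stands in for the cached last reported value
    for value in series:
        if value is not None:
            last_seen = value
            filled.append(value)
        elif mode == "last_available" and last_seen is not None:
            filled.append(last_seen)
        elif mode == "custom":
            filled.append(custom_value)
        else:
            # Null values, or no earlier value is available to reuse.
            filled.append("N/A")
    return filled


series = [5, None, None, 7]
print(fill_gaps(series, "null"))
print(fill_gaps(series, "last_available"))
print(fill_gaps(series, "custom", custom_value=0))
```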

To prevent bloating of the collection with entity and service-aggregate KPI results, a retention policy runs on the itsi_kpi_summary_cache collection using a Splunk modular input. The modular input runs every 15 minutes and removes the entries from cache that have not been updated for more than 30 minutes.

You can change the stateful KPI caching frequency or retention time.

Prerequisites

  • Only users with file system access, such as system administrators, can change the stateful KPI caching frequency and retention time.
  • Review the steps in How to edit a configuration file in the Admin Manual.

Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location.

Steps

  1. Open or create a local inputs.conf file for the ITSI app at $SPLUNK_HOME/etc/apps/SA-ITOA/local.
  2. Under the [itsi_age_kpi_alert_value_cache://age_kpi_alert_value_cache] stanza, adjust the interval and retentionTimeInSec settings.

Filling data gaps with the last reported value occurs for at most 45 minutes, in accordance with the default modular input interval and retention time (15 minutes + 30 minutes). If data gaps for a KPI continue to occur for more than 30 to 45 minutes, the data gaps appear as N/A values.
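
The 30 to 45 minute range follows from simple arithmetic: a cached value becomes eligible for removal once it is older than the retention time, and it is actually removed at the next run of the modular input. The following Python sketch shows that arithmetic and renders a local stanza. The stanza and setting names come from the steps above. The assumption that both settings are expressed in seconds, and the helper function names, are not confirmed by this topic, so check the inputs.conf specification for your version before you use any values.

```python
import configparser
import io

STANZA = "itsi_age_kpi_alert_value_cache://age_kpi_alert_value_cache"


def fill_window_minutes(interval_min, retention_min):
    """Return (shortest, longest) time a stale cached value keeps filling gaps."""
    return retention_min, retention_min + interval_min


def render_local_stanza(interval_sec, retention_sec):
    """Render the stanza text for a local inputs.conf (values assumed in seconds)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the camelCase setting name as written
    parser[STANZA] = {
        "interval": str(interval_sec),
        "retentionTimeInSec": str(retention_sec),
    }
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Defaults described above: 15-minute interval, 30-minute retention.
print(fill_window_minutes(15, 30))
print(render_local_stanza(900, 1800))
```

If you lengthen either setting, the maximum time that gaps are filled with the last reported value grows by the same amount.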

Step 5: Unit and Monitoring Lag

Configure the following optional settings:

  • Unit: The unit of measurement that you want to appear in KPI visualizations. For example, GB, Mbps, secs, and so on.
  • Monitoring lag: The monitoring lag time (in seconds) to offset the indexing lag. Monitoring lag is an estimate of the number of seconds it takes for new events to move from the source to the index. When indexing large quantities of data, an indexing lag can occur, which can cause performance issues. Delay the search time window to ensure that events are actually in the index before running the search. In most cases, don't set this value below 30.
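
One way to picture the monitoring lag is as a shift of the whole search time window into the past. The following Python sketch assumes that the window ends at the run time minus the lag. That is an interpretation of the description above, not a statement of how ITSI builds the search, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def search_window(now, window_minutes, lag_seconds):
    """Return (earliest, latest) for a calculation window delayed by the lag."""
    latest = now - timedelta(seconds=lag_seconds)
    earliest = latest - timedelta(minutes=window_minutes)
    return earliest, latest


# A search that runs at 12:00:00 UTC with a 5-minute window and a 30-second lag.
run_time = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
earliest, latest = search_window(run_time, window_minutes=5, lag_seconds=30)
print(earliest.isoformat(), latest.isoformat())
```

Under this assumption, events that arrive up to 30 seconds late are still indexed by the time the search reads the window.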

Step 6: Enable backfill

Enable backfill to fill the summary index (itsi_summary) with historical raw KPI data. Backfill runs a search in the background that populates the summary index with historical KPI data for a given time range (backfill period) as it would have been populated at a regularly scheduled time by KPI saved searches. In other words, even though the summary index only started collecting data at the start of this week when the KPI was created, if necessary you can use the backfill option to fill the summary index with data from the past month.

Backfill is a one-time operation. Once started, it cannot be redone or undone. For example, if you backfill 60 days of data and then later decide that you want 120 days, you cannot go back and change the backfill period. Think carefully about how many days of data you want to backfill before saving the service.

The backfill option requires you to have indexed adequate raw data for the backfill period you select.

When you enable backfill, you must indicate how many days of data to backfill. You can choose a predefined time range like last 7 days, or select a custom date prior to the current date. If you choose a specific date, the dropdown dynamically updates with the number of days you're backfilling to.

The backfill period is the time range of data that is available after backfill is complete. For example, if you select last 30 days, ITSI fills the summary index with data from the past 30 days. In other words, you now have 30 days of KPI data available.

If you backfill a KPI that uses Last available value to fill data gaps, data gaps are backfilled with filled-in alert values (using the last reported value for the KPI) instead of N/A alert values. If you backfill a KPI that uses a Custom value to fill data gaps, data gaps are backfilled with filled-in alert values (using the custom value provided) instead of N/A alert values. See Step 4: Add monitoring calculations.

You must save the service to initiate the backfill. A message appears in Splunk Web that informs you when the backfill is complete.

ITSI supports a maximum of 60 days of data in the summary index. Therefore, after you configure backfill, you will see one of the following messages:

  • "Backfill is not available" - More than 60 days of summary index data already exists.
  • "Backfill has been configured for last <#> days of data" - The backfill job is configured but hasn't run yet. This might be because the service has not been saved yet.
  • "Backfill completed for last <#> days" - Backfill has completed successfully. This message only shows up until a total of 60 days of data is in the summary index, then it changes to "Backfill is not available".

Step 7: Set thresholds

Severity-level thresholds determine the current status of your KPI. When KPI values meet threshold conditions, the KPI status changes, for example, from high (yellow) to critical (red). The current status of the KPI is reflected in all views across the product, including service analyzers, glass tables, and deep dives.

Manually configure threshold values for your KPIs using the threshold preview window. Alternatively, you can apply threshold time policies, which automatically adapt threshold values based on day and time. See Create KPI threshold time policies in ITSI.

ITSI supports two types of KPI severity-level thresholds: aggregate thresholds and per-entity thresholds. You can configure adaptive thresholds for aggregate thresholds but not for per-entity thresholds.

After you configure KPI thresholds, you can configure alerting on a single KPI so you can be alerted when aggregate KPI threshold values change. ITSI generates notable events in Episode Review based on the alerting rules you configure. For information, see Receive alerts when KPI severity changes in ITSI.

Set aggregate threshold values

Aggregate thresholds are useful for monitoring the status of aggregated KPI values. For example, you might apply aggregate thresholds to monitor the status of KPIs that return the total number of service requests or service errors, based on a calculation that uses the stats count function.

  1. Click Aggregate Thresholds.
  2. Click Add threshold to add a range of severity-level thresholds to the threshold preview graph.
  3. Click Finish.
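
Conceptually, a range of severity-level thresholds maps each aggregate KPI value to a status. The following Python sketch shows one such mapping. The threshold values, the severity names other than high and critical, and the rule that a value equal to a bound falls into the upper range are all assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical thresholds: (lower bound, severity), sorted by bound.
BASE_SEVERITY = "normal"
THRESHOLDS = [(70, "medium"), (80, "high"), (90, "critical")]


def severity(value, thresholds=THRESHOLDS, base=BASE_SEVERITY):
    """Return the severity of the highest threshold that the value has reached."""
    bounds = [bound for bound, _ in thresholds]
    index = bisect_right(bounds, value)
    return base if index == 0 else thresholds[index - 1][1]


for value in (65, 70, 85, 95):
    print(value, severity(value))
```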

Set per-entity threshold values

Per-entity thresholds are useful for monitoring multiple separate entities against which a single KPI is running. For example, you might have a KPI, such as Free Memory %, that is running against three separate servers. Using per-entity thresholds, you can monitor the status of Free Memory % on each individual server.

Adaptive thresholding cannot be used on a per-entity basis.

Prerequisites

To use per-entity thresholds, a KPI must be split by entity. See "Step 3: Filter entities" above.

Steps

  1. Click Per Entity Thresholds.
  2. Click Add threshold to add a range of severity-level thresholds to the threshold preview graph. Optionally, if you want to use the same values as the aggregate thresholds, click Apply values to copy those threshold values over.
    The threshold preview shows a separate search results graph for each entity that the KPI is running against.
  3. Click Finish.
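
Per-entity thresholds evaluate each entity's value on its own. The following Python sketch models the Free Memory % example, where lower values are more severe. The server names, values, and threshold bounds are hypothetical.

```python
# Hypothetical bands for Free Memory %: a value below the bound gets the severity.
# Bands are checked in order, so the most severe band comes first.
BANDS = [(10, "critical"), (25, "high")]


def entity_severities(values, bands=BANDS, base="normal"):
    """Return a severity for each entity based on its own KPI value."""
    result = {}
    for entity, value in values.items():
        result[entity] = next(
            (sev for bound, sev in bands if value < bound), base
        )
    return result


free_memory = {"srv1": 42.0, "srv2": 18.5, "srv3": 6.0}
print(entity_severities(free_memory))
```

Each server gets its own status, so one server that is low on memory is visible even when the aggregate value across all three servers looks healthy.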

This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.2.1, 4.2.2, 4.2.3, 4.3.0, 4.3.1, 4.4.0, 4.4.1

