Create KPI base searches in ITSI
KPI base searches let you share a search definition across multiple KPIs. You can create base searches to consolidate multiple similar KPIs, reduce search load, and improve search performance. For example, if you have similar ad hoc searches whose only difference is an entity or threshold field, you can consolidate these searches into a single base search definition and achieve better search performance.
ITSI module base searches
ITSI includes several pre-configured KPI base searches based on ITSI modules that you can use with your services. The titles of these base searches begin with "DA-ITSI". KPI base searches that come with ITSI modules are read-only and cannot be modified or deleted. To customize a base search that comes from an ITSI module, clone the base search, then perform your edits on the clone.
Service templates and base searches
Service templates use base searches for their KPIs. When a service template is created from a service, all of the KPIs in the service are imported into the template. Any service KPIs that use ad hoc searches, data model searches, or metrics searches are converted into base searches. These base searches are listed on the KPI Base Searches lister page and are available to use for KPIs in any service, just like any other base search. Base searches that are created for service template KPIs use the following naming standard:
<service name>:<KPI name>_<last 8 digits of KPI ID>.
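As a rough illustration of this naming standard, the following Python sketch builds a base search title from a service name, a KPI name, and a KPI ID. The function name and sample values are hypothetical, and the sketch assumes that "last 8 digits" means the final eight characters of the ID string. ITSI generates these names itself, so this only shows the resulting shape.

```python
def template_base_search_title(service_name: str, kpi_name: str, kpi_id: str) -> str:
    """Build a title in the form <service name>:<KPI name>_<last 8 digits of KPI ID>."""
    # Assumption: the "last 8 digits" are the final eight characters of the KPI ID.
    return f"{service_name}:{kpi_name}_{kpi_id[-8:]}"


# Hypothetical sample values, for illustration only.
print(template_base_search_title("Web Store", "CPU load", "a1b2c3d4e5f60718293a4b5c"))
```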
Create a new base search
You can create new KPI base searches, then use those base searches to build KPIs in the configure services workflow. You can create a base search using an ad hoc search or a metrics search. See Overview of metrics in the Splunk Enterprise Metrics manual for information about Splunk metrics.
- You must have write permissions to the Global team to create a KPI base search. All KPI base searches exist in the Global team.
- You must have the write_itsi_kpi_base_search capability to create a KPI base search. The itoa_team_admin roles have this capability by default.
- Click Configure > KPI Base Searches.
- Click Create KPI Base Search.
- Enter a title for your base search. For example, CPU load base search.
- (Optional) Enter a description for your base search. For example, This base search can be used to build KPIs for CPU metrics.
You cannot change the Team field because all KPI base searches exist in the Global team. All users have read access to the Global team.
- Click Create.
- Configure your base search:
Field: Description
Search: Select either Ad hoc Search or Metrics Search.
  - For an Ad hoc Search, provide the source ad hoc search.
  - For a Metrics Search, select the metrics index to use and the metric. If you do not have a metrics index, you will see the message: "No metrics found".
KPI Search Schedule: The frequency of the search (Every 1, 5, or 15 minutes).
  Avoid scheduling searches at one-minute intervals. Running multiple concurrent KPI searches at short intervals can produce lengthy search queues and is not necessary to monitor most KPIs.
Calculation Window: The time period over which the search applies (Last 1 min, 5 min, 15 min, or 24 Hours).
Monitoring Lag: The monitoring lag time (in seconds) to offset the indexing lag. Monitoring lag is an estimate of the number of seconds it takes for new events to move from the source to the index. When indexing large quantities of data, an indexing lag can occur, which can cause performance issues. Delay the search time window to ensure that events are actually in the index before running the search. In most cases, don't set this value below 30.
Split by Entity: Select Yes to maintain a breakdown of KPI values on the entity level.
Entity Split Field: Specify the field in your data to use to look up the corresponding split by entities. The default lookup field for data model searches and ad hoc searches is host. For metrics searches, select a dimension associated with the metric. This field is case sensitive.
  When filtering a KPI down to entities, you can split by a field other than the field you are using for filtering the entities (specified in the Entity Filter Field). This allows you to filter to the hosts that affect your service, but split out your data by a different field. For example, you might want to filter down to all of your database hosts but split the metric by the processes running on the hosts.
Filter to Entities in Service: Select Yes to filter the search based on the entity alias.
  To filter to entities in a service, the service must have associated entities.
Entity Filter Field: Specify the field in your data to use to look up the corresponding entities by which to filter the KPI. For metrics searches, select a dimension for the metric. The default field for data model searches, ad hoc searches, and metrics searches is host. This field can be different than the field used for the Entity Split Field.
Entity Alias Filtering: The entity alias that you want to use as a filter. This filters out all aliases from your search, except the specified alias.
  The Entity Alias Filtering field will be removed in the next major version of ITSI. See Entity Alias Filtering field in the Removed features in Splunk IT Service Intelligence section of the Release Notes for information on what you need to do to prepare.
- Click Add Metric. The Add Metric modal appears. You can add multiple metrics to your base search. Each metric defines a threshold field (for ad hoc searches only) and the calculation method used to aggregate KPI search results on the entity and service level.
- Configure your metric.
Field: Description
Title: The name of the metric. For example, CPU load percent
Threshold Field: (Ad hoc search type only) This is the field in your data that the KPI aggregates and monitors. For pure counts, use
Unit: The type of measurement that the KPI calculates. For example, %, MB, and so on.
Entity Calculation: Sets the calculation method for calculating aggregate search results on the entity level if Split by Entity is set to Yes.
Service/Aggregate Calculation: The statistical operation that ITSI performs on KPI search results. The correct aggregate calculation to use depends on the type of KPI search. For example, if your search returns results for CPU Load percentage, you could use Average. If your search returns a count, such as number of errors, then you would want to use
Fill Data Gaps with: Select how you would like to treat gaps in your data. This affects how KPI data gaps are displayed in service analyzers, deep dive KPI lanes, glass table KPI widgets, and other dashboards in ITSI populated by the summary index.
  - Select Null values to fill gaps in data with N/A values. Also select the severity level to use for Null values in the Threshold level for Null values dropdown.
  - Select Last available value to use the last reported value in the ITSI summary index. For aggregate level KPIs, service aggregate data gaps are filled with the last reported aggregate KPI value. For entity level KPIs, entity data gaps are filled with the corresponding entity's last available value. After the entity gaps have been filled, the service aggregate result is calculated for the KPI. See How filling data gaps with last reported value works for more information.
  - Select Custom value to specify a specific value to use when there is a gap in data. Enter a positive integer.
- Click Add.
This adds the metric to the list of metrics defined for your base search. When you build a KPI from a base search, you can select one and only one metric for the KPI.
- Click Save.
You can now use the base search to build KPIs in the configure services workflow.
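The interaction between the Calculation Window and Monitoring Lag settings can be sketched in Python as follows. This is a minimal illustration that assumes the lag shifts both ends of the search window back by the same number of seconds. The function name is hypothetical, and the sketch is not ITSI's internal implementation.

```python
from datetime import datetime, timedelta


def search_window(run_time: datetime, window_minutes: int, lag_seconds: int = 30):
    """Return (earliest, latest) for a search whose window is delayed by the lag."""
    # Assumption: the lag moves the whole window back so that late-arriving
    # events are already indexed when the search runs.
    latest = run_time - timedelta(seconds=lag_seconds)
    earliest = latest - timedelta(minutes=window_minutes)
    return earliest, latest


# A 5-minute calculation window with the 30-second lag, run at a sample time.
earliest, latest = search_window(datetime(2024, 1, 1, 12, 0, 0), window_minutes=5)
print(earliest, latest)
```

With a larger lag the window covers the same duration but ends further in the past, which trades freshness for completeness.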
If you delete a base search, any service KPIs that use the base search are converted to ad hoc searches. You cannot delete a base search that is being used by a KPI in a service template. You must select a different base search for any service template KPIs that use it before you can delete it. Additionally, you cannot delete a metric that is being used by a base search in a service template.
Build new KPIs from a base search
You can use KPI base searches to build new KPIs. Each KPI that you build is linked to the base search. If you edit and save a base search, those changes are propagated to all linked KPIs.
KPI base searches contain metric specifications. The metric specification for ad hoc base searches includes a threshold field. The metric specification for both ad hoc base searches and metrics base searches contains a method of calculation for aggregate search results at the service and entity level. When you apply a base search to a new KPI, you must select a metric specification from the base search to complete the new KPI search definition.
For example, if you want to create a new KPI to measure CPU load, you might select a metric specification from the base search that contains cpu_load_percent as the threshold field and average as the calculation method.
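One way to picture the relationship between a base search, its metric specifications, and a KPI is the following Python sketch. The class and field names are hypothetical and do not reflect how ITSI stores these objects. The sketch only illustrates that a KPI built from a base search selects exactly one metric.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metric:
    title: str
    aggregate_calculation: str
    # Only ad hoc base searches define a threshold field.
    threshold_field: Optional[str] = None


@dataclass
class BaseSearch:
    title: str
    metrics: List[Metric] = field(default_factory=list)

    def select_metric(self, title: str) -> Metric:
        """Return the single metric that a KPI uses from this base search."""
        matches = [m for m in self.metrics if m.title == title]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one metric titled {title!r}")
        return matches[0]


base = BaseSearch(
    title="CPU load base search",
    metrics=[Metric("CPU load percent", "average", "cpu_load_percent")],
)
kpi_metric = base.select_metric("CPU load percent")
print(kpi_metric.threshold_field, kpi_metric.aggregate_calculation)
```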
To build a KPI from a KPI base search:
- Select Configure > Services and select a service.
- In the KPIs tab, select New > Generic KPI.
- Enter a title and description. Click Next.
- For KPI Source select Base Search.
- In the Base Search dropdown, select the base search for your new KPI. You can choose from base search templates provided by ITSI modules, or from your own custom base searches.
- Select a metric from the Metric menu. Click Next.
The Entities page appears. All fields are populated from the selected base search. Click Next.
The Calculation page appears. All fields are populated from the selected base search. Click Next.
The Optional Setup page appears. All fields are populated from the selected base search, with the exception of Enable Backfill.
- Select the Enable Backfill check box, then select a Backfill Period (optional). Click Next.
- Set appropriate severity-level thresholds for the KPI. Click Finish.
The new KPI is created and appears in the list of KPIs for the service.
To unlink a KPI from the base search, edit the KPI and change the search type to adhoc, then save the KPI. This lets you use KPI base searches as a starting point for new KPIs.
KPI base search performance considerations
The performance of KPI base searches (the amount of time it takes to run the search) is dependent on the following factors:
- The number of KPIs that use the base search.
- The number of services that contain KPIs that use the base search.
- The number of entities matching service entity rules.
Most of the KPI base searches delivered with ITSI are configured to run every minute. Based on testing on a system with 32 cores and 16 GB of memory, a single KPI base search can support up to 5,000 KPIs with 15 entities matched by service entity rules reasonably well.
In general, a KPI base search can support fewer KPIs with many entities or many KPIs with fewer entities. It is not advised to use a single KPI base search for both a high number of KPIs and a high number of entities. As the number of services or matching entities goes up, the search runtime also increases.
You can check the runtime for your KPI base searches on the Activity > Jobs page. The runtime is the actual time it takes to run the search. Check the KPI search schedule (or frequency) of the KPI base search: every minute, every 5 minutes, or every 15 minutes. If a KPI base search is scheduled to run every minute, and the runtime of that search is longer than 1 minute, the search is taking too long to run. To reduce the search runtime, you need to reduce the number of KPIs using the KPI base search, the number of services that have the KPIs, or the number of entities for each service accordingly. The easiest solution is to clone the KPI base search and use the cloned base search for some of the KPIs.
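The check described above amounts to comparing each search's runtime with its schedule interval. A minimal Python sketch follows, with a hypothetical function name and made-up sample runtimes. In practice, read the runtime from the Activity > Jobs page.

```python
def exceeds_schedule(runtime_seconds: float, schedule_minutes: int) -> bool:
    """Return True when a search runs longer than the interval between its runs."""
    if schedule_minutes not in (1, 5, 15):
        raise ValueError("KPI search schedule must be 1, 5, or 15 minutes")
    return runtime_seconds > schedule_minutes * 60


# Made-up runtimes for illustration.
print(exceeds_schedule(75.0, 1))  # a 75-second search on a 1-minute schedule
print(exceeds_schedule(75.0, 5))  # the same search on a 5-minute schedule
```

A search flagged this way is a candidate for cloning so that some of its KPIs can move to the cloned base search.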
Increase write search result limit
Search results are processed, created, and written to the itsi_summary index via an alert action. The default limit on the number of rows that can be written is 50,000 as specified in the $SPLUNK_HOME/etc/system/default/limits.conf file. You can increase this limit if necessary.
Calculate the number of the result rows generated by a shared base search using the following formula:
<number of services> x <number of KPIs in each service> x <number of entities per service entity rule> + <number of services> x 2 (one for the service aggregation result, one for the service maximum result)
For example, for 500 services with 10 KPIs in each service and 15 matching entities, the expected number of result rows is:
500 x 10 x 15 + 500 x 2 = 76,000 rows
If the number of result rows expected is more than 50,000, the results will be truncated. As a result, ITSI will display incorrect KPI values.
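The row-count formula can be checked quickly with a short Python sketch. The function and constant names are hypothetical. The 50,000 default and the sample figures come from the text above.

```python
DEFAULT_MAX_ACTION_RESULTS = 50_000  # default limit stated in the text


def expected_result_rows(services: int, kpis_per_service: int, entities: int) -> int:
    """Apply the formula: services x KPIs x entities + services x 2."""
    per_entity_rows = services * kpis_per_service * entities
    service_level_rows = services * 2  # aggregation result and maximum result
    return per_entity_rows + service_level_rows


rows = expected_result_rows(services=500, kpis_per_service=10, entities=15)
print(rows, rows > DEFAULT_MAX_ACTION_RESULTS)
```

For the example above this gives 76,000 rows, which exceeds the default limit.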
If you believe you are running into this limitation, create a new limits.conf file in the $SPLUNK_HOME/etc/apps/SA-ITOA/local directory and add the following stanza and setting:
[scheduler]
max_action_results = 1000000
Set the value for max_action_results to a number higher than 50,000. In the example above it is set to 1,000,000.
Increase the KV store bulk get limit
The KPI base search retrieves all the relevant services from the KV store internally for thresholding-related operations. When a KPI base search is attached to a large number of services, the bulk get might reach the KV store bulk get size limit (the default limit is 500 MB).
As a guideline, for one service with 20 fully populated KPIs in which all KPIs have custom thresholds with time policies configured, as well as cohesive anomaly detection configured, the size is roughly 0.8 MB in the KV store.
If you have a large number of services containing a lot of KPIs and metadata, it is recommended to increase the KV store bulk get limit in $SPLUNK_HOME/etc/apps/SA-ITOA/local/limits.conf. Increase the max_size_per_result_mb value as necessary.
[kvstore]
# The maximum size, in megabytes (MB), of the result that will be returned for a single query to a collection.
# ITSI requires approximately 50MB per 1,000 KPIs. Override this value if necessary.
# Default: 500 MB
max_size_per_result_mb = 500
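To judge whether the default is likely to be enough, you can scale the guideline figure of roughly 0.8 MB for a fully populated service with 20 KPIs. The Python sketch below does this. The function and constant names are hypothetical, and the linear scaling per KPI is an assumption, so treat the result as a rough estimate only.

```python
MB_PER_SERVICE_WITH_20_KPIS = 0.8  # guideline figure from the text
DEFAULT_BULK_GET_LIMIT_MB = 500    # default max_size_per_result_mb


def estimated_bulk_get_mb(services: int, kpis_per_service: int = 20) -> float:
    """Estimate the KV store result size, assuming size grows linearly per KPI."""
    mb_per_kpi = MB_PER_SERVICE_WITH_20_KPIS / 20
    return services * kpis_per_service * mb_per_kpi


# Sample input: 1,000 services attached to one base search.
size_mb = estimated_bulk_get_mb(services=1000)
print(round(size_mb), size_mb > DEFAULT_BULK_GET_LIMIT_MB)
```

If the estimate approaches or exceeds the configured limit, raise max_size_per_result_mb with some headroom.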
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.1.0, 4.1.1, 4.1.2, 4.1.5