Splunk® IT Service Intelligence

Administer Splunk IT Service Intelligence

Splunk IT Service Intelligence version 2.5.0 is available to Splunk Cloud subscribers only.
This documentation does not apply to the most recent version of ITSI.

Configure ITSI Services

This topic shows you how to configure IT Service Intelligence (ITSI) services. The instructions assume you have already created a service. If you have not yet created a service, see Create Services in this manual.

Service configuration workflow

ITSI service configuration involves these steps:

  • Add entity rules (optional)
  • Add KPIs
  • Add dependent services (optional)


The following diagram shows the basic steps involved in configuring an ITSI service.

Configure services workflow.png

Add entity rules

Entity rules let you dynamically filter KPI searches based on entity alias matches. You can use entity rules to associate entities with KPIs at the service level, which makes it unnecessary to specify entity identifying fields for each KPI search.

When to add entity rules

Adding entity rules to a service is optional and you can add them at any time. There are many scenarios where entity rules can make it easier to configure your services, including:

  • Match entity ID data not recognized inside Splunk Enterprise (such as mapping a naming scheme to specific devices). For example, your organization might use a server naming convention such as server-01, server-02, and so on. These names do not appear as fields inside Splunk searches. Adding rules that match your entity aliases to your server naming scheme lets you apply KPI searches to those servers.
  • Disambiguate between multiple fields that identify the same machine (such as a host with multiple IP addresses).

How to set up entity rules

You can set up entity rules to match entities based on entity aliases, info, or entity title. You can also create rules that combine multiple AND/OR conditions.

For example, if you want to add entity rules that identify your database servers, and those servers have aliases of host=mysql-01, host=mysql-02, host=mysql-03 and so on, you can add an entity rule such as "host matches mysql*" to identify the servers on which to run the KPI search.

Add entity rules.png

This entity rule matches the host field in Splunk data with your mysql* servers and adds each server to all KPI searches in the service.
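
As an illustration of how a wildcard rule such as "host matches mysql*" selects entities, the following Python sketch applies a glob-style pattern to entity aliases. The entity records, the entity titles, and the match_entities function are hypothetical; they do not reflect how ITSI stores or evaluates entity rules internally.

```python
from fnmatch import fnmatchcase

# Hypothetical entity records: each entity has a title and a dict of alias fields.
ENTITIES = [
    {"title": "db-primary", "aliases": {"host": "mysql-01"}},
    {"title": "db-replica", "aliases": {"host": "mysql-02"}},
    {"title": "web-frontend", "aliases": {"host": "apache-01"}},
]


def match_entities(entities, alias_field, pattern):
    """Return titles of entities whose alias value matches a wildcard pattern."""
    matched = []
    for entity in entities:
        value = entity["aliases"].get(alias_field)
        # fnmatchcase performs a case-sensitive glob match ("mysql*" style).
        if value is not None and fnmatchcase(value, pattern):
            matched.append(entity["title"])
    return matched


print(match_entities(ENTITIES, "host", "mysql*"))
```

Only the two entities with a mysql-prefixed host alias are selected; the web server is left out of the KPI searches in this sketch.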

Add KPIs

ITSI uses KPI searches to monitor the performance of your IT services. For more information on KPIs, see ITSI concepts and features in this manual.

You must add at least one KPI search to a service to use ITSI. For information on how the number of KPIs can impact performance, see Performance considerations in this manual.

About KPI search properties

When you create a KPI search, you configure a set of search properties. KPI search properties include:

Source search
A search string that you define as the basis for your KPI, using a data model, ad hoc search, or base search.
Threshold field
The specific field in your data that the KPI search monitors and aggregates. For example, CPU_load_percent or mem_free.
Entity alias mapping and filtering (optional)
You can map entity aliases to fields in your search data to determine the specific entities to which a KPI search applies. You can also filter entities in or out of a KPI search, and apply a KPI search to multiple entities, enabling comparative analysis of search results on a per entity basis.
Monitoring calculations
The recurring KPI search schedule and statistical operations on search results, including service health score calculations.
Severity-level thresholds
Thresholds that you apply to KPI search results. Severity-level thresholds let you monitor KPI status (normal, low, medium, high, and critical) and set trigger conditions for alerts.

For example, to monitor the CPU load percentage of an entity (machine) in a service, you can create a KPI using an ad hoc search that returns the value of the field cpu_load_percent at 5-minute intervals over a 5-minute time range, then set a range of severity-level thresholds between 0% and 100%.

Create a KPI search

This section shows you how to create a basic KPI search. These instructions assume that you have already created a service. If not, see Create Services in this manual. You must create a service with at least one KPI to run ITSI.

Step 1. Add new KPI

  1. Select Configure > Services. In the Service viewer, click an existing service.
  2. In the New dropdown menu, choose one of the following two options:
    • Select Generic KPI.
    • Select a KPI template. For example, Application Server: CPU and Memory > Memory Used. KPI templates provide pre-configured KPI source searches, including ad hoc searches and base searches, based on ITSI modules. KPI templates are tailored for specific service monitoring use cases, such as operating systems, databases, web servers, load balancers, virtual machines, and so on.
  3. In Step 1 of the KPI creation dialog, enter the KPI Title and Description (optional). Click Next.

Step 2. Define source search

When you create a KPI, you must define a source search on which to build the KPI. You can choose from three source search types: data model, ad hoc search, and base search.

Note: Before you define your source search, consider the performance implications for your particular deployment. While data models are suitable for smaller test environments, base searches generally provide the best performance in larger production settings. See Create KPI base searches on this page.

Define source search from data model

  1. Configure your data model search.
    Field: Description
    KPI source: Data Model
    Data Model: The data model object, child, and attribute fields. For example, Host Operating System > Memory > mem_used_percent. Note: When you create a KPI search from a data model, the data model object field becomes the threshold field. When you create a KPI search from an ad hoc search, you must manually enter the threshold field.
    Filters (optional): Click Add Filter to add data model filter conditions. Data model filters let you include or exclude search result data based on the filter conditions. For example, the filter condition host Equals ipaddress filters out all values for the data model search field host, except for values that equal ipaddress. Data model filtering can help improve the speed and accuracy of your searches by excluding extraneous data from search results.
  2. Click on Generated Search to preview your KPI search string.
    Use the Generated Search box to view changes that ITSI makes to your search string as you build your KPI. Click anywhere on the Generated Search itself to run the search.
    Generated search.png
  3. Click Next.

Define source search from ad hoc search

  1. Configure your ad hoc search.
    Field: Description
    KPI source: Ad hoc Search
    Search: The ad hoc search string that you create. This is the event gathering search for the KPI.
    Threshold Field: The field in your data that the KPI aggregates and monitors. For pure counts, use _time.
  2. Click on Generated Search to preview your KPI search string.
  3. Click Next.

Define source search from base search

  1. Configure your base search.
    Field: Description
    KPI source: Base Search
    Base Search: The base search that you want to associate with the KPI. For example, DA-ITSI-OS: Performance.Memory. Base searches provide pre-configured KPI templates built on ITSI modules.
    Metric: The metric that you want to associate with the KPI. For example, mem_free_percent.
  2. Click on Generated Search to preview your KPI search string.
  3. Click Next.
    Note: Most fields in the subsequent KPI configuration steps (steps 3 through 6) are pre-populated for the base search by the KPI template.
    For more information on how to create and configure KPI base searches, see Create KPI base searches on this page.

Step 3. Filter entities

Split by Entity

The Split by Entity option lets you maintain a breakdown of KPI values at the entity level. Use Split by Entity to enable monitoring of KPI values for each individual entity against which a KPI is running. KPIs must be split by entity to use certain ITSI features, such as per entity thresholds.


  1. Configure split by entity.
    Field: Description
    Split by Entity: Enable or disable a breakdown of KPI values at the entity level. The KPI must be running against two or more entities.
    Entity Lookup Field: The field in your data that is used to look up the corresponding entities when the KPI is split by entity. The default lookup field is host.
  2. Click Next.

Entity alias filtering

Entity alias filtering lets you specify the entities against which a KPI search will run. Apply entity alias filters to reduce collection of extraneous data.

  1. Configure entity alias filtering.
    Field: Description
    Filter to Entities in Service: Enable or disable entity alias filtering.
    Entity Alias Filtering: The specific entity alias that you want to use as a filter. For example, host. This filters out all entity alias values from the KPI search, except those that match the name host. For example, if your service has these two entity aliases:

    host=sr-centos2.sv.splunk.com
    IP=10.141.20.37

    your KPI search will include an OR clause, such as:

    | search Performance.dest=sr-centos2.sv.splunk.com OR dest=10.141.20.37

    When you specify host as an entity alias filter, this limits the KPI search clause to:

    | search Performance.dest=sr-centos2.sv.splunk.com

  2. Click Next.
    Note: When filtering on entity alias values, the search is case sensitive.

For more information on entity aliases, see Define Entities in this manual.
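
The effect of an entity alias filter on the generated search clause can be sketched in Python. This is only an illustration of the narrowing behavior described above: the build_entity_filter function and its search_field parameter are hypothetical, and the clause that ITSI actually generates may differ in field naming.

```python
def build_entity_filter(aliases, search_field, alias_filter=None):
    """Build a '| search ...' clause from (alias_name, value) pairs.

    aliases: list of (alias_name, value) tuples defined for the service entities
    search_field: field in the search data compared against alias values (assumed)
    alias_filter: when set, only aliases with exactly this name are kept
    """
    values = [
        value for name, value in aliases
        # The name comparison is case sensitive, like the filter described above.
        if alias_filter is None or name == alias_filter
    ]
    if not values:
        return ""
    terms = " OR ".join(f"{search_field}={value}" for value in values)
    return f"| search {terms}"


aliases = [("host", "sr-centos2.sv.splunk.com"), ("IP", "10.141.20.37")]
print(build_entity_filter(aliases, "dest"))
print(build_entity_filter(aliases, "dest", alias_filter="host"))
```

Without a filter, both alias values end up in the OR clause. With the host filter, only the host alias value remains.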

Step 4. Add monitoring calculations

  1. Configure KPI monitoring calculations.
    Field: Description
    KPI Search Schedule: The frequency of the KPI search. Note: Avoid scheduling searches at one-minute intervals. Running multiple concurrent KPI searches at short intervals can produce lengthy search queues and is not necessary to monitor most KPIs.
    Service/Aggregate Calculation: The statistical operation that ITSI performs on KPI search results. The correct aggregate calculation to use depends on the type of KPI search. For example, if your search returns results for CPU load percentage, you could use Average. If your search returns a count, such as number of errors, use Count.
    Calculation Window: The time period over which the calculation applies. For example, Last 5 Minutes.
  2. Click Next.
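
To make the difference between the Average and Count calculations concrete, here is a minimal Python sketch that applies either operation to the events inside a calculation window. The event shape (dicts with an epoch "_time" key) and the aggregate function are assumptions made for illustration, not an ITSI API.

```python
from statistics import mean


def aggregate(events, field, calculation, now, window_seconds):
    """Apply an aggregate calculation to the events inside the calculation window.

    events: list of dicts with an epoch "_time" key (hypothetical event shape)
    calculation: "avg" or "count"
    """
    in_window = [
        event for event in events
        if now - window_seconds < event["_time"] <= now and field in event
    ]
    if calculation == "count":
        return len(in_window)
    if calculation == "avg":
        # With no events in the window there is nothing to average.
        return mean(event[field] for event in in_window) if in_window else None
    raise ValueError(f"unsupported calculation: {calculation}")


events = [
    {"_time": 1000, "cpu_load_percent": 40.0},
    {"_time": 1200, "cpu_load_percent": 60.0},
    {"_time": 500, "cpu_load_percent": 95.0},  # older than the 300-second window
]
print(aggregate(events, "cpu_load_percent", "avg", now=1250, window_seconds=300))
print(aggregate(events, "cpu_load_percent", "count", now=1250, window_seconds=300))
```

Average suits a gauge such as CPU load, where the value itself matters. Count suits event-style data such as errors, where the number of matching events is the metric.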

Step 5. Optional setup

  1. Configure optional settings: KPI units, monitoring lag for indexing offset, and data backfill.
    Field: Description
    Unit: The unit of measurement that you want to appear in KPI visualizations. For example, GB, Mbps, secs, and so on.
    Monitoring lag: The monitoring lag time (in seconds) to offset the indexing lag. When indexing large quantities of data, an indexing lag can occur, which can cause performance issues.
    Enable backfill: Enables backfill of the KPI summary index (itsi_summary). Requires you to have indexed adequate raw data for the backfill period. Note: You must have 7 days of summary data in the summary index for Adaptive Thresholding to work properly.
    Backfill period: The time range of data that is available after backfill is complete.

    Selecting a backfill period initiates the backfill. A message appears in Splunk Web that informs you when the backfill is complete.

  2. Click Next.
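
One way to picture the monitoring lag is as a shift of the search time window into the past, so that the search only covers data that has had time to be indexed. The sketch below assumes that interpretation; the kpi_search_window function is hypothetical and the way ITSI applies the lag internally is not specified in this topic.

```python
def kpi_search_window(now, calculation_window, monitoring_lag):
    """Return (earliest, latest) epoch seconds for a KPI search.

    Assumption: the whole calculation window is moved back by the
    monitoring lag so late-arriving events are already indexed.
    """
    latest = now - monitoring_lag
    earliest = latest - calculation_window
    return earliest, latest


# A 5-minute calculation window with a 30-second monitoring lag.
print(kpi_search_window(now=10_000, calculation_window=300, monitoring_lag=30))
```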

Step 6. Set thresholds

Severity-level thresholds determine the current status of your KPI. When KPI values meet threshold conditions, the KPI status changes, for example, from high (yellow) to critical (red). The current status of the KPI is reflected in all views across the product, including Service Analyzers, Glass Tables, and Deep Dives.

You can manually add threshold values for your KPIs one at a time using the threshold preview window. Alternatively, you can apply threshold time policies, which automatically adapt threshold values based on day and time. See KPI threshold time policies on this page.

ITSI supports two types of KPI severity-level thresholds: aggregate thresholds and per entity thresholds.

Set aggregate threshold values

Aggregate thresholds are useful for monitoring the status of aggregated KPI values. For example, you might apply aggregate thresholds to monitor the status of KPIs that return the total number of service requests or service errors, based on a calculation that uses the stats count function.

  1. Click the Aggregate Thresholds button.
  2. Click Add threshold to add a range of severity-level thresholds to the threshold preview graph.
    Aggregate threshold.png
  3. Click Finish.
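
Conceptually, a set of severity-level thresholds divides the range of KPI values into bands, and the band that contains the current value determines the KPI status. The following Python sketch shows that lookup. The boundary values and the severity_for function are hypothetical examples, not ITSI defaults.

```python
from bisect import bisect_right

# Hypothetical lower bounds for each severity band on a 0-100 percentage KPI.
LOWER_BOUNDS = [0, 50, 70, 85, 95]
LEVELS = ["normal", "low", "medium", "high", "critical"]


def severity_for(value):
    """Return the severity level of the band that contains the value."""
    # bisect_right counts how many lower bounds are <= value.
    index = bisect_right(LOWER_BOUNDS, value) - 1
    # Values below the first bound fall back to the lowest band.
    return LEVELS[max(index, 0)]


for sample in (42, 50, 90, 99):
    print(sample, severity_for(sample))
```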

Set per entity threshold values

Per entity thresholds are useful for monitoring multiple separate entities against which a single KPI is running. For example, you might have a KPI, such as Free Memory %, that is running against three separate servers. Using per entity thresholds, you can monitor the status of Free Memory % on each individual server. Note: To use per entity thresholds, a KPI must be split by entity.

  1. Click the Per Entity Thresholds button.
  2. Click Add threshold to add a range of severity-level thresholds to the threshold preview graph.
    The threshold preview shows a separate search results graph for each entity that the KPI is running against.
    Per entity threshold.png
  3. Click Finish.

KPI threshold time policies

ITSI lets you create time policies that automatically adjust threshold values over time. You can use time policies to accommodate normal variations in usage across your services and improve the accuracy of KPI and service health scores.

For example, if your organization's peak activity is during the work week, you might want to create a time policy that takes into account different levels of usage during work hours, off hours, and weekends.

You can only have one active time policy at any given time. When you create a new time policy, the previous time policy is overwritten and cannot be recovered.

Create a time policy with adaptive thresholding

ITSI provides a set of 32 default thresholding templates that you can use to build your time policies. You can select templates with different time block combinations, such as work hours, off hours, and weekends; AM/PM; 3-hour blocks; 2-hour blocks; and so on. The minimum time block for which you can create a policy is 1 hour.

Templates are static or adaptive. Static templates let you create time policies that do not change after you configure them. Adaptive templates let you create time policies that generate thresholds dynamically and update daily based on changes in your data.

To create a KPI threshold time policy using adaptive thresholding:

  1. In the KPIs list, click on the specific KPI for which you want to set a threshold time policy.
  2. Open the Thresholding panel.
  3. Select the Use Thresholding Template radio button. Then select an adaptive thresholding template, such as "3-hour blocks every day (adaptive/stdev)." (Selecting an adaptive template automatically enables Adaptive Thresholding and Time Policies.)
    The Preview Aggregate Thresholds window opens. ITSI backfills the preview with aggregate data from the past 7 days. If there is data in the summary index, ITSI loads the summary data; otherwise, it loads raw data. Loading raw data can take some time.
    Adaptive thresholding.png
  4. Open the Configure Thresholds for Time Policies panel.
  5. Select a time policy block, then click Apply Adaptive Thresholding. Make sure to use the correct Policy type, such as Standard deviation. (Policy types for adaptive thresholds include Standard deviation, Quantile, and Range.)
    Apply adaptive thresholding.png
    ITSI generates your new threshold time policy, which you can view in the Preview Aggregate Thresholds window. Adaptive thresholds update once daily around midnight.
  6. Click Save.
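
The idea behind the Standard deviation policy type can be sketched in a few lines of Python: thresholds are placed at fixed multiples of the standard deviation away from the mean of historical KPI values. The multipliers, the sample history, and the stdev_thresholds function are assumptions chosen for illustration; this topic does not document the exact calculation that ITSI performs.

```python
from statistics import mean, pstdev

# Hypothetical multipliers: how many standard deviations above the mean
# each severity level begins.
STDEV_MULTIPLIERS = {"medium": 1.0, "high": 2.0, "critical": 3.0}


def stdev_thresholds(history, multipliers=STDEV_MULTIPLIERS):
    """Derive threshold values from historical KPI values for one time block."""
    center = mean(history)
    spread = pstdev(history)  # population standard deviation of the history
    return {level: center + k * spread for level, k in multipliers.items()}


# Hypothetical KPI values collected for one time block.
history = [2, 4, 4, 4, 5, 5, 7, 9]
print(stdev_thresholds(history))
```

Because the thresholds are recomputed from recent data, they follow gradual changes in normal behavior, which is what distinguishes an adaptive template from a static one.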

Time policies cannot overlap. If you attempt to create a policy that overlaps with another policy, a validation error appears.
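
The overlap rule can be expressed as a simple interval check. The sketch below models each time policy block as a (day, start_hour, end_hour) tuple with an exclusive end hour. Both the tuple layout and the function names are hypothetical and only illustrate the kind of validation described here.

```python
def blocks_overlap(first, second):
    """Return True if two (day, start_hour, end_hour) blocks share any time."""
    day_a, start_a, end_a = first
    day_b, start_b, end_b = second
    # Two half-open intervals overlap when each starts before the other ends.
    return day_a == day_b and start_a < end_b and start_b < end_a


def find_overlaps(blocks):
    """Return every pair of blocks that overlap."""
    overlaps = []
    for index, first in enumerate(blocks):
        for second in blocks[index + 1:]:
            if blocks_overlap(first, second):
                overlaps.append((first, second))
    return overlaps


policies = [("Mon", 9, 17), ("Mon", 17, 24), ("Mon", 16, 18)]
print(find_overlaps(policies))
```

In this example the work-hours and evening blocks are adjacent and valid, while the 16-18 block collides with both of them.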

Apply time zone offsets

KPI threshold settings support one-hour minimum increments. As a result, users located in time zones that have non-hourly offsets of 30 or 45 minutes (such as GMT+05:30: Chennai, Kolkata, Mumbai, New Delhi) cannot accurately apply time policies created by other users in hourly time zones. When you save, ITSI rounds the time block configuration down to the nearest hour in the UTC time zone.
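
The rounding effect is easy to see with a small calculation. The Python sketch below converts a local block start hour to UTC and rounds it down to the hour, returning the number of minutes lost in the process. The function name and signature are hypothetical; the arithmetic simply follows the rounding behavior described above.

```python
def utc_block_start(local_hour, utc_offset_minutes):
    """Convert a local start hour to UTC and round down to the whole hour.

    Returns (utc_hour, minutes_dropped).
    """
    # Minutes after midnight UTC, wrapped into a single day.
    utc_minutes = (local_hour * 60 - utc_offset_minutes) % (24 * 60)
    return utc_minutes // 60, utc_minutes % 60


# 09:00 local time in GMT+05:30 is 03:30 UTC, which rounds down to 03:00 UTC.
print(utc_block_start(local_hour=9, utc_offset_minutes=330))
# 09:00 local time in a whole-hour zone such as GMT+01:00 loses nothing.
print(utc_block_start(local_hour=9, utc_offset_minutes=60))
```

A non-zero second value means the saved block starts earlier in UTC than the user intended, which is the inaccuracy that a time zone offset is meant to correct.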

ITSI lets you apply a time zone offset to ITSI service objects using the kvstore_to_json.py script. For detailed instructions on applying time zone offsets, see Time zone offset operations (mode 3) in this manual.

For more information on time zones in ITSI, see Time zone handling in this manual.

Create new thresholding templates

ITSI lets you add new time policies or modify existing time policies and save them as new thresholding templates.

  1. In the service configuration workflow, click on the KPI for which you want to create the thresholding template.
  2. Click to open the Thresholding panel.
  3. Select the Set Custom Thresholds radio button.
  4. Set Enable Time Policies to Yes.
  5. Click to open the Configure Threshold for Time Policies panel.
  6. Select an existing time policy or add a new time policy.
  7. Click Save as Template.
  8. Type in a Title and Description (optional) for your new thresholding template. Click Save.
    You can now apply your custom thresholding template to any number of KPIs.

Edit thresholding templates

ITSI provides a separate editing page where you can modify any existing thresholding templates. For example, you might want to change the days and hours for a specific time policy, change the policy type, or apply adaptive thresholding.

  1. Click to open the Thresholding panel.
  2. Select the Use Thresholding Template radio button.
  3. Click Edit Template.
  4. Make your modifications to the thresholding templates. Click Save.

Step 7. Set KPI importance value

After you create your KPI, you must assign the KPI an importance value. ITSI uses the importance value, along with the KPI severity level, to calculate a service health score.

Importance values range from 0 to 11. KPI importance values from 1 to 11 are included in the health score calculation, with 1 being the least important and 11 being the most important. The greater the KPI importance value, the greater the impact that KPI has on the service health score.

ITSI considers KPIs that have an importance value of 11 as a special case that represents a "minimum health indicator" for the service. When a KPI with an importance value of 11 reaches the critical state, the overall health score for the service turns critical, regardless of the status of other KPIs in the service.

KPIs with an importance value of 0 are not included in the health score calculation.

  1. At the top of the KPIs list, click Service Health.
    This opens the Service Health window, which shows the importance value of each KPI, along with the current composite health score of the service.
  2. Use the slider to set the Importance value for your new KPI.
    KPI importance value.png
  3. (Optional) Click on the severity-level preview to see the impact that different severity levels have on the composite health score given the KPI importance value. Use this feature to help fine-tune your KPI importance values. The severity-level setting is for preview purposes only and has no impact on actual severity-level thresholds or service health scores.

How service health scores work

ITSI generates a health score for each service that you create. The health score is a good indicator of the status of a service and is a useful metric to display in Service Analyzer, Glass Table, and Deep Dive visualizations. A decline in the service health score value can be the first sign of an issue that might lead to an outage. ITSI continuously monitors and updates service health scores.

Service health score calculations

Service health scores range from 0 to 100, with 0 being most critical and 100 being most healthy. The health score calculation is based on the current severity level of service KPIs (Critical, High, Medium, Low, and Normal) and the weighted average of the importance values of all KPIs in a service. See Set KPI importance value.

The "Info" severity level is not calculated into service health scores.
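
The following Python sketch illustrates the general shape of an importance-weighted health score, including the rules described in Set KPI importance value: KPIs with importance 0 are ignored, "Info" is not counted, and a critical KPI with importance 11 forces the score to its minimum. The numeric score assigned to each severity level is a made-up assumption, and this is not the formula that ITSI itself uses.

```python
# Hypothetical mapping from severity level to a 0-100 score. "info" is
# intentionally absent because it is not calculated into health scores.
SEVERITY_SCORE = {"normal": 100, "low": 70, "medium": 50, "high": 30, "critical": 0}


def health_score(kpis):
    """Compute a weighted health score from (severity, importance) pairs."""
    scored = [
        (severity, importance) for severity, importance in kpis
        if importance > 0 and severity in SEVERITY_SCORE
    ]
    # Importance 11 acts as a minimum health indicator for the service.
    if any(severity == "critical" and importance == 11 for severity, importance in scored):
        return 0.0
    total_weight = sum(importance for _, importance in scored)
    if total_weight == 0:
        return None  # no KPI contributes to the score
    weighted = sum(SEVERITY_SCORE[severity] * importance for severity, importance in scored)
    return weighted / total_weight


print(health_score([("normal", 5), ("critical", 5)]))
print(health_score([("normal", 5), ("critical", 11)]))
print(health_score([("normal", 5), ("critical", 0), ("info", 8)]))
```

The three calls show an evenly weighted mix, the importance 11 override, and a case where the only unhealthy KPI is excluded because its importance is 0.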

Impact of per entity thresholds on service health scores

When a KPI is split by entity, if any entity has a severity level (based on entity thresholds) that is worse than the service aggregate severity, the service health score will be impacted. In some cases, this can cause the overall service health score to change significantly, while the aggregate KPI severity level changes only marginally or not at all.

For example, if you have a CPU % utilization KPI that is running against three entities, and two of those entities show normal severity while the third shows critical, the overall service health score might show critical, while the aggregate KPI severity level remains normal.
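
The three-entity example can be reproduced with a short Python sketch that compares the severity of the aggregate value with the worst per entity severity. The threshold boundaries, entity names, and CPU values are hypothetical and chosen only to mirror the scenario.

```python
SEVERITY_ORDER = ["normal", "low", "medium", "high", "critical"]
# Hypothetical thresholds: (lower bound, severity level), in ascending order.
THRESHOLDS = [(0, "normal"), (70, "medium"), (90, "critical")]


def severity_for(value):
    """Return the severity of the highest threshold the value has reached."""
    level = THRESHOLDS[0][1]
    for lower_bound, name in THRESHOLDS:
        if value >= lower_bound:
            level = name
    return level


def kpi_severities(entity_values):
    """Return (aggregate severity, worst per entity severity) for one KPI."""
    aggregate = sum(entity_values.values()) / len(entity_values)
    per_entity = [severity_for(value) for value in entity_values.values()]
    worst = max(per_entity, key=SEVERITY_ORDER.index)
    return severity_for(aggregate), worst


cpu = {"server-01": 20.0, "server-02": 25.0, "server-03": 98.0}
print(kpi_severities(cpu))
```

The average of the three values stays in the normal band even though one server is critical, which is why the service health score and the aggregate KPI severity can diverge.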

Impact of service dependencies on service health scores

Any service dependencies that you add to a service will impact the service health score, based on the importance value that you set for dependent service KPIs. For more information, see Set importance values for service dependencies.

Add service dependencies

After you configure a service, you can add other services as service dependencies. Adding service dependencies can help you detect if one service is having a negative impact on another service, and can be useful in performing root cause analysis.

For example, you might have a web service that has a dependency on a database service. By adding the database service as a dependency, you can monitor the impact that the database service is having on the web service.

When you add a service dependency, you select the specific KPIs from the dependent service that you want ITSI to include in the health score calculation of the primary service.

  1. Click the Service Dependencies tab.
  2. Click Add Dependencies.
  3. Select the service(s) that you want to add as a dependency.
    This opens the list of KPIs associated with that service.
  4. Select the dependent KPIs that you want to monitor as part of the primary service health score.
    Select dependency KPIs.png
  5. Click Save.
    ITSI adds the selected KPIs to the list of service dependencies, and includes the selected KPIs in the primary service health score.

Set importance values for service dependencies

ITSI lets you adjust the importance value of dependent service KPIs. Set the importance value to give dependent service KPIs appropriate weighting in health score calculations and to generate health score values that more accurately reflect the impact of the dependent service on the primary service.

You can add both service health scores and regular KPIs as dependent KPIs. By default, dependent service health scores have an importance value of 11. Regular KPIs keep the importance value that is set for the KPI in the original service.

  1. In the KPI list for the primary service, click Service Health.
    The Service Health window opens. The window now includes a second table showing the importance values of Dependent KPIs.
  2. Use the sliders to set the importance values for the dependent KPIs.
  3. (Optional) Adjust the Simulated Severity levels to preview the impact that the dependent KPI severity levels have on the composite health score of the parent service.
    Dependent kpi importance value.png
  4. Click Save.

When a dependent KPI with an importance value of 11 reaches critical, the health score of the primary service will read critical with a value of 0, regardless of the status of other KPIs.

This documentation applies to the following versions of Splunk® IT Service Intelligence: 2.5.0 Cloud only, 2.5.1 Cloud only, 2.5.2 Cloud only

