About dashboards in the Content Pack for ITSI Monitoring and Alerting

The Content Pack for ITSI Monitoring and Alerting includes the dashboards described in this section.

ITSI Service and KPI Severity Analytics

Use this dashboard to identify Services and KPIs that are excessively unhealthy. Excessive unhealthiness can indicate that the thresholds or KPI importance settings of a Service or KPI need to be tuned. For more information about this dashboard, review the following tech talk: https://events.splunk.com/Tuning-KPI-thresholds.

ITSI Service and KPI Threshold Analytics

Use this dashboard to review the severity of past KPI aggregate values and evaluate threshold configurations. KPIs with an excessive number of non-normal severities might reflect poorly configured thresholds, which should be tuned before you enable alerting on the Service or KPI. For more information about this dashboard, review the following tech talk: https://events.splunk.com/Tuning-KPI-thresholds.

ITSI Alert and Episode Volume Trend Analysis

Use this dashboard to review the volume of incoming alerts and notable events, and to assess the real-time health of the environment by comparing current alert volumes against historical volumes. When alert volumes rise significantly above historical norms, the system detects and marks these alert storms. To further triage the alerts in an alert storm, use the ITSI Alert and Episode Field Values Analysis dashboard.
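The idea of comparing current volume against a historical norm can be illustrated with a minimal sketch. It is not the content pack's actual detection logic, which this topic does not describe. The function name `is_alert_storm`, the `num_stdevs` parameter, and the sample counts are all assumptions chosen for illustration, and the mean-plus-standard-deviations rule is just one common way to define "significantly above".

```python
from statistics import mean, stdev


def is_alert_storm(historical_counts, current_count, num_stdevs=3.0):
    """Flag the current interval when its alert count is far above history.

    Hypothetical rule: storm if current_count > mean + num_stdevs * stdev.
    """
    if len(historical_counts) < 2:
        # Not enough history to estimate a spread, so never flag.
        return False
    baseline = mean(historical_counts)
    spread = stdev(historical_counts)
    return current_count > baseline + num_stdevs * spread


# Made-up alert counts per interval for the same period on previous days.
history = [40, 42, 38, 45, 41, 39, 43]

print(is_alert_storm(history, 44))   # slightly above the usual volume
print(is_alert_storm(history, 120))  # far above the usual volume
```

With this sample history, a count of 44 stays under the threshold and a count of 120 exceeds it.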

ITSI Alert and Episode Field Values Analysis

When alert volumes are high, you can use this dashboard to review the values of important alert fields to understand which alerts might be contributing to an unhealthy environment. For example, by analyzing this dashboard, you may be able to quickly determine that a significant volume of alerts is coming from a single KPI or a single host. Unusual and lopsided distributions of field values can be easily discovered and will help you focus your subsequent investigation.
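As a rough illustration of what a lopsided field-value distribution looks like, the following sketch counts how often each value of a field appears in a set of alerts and reports each value's share. The helper `field_value_shares`, the alert dictionaries, and the field names `host` and `kpi` are hypothetical and are not taken from the content pack.

```python
from collections import Counter


def field_value_shares(alerts, field):
    """Return (value, share) pairs for one field, most frequent value first."""
    counts = Counter(alert.get(field, "unknown") for alert in alerts)
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common()]


# Made-up alerts; the field names are assumptions for illustration only.
alerts = [
    {"host": "web-01", "kpi": "CPU Utilization"},
    {"host": "web-01", "kpi": "CPU Utilization"},
    {"host": "web-01", "kpi": "Memory Used"},
    {"host": "db-02", "kpi": "CPU Utilization"},
]

for value, share in field_value_shares(alerts, "host"):
    print(f"{value}: {share:.0%}")
```

In this sample, one host accounts for three of the four alerts, which is the kind of skew that suggests where to start investigating.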

ITSI Event and Incident Operations Posture

Use this dashboard to understand overall alert and episode handling trends. For example, what are the Mean Time to Respond (MTTR) and Mean Time to Acknowledge (MTTA) over time? Which services, alert groups, devices, and alert signatures have been the noisiest? This dashboard is especially useful for Operations leaders who are trying to understand longer-term pain points and organizational efficiency.
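To make a metric such as MTTA concrete, the following sketch averages the time between two timestamps across a set of episodes. It is an assumption-laden illustration, not the dashboard's implementation: the helper `mean_minutes_between` and the keys `created` and `acknowledged` are hypothetical names, and the timestamps are made up.

```python
from datetime import datetime


def mean_minutes_between(episodes, start_key, end_key):
    """Average the elapsed minutes from start_key to end_key per episode.

    Episodes that lack either timestamp are skipped. Returns None when no
    episode has both timestamps.
    """
    durations = [
        (ep[end_key] - ep[start_key]).total_seconds() / 60
        for ep in episodes
        if ep.get(start_key) is not None and ep.get(end_key) is not None
    ]
    if not durations:
        return None
    return sum(durations) / len(durations)


# Made-up episodes; the key names are assumptions for illustration only.
episodes = [
    {"created": datetime(2024, 1, 1, 9, 0), "acknowledged": datetime(2024, 1, 1, 9, 10)},
    {"created": datetime(2024, 1, 1, 12, 0), "acknowledged": datetime(2024, 1, 1, 12, 20)},
    {"created": datetime(2024, 1, 1, 15, 0), "acknowledged": None},  # not yet acknowledged
]

print(mean_minutes_between(episodes, "created", "acknowledged"))
```

The same helper could average the time to a different milestone by passing a different end key, provided the episodes carry that timestamp.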
