How service health scores work in ITSI
ITSI generates a health score for each service that you create. The health score is a good indicator of the status of a service and is a useful metric to display in Service Analyzer, glass tables, and deep dives. A decline in service health score can be the first sign of an issue that might lead to an outage. ITSI continuously monitors and updates service health scores.
Service health score calculations
Service health scores range from 0 to 100, with 0 being most critical and 100 being most healthy. The health score is calculated from the current severity level of each KPI in the service (Critical, High, Medium, Low, or Normal), weighted by the importance value of each KPI. For more information, see Set KPI importance values in ITSI.
The Info severity level is not included in the service health score calculation.
ITSI does not directly use KPIs or health scores of dependent services to calculate a service's health score. Service health scores are calculated based on the score_contribution value for each severity level. Score contribution values are defined in threshold_labels.conf. Do not modify these values.
For example, a service contains 2 KPIs. One KPI is Critical, so the score_contribution value is 0. The other KPI is Normal, so the score_contribution value is 100. Assuming both KPIs have the same importance values, the service health score will be 50.
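The service health score is the importance-weighted average of the KPI score contributions:
Service health score = (G1 ∗ K1 + G2 ∗ K2 + ... + GN ∗ KN) / (G1 + G2 + ... + GN)
where:
- N = count of KPIs
- G = importance value of one KPI
- K = the score contribution of the KPI (Normal=100, Low=70, Medium=50, High=30, Critical=0)
For example, a service contains 3 KPIs: a Normal KPI with an importance value of 10, a Low KPI with an importance value of 7, and a High KPI with an importance value of 5. The importance values sum to 22, so the service health score is calculated as follows:
Service health score = (100 ∗ 10/22) + (70 ∗ 7/22) + (30 ∗ 5/22) ≈ 45.45 + 22.27 + 6.82 ≈ 74.55 (exactly 1640/22)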
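The weighted-average calculation can be sketched in Python to check the arithmetic. This is an illustrative sketch only, not ITSI's implementation: the function name service_health_score and the (severity, importance) input shape are hypothetical, and the three-KPI input assumes the worked example refers to a Normal, a Low, and a High KPI with importance values 10, 7, and 5.

```python
# Score contribution per severity level, as listed in the text.
SCORE_CONTRIBUTION = {
    "Normal": 100,
    "Low": 70,
    "Medium": 50,
    "High": 30,
    "Critical": 0,
}


def service_health_score(kpis):
    """Return the importance-weighted average of KPI score contributions.

    kpis is a list of (severity, importance) pairs. This input shape is a
    hypothetical choice for the sketch, not an ITSI data structure.
    """
    total_importance = sum(importance for _, importance in kpis)
    if total_importance == 0:
        raise ValueError("at least one KPI needs a nonzero importance value")
    weighted_sum = sum(
        SCORE_CONTRIBUTION[severity] * importance
        for severity, importance in kpis
    )
    return weighted_sum / total_importance


# Two KPIs with equal importance: one Critical, one Normal.
two_kpis = service_health_score([("Critical", 5), ("Normal", 5)])

# Three KPIs with importance values 10, 7, and 5 (assumed reading of the example).
three_kpis = service_health_score([("Normal", 10), ("Low", 7), ("High", 5)])

print(round(two_kpis, 2))    # 50.0
print(round(three_kpis, 2))  # 74.55
```

Computing the sum before dividing avoids accumulating rounding error from the individual terms, which is why the result is 74.55 rather than a slightly lower figure obtained by adding truncated terms.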
Impact of per-entity thresholds on service health scores
When a KPI is split by entity, the service health score is impacted if any entity has a severity level that's worse than the aggregate severity of the KPI. K in the equation above represents the score contribution of a KPI. However, if the KPI is split by entity, the score contribution of the entity with the worst severity level is used instead. Therefore, while the aggregate KPI score contribution might be 100 (Normal), one of the entities within that KPI might be 30 (High), so the overall score contribution of that KPI is 30.
In some cases, entity severity contributions can cause the overall service health score to change significantly, while the aggregate KPI severity level changes only marginally or not at all. For example, if you have a CPU % utilization KPI that is running against three entities, and two of those entities show normal severity, while the third shows critical, the overall service health score might show critical, while the aggregate KPI severity level remains normal.
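The per-entity behavior described above can be sketched as taking the lowest score contribution among the aggregate severity and the entity severities. This is a hedged illustration, not ITSI's code: the function name kpi_score_contribution is hypothetical, and including the aggregate severity in the minimum is an assumption based on the statement that only entities worse than the aggregate change the result.

```python
# Score contribution per severity level, as listed in the text.
SCORE_CONTRIBUTION = {
    "Normal": 100,
    "Low": 70,
    "Medium": 50,
    "High": 30,
    "Critical": 0,
}


def kpi_score_contribution(aggregate_severity, entity_severities=()):
    """Return the score contribution (K) of one KPI.

    If the KPI is split by entity, the worst entity severity lowers the
    contribution. Taking the minimum across the aggregate and all entities
    is an assumption made for this sketch.
    """
    severities = [aggregate_severity, *entity_severities]
    return min(SCORE_CONTRIBUTION[severity] for severity in severities)


# Aggregate is Normal, but one entity is High: the KPI contributes 30.
one_high = kpi_score_contribution("Normal", ["Normal", "Normal", "High"])

# CPU example: two Normal entities and one Critical entity: the KPI contributes 0.
cpu = kpi_score_contribution("Normal", ["Normal", "Normal", "Critical"])

# A KPI that is not split by entity uses its aggregate severity only.
not_split = kpi_score_contribution("Low")

print(one_high, cpu, not_split)  # 30 0 70
```

If the CPU KPI in this sketch were the only KPI in the service, the weighted average would equal its contribution of 0, which matches the case where the service shows critical while the aggregate KPI severity stays normal.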
For more information about per-entity thresholds, see Set per-entity threshold values in the KPI configuration workflow.
Impact of service dependencies on service health scores
Any service dependencies that you add to a service will impact the service health score, based on the importance value that you set for dependent service KPIs. For more information, see Set importance values for service dependencies in this manual.
This documentation applies to the following versions of Splunk® IT Service Intelligence: 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.1.0, 4.1.1, 4.1.2, 4.1.5, 4.2.0, 4.2.1, 4.2.2, 4.2.3, 4.3.0, 4.3.1, 4.4.0, 4.4.1, 4.4.2