Heartbeat Check
Heartbeat Check alerts when a signal has not reported for a specified amount of time. This might happen because a host is down or has stopped reporting a particular metric. This condition is often used in tandem with another detector to confirm that the signal being analyzed is still reporting.
Note
Only active metric time series are monitored, so this condition doesn’t trigger an alert for a host that has never sent a metric. It triggers only if a host has sent metrics and then stops sending metrics.
Examples
You have a detector that alerts you when the minimum number of logins being handled by each host goes below a specified value. If a host stops reporting, that detector isn’t triggered even though there is a problem, because the host sends no data points to compare against the threshold. The Heartbeat Check condition notifies you if a host stops reporting, or if all hosts in a group stop reporting.
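
The per-host behavior described above can be illustrated with a minimal Python sketch. This is a toy model, not how Splunk Infrastructure Monitoring evaluates the condition: the function name `stale_hosts`, the host names, the timestamps, and the use of `>=` at the boundary are all assumptions made for illustration. Hosts that never reported are simply absent from the input, which mirrors the note that only active metric time series are monitored.

```python
from typing import Dict, List


def stale_hosts(last_seen: Dict[str, float], now: float, threshold_s: float) -> List[str]:
    """Return hosts whose most recent data point is at least threshold_s old.

    Only hosts present in last_seen (hosts that reported at least once) are
    considered, so a host that never sent a metric can never appear here.
    """
    return sorted(host for host, ts in last_seen.items() if now - ts >= threshold_s)


# Hypothetical timestamps in seconds; "host-c" never reported, so it is not tracked.
last_seen = {"host-a": 1000.0, "host-b": 400.0}
print(stale_hosts(last_seen, now=1030.0, threshold_s=600.0))  # ['host-b']
```

Here `host-a` reported 30 seconds ago and is healthy, while `host-b` has been silent for 630 seconds and exceeds the 600-second threshold.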
Settings
Parameter | Values | Notes |
---|---|---|
Hasn’t reported for | Integer >= 1, followed by time indicator (s, m, h, d, w). For example, 30s, 10m, 2h, 5d, 1w. | How long it’s been since the signal last reported. Longer time periods result in lower sensitivity and potentially fewer alerts. If you specify a value for Group by (below), how long it’s been since all members of the group stopped reporting. |
(optional) Group by | Dimension or property chosen from dropdown menu | Use a dimension or property when you want alerts to be based on a specified unit. For example, if you group by |
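
To make the two settings concrete, here is a small Python sketch of the duration format and of the Group by semantics described in the table. It is an illustration under assumptions, not product code: `parse_duration`, `group_is_silent`, and the sample data are hypothetical names and values, and treating an empty group as "not silent" is a choice made for this sketch.

```python
import re
from typing import Mapping

# Seconds per time indicator listed in the table: s, m, h, d, w.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a value such as '30s' or '2h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", text.strip())
    if match is None:
        raise ValueError(f"not a valid duration: {text!r}")
    count = int(match.group(1))
    if count < 1:
        raise ValueError("duration must be an integer >= 1")
    return count * UNIT_SECONDS[match.group(2)]


def group_is_silent(last_seen: Mapping[str, float], now: float, threshold_s: float) -> bool:
    """True only when every member of the group has been silent for threshold_s."""
    return bool(last_seen) and all(now - ts >= threshold_s for ts in last_seen.values())


threshold = parse_duration("10m")  # 600 seconds
# One member reported 200 seconds ago, so the group as a whole is still reporting.
print(group_is_silent({"host-a": 0.0, "host-b": 500.0}, now=700.0, threshold_s=threshold))  # False
# Every member has been silent for at least 600 seconds.
print(group_is_silent({"host-a": 0.0, "host-b": 50.0}, now=700.0, threshold_s=threshold))  # True
```

The second function reflects the note that, with Group by set, the duration measures how long it has been since all members of the group stopped reporting, so a single healthy member keeps the group from alerting.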
Further reading
Parameters | Remarks |
---|---|
Signal (heartbeat metric) | If you want to avoid triggering alerts based on specific conditions (for example, excluding a test realm, or excluding hosts known to have been terminated), apply filters to the signal before configuring the alert condition. Make sure that the Extrapolation policy is Null (the default) for all signals that influence the heartbeat metric. If it is not Null, Splunk Infrastructure Monitoring extrapolates values for missing data points, and the alert isn’t triggered as expected. Extrapolation policy is specified in the plot configuration panel for each signal. To learn more, see Set options in the plot configuration panel. |
Hasn’t reported for | To avoid flappy alerts that are triggered due to minor, short-lived delays in sending metrics, set this parameter to be significantly larger than the native resolution of the signal (how often the signal is reporting). For example, if the signal reports once a minute, setting this parameter to 10 minutes means that the alert isn’t triggered until 10 data points have not reported. |
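
Both remarks can be illustrated with a toy Python model in which a signal is a list of evenly spaced data points and a missing point is `None`. Everything here is an assumption made for illustration: the function names, the sample values, and the simplified "last value" fill, which stands in for any non-Null extrapolation policy. The actual evaluation performed by Splunk Infrastructure Monitoring is not described in this section.

```python
from typing import List, Optional

Series = List[Optional[float]]


def trailing_gap(points: Series) -> int:
    """Count consecutive missing (None) data points at the end of the series."""
    gap = 0
    for value in reversed(points):
        if value is not None:
            break
        gap += 1
    return gap


def extrapolate_last_value(points: Series) -> Series:
    """Fill each missing point with the most recent real value (toy non-Null policy)."""
    filled: Series = []
    last: Optional[float] = None
    for value in points:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def heartbeat_fires(points: Series, resolution_s: int, threshold_s: int) -> bool:
    """True when the trailing gap covers at least threshold_s of silence."""
    return trailing_gap(points) * resolution_s >= threshold_s


RESOLUTION_S = 60   # the signal reports once a minute
THRESHOLD_S = 600   # "Hasn't reported for" set to 10 minutes

short_delay = [5.0, 4.0] + [None] * 3   # 3 missed points: a minor delay
long_outage = [5.0] + [None] * 10       # 10 missed points

print(heartbeat_fires(short_delay, RESOLUTION_S, THRESHOLD_S))  # False
print(heartbeat_fires(long_outage, RESOLUTION_S, THRESHOLD_S))  # True
# With extrapolation, the gap is filled in and the same outage goes unnoticed.
print(heartbeat_fires(extrapolate_last_value(long_outage), RESOLUTION_S, THRESHOLD_S))  # False
```

In this model a three-minute delay does not trigger the alert, a ten-minute outage does, and the same outage is hidden once missing points are filled in, which is why the remarks recommend keeping the Extrapolation policy set to Null.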