Heartbeat Check

Heartbeat Check alerts when a signal has not reported for a specified period of time. This might happen because a host is down or has stopped reporting a particular metric. This condition is often used in tandem with another detector to ensure that the signal being analyzed is still reporting.

Note

Only active metric time series are monitored, so this condition doesn’t trigger an alert for a host that has never sent a metric. It triggers only if a host has sent metrics and then stops sending metrics.
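The behavior described in this note can be sketched in a few lines of Python. This is only an illustration of the rule, not how Splunk Infrastructure Monitoring implements it; the function name `stale_hosts`, the host names, and the use of `None` to represent "has never reported" are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def stale_hosts(last_seen: Dict[str, Optional[float]], now: float, limit_s: float) -> List[str]:
    """Return hosts whose most recent data point is older than limit_s seconds.

    A last_seen value of None stands for a host that has never sent a
    metric (an assumption of this sketch). Such hosts are skipped, which
    mirrors the note: only time series that were once active can alert.
    """
    return sorted(
        host
        for host, ts in last_seen.items()
        if ts is not None and now - ts > limit_s
    )


# Hypothetical hosts: web-1 reported 60 s ago, web-2 reported 660 s ago,
# and web-3 has never reported at all.
last_seen = {"web-1": 1000.0, "web-2": 400.0, "web-3": None}
print(stale_hosts(last_seen, now=1060.0, limit_s=600.0))  # ['web-2']
```

Only `web-2` is flagged: it reported in the past and then went quiet, while `web-3` was never an active time series.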

Examples

Suppose you have a detector that alerts you when the minimum number of logins handled by each host goes below a specified value. If a host stops reporting entirely, that detector isn’t triggered, even though the host might have a problem. The Heartbeat Check condition notifies you if a host stops reporting, or if all hosts in a group stop reporting.
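To see why the two conditions complement each other, consider the following minimal Python sketch. It is not detector code; the function names, host names, and the convention that `None` means "no data point arrived" are assumptions for illustration.

```python
from typing import Dict, List, Optional


def low_login_hosts(latest: Dict[str, Optional[float]], minimum: float) -> List[str]:
    """Hosts whose latest login count is below the minimum.

    A host with no data point (None) cannot be compared against the
    threshold, so a threshold check alone never flags it.
    """
    return sorted(h for h, v in latest.items() if v is not None and v < minimum)


def silent_hosts(latest: Dict[str, Optional[float]]) -> List[str]:
    """Hosts with no latest data point, which a heartbeat-style check would catch."""
    return sorted(h for h, v in latest.items() if v is None)


# Hypothetical snapshot: host-c has stopped reporting.
latest = {"host-a": 120.0, "host-b": 8.0, "host-c": None}
print(low_login_hosts(latest, minimum=10.0))  # ['host-b']
print(silent_hosts(latest))                   # ['host-c']
```

The threshold check finds `host-b` but says nothing about `host-c`; only the heartbeat-style check surfaces the host that went quiet.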

Settings

Parameter	Values	Notes
Hasn’t reported for	Integer >= 1, followed by time indicator (s, m, h, d, w). For example, 30s, 10m, 2h, 5d, 1w.	How long it’s been since the signal last reported. Longer time periods result in lower sensitivity and potentially fewer alerts. If you specify a value for Group by (below), how long it’s been since all members of the group stopped reporting.
(optional) Group by	Dimension or property chosen from dropdown menu	Use a dimension or property when you want alerts to be based on a specified unit. For example, if you group by `cluster`, the alert is triggered only if all hosts in a cluster stop reporting. Alternatively, if each time series is associated with only one host, and you want to be alerted when any host has stopped reporting, leave this parameter blank (or group by `host`).
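The two settings can be illustrated with a short Python sketch: one helper that converts values such as 30s or 2h to seconds, and one that applies the Group by rule (a group alerts only when every member has been silent for longer than the duration). The helper names, the `(group, host)` key layout, and the strict `>` comparison are assumptions of this example, not product behavior.

```python
import re
from typing import Dict, List, Tuple

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a value such as '30s', '10m', '2h', '5d', or '1w' to seconds."""
    match = re.fullmatch(r"([1-9]\d*)([smhdw])", text)
    if match is None:
        raise ValueError(f"expected an integer >= 1 followed by s, m, h, d, or w: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def silent_groups(last_seen: Dict[Tuple[str, str], float], now: float, duration: str) -> List[str]:
    """Return groups in which every member has been silent longer than duration.

    last_seen maps (group, host) to the timestamp of that host's most
    recent data point. Tracking the newest timestamp per group is enough:
    if even the most recent reporter is too old, all members are.
    """
    limit = parse_duration(duration)
    newest: Dict[str, float] = {}
    for (group, _host), ts in last_seen.items():
        newest[group] = max(newest.get(group, ts), ts)
    return sorted(group for group, ts in newest.items() if now - ts > limit)


# Hypothetical clusters: one host in 'east' is still reporting,
# while both hosts in 'west' stopped more than 10 minutes ago.
last_seen = {
    ("east", "h1"): 100.0,
    ("east", "h2"): 950.0,
    ("west", "h3"): 100.0,
    ("west", "h4"): 200.0,
}
print(silent_groups(last_seen, now=1000.0, duration="10m"))  # ['west']
```

Grouping by `host` instead (one host per group) reduces this to a per-host check, which matches leaving the parameter blank when each time series belongs to a single host.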

Further reading

Parameter	Remarks

Signal (heartbeat metric)

If you want to avoid triggering alerts based on specific conditions (for example, excluding a test realm, or excluding hosts known to have been terminated), apply filters to the signal before configuring the alert condition.
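Conceptually, filtering the signal first removes time series from consideration before the heartbeat condition ever sees them. The Python sketch below illustrates that idea only; the dimension names (`realm`, `host`), their values, and the function `apply_exclusions` are hypothetical and do not correspond to a product API.

```python
from typing import Dict, List, Set

Dimensions = Dict[str, str]


def apply_exclusions(series: List[Dimensions], excluded: Dict[str, Set[str]]) -> List[Dimensions]:
    """Drop every time series that has an excluded value for any listed dimension."""
    return [
        dims
        for dims in series
        if not any(dims.get(key) in values for key, values in excluded.items())
    ]


# Hypothetical time series, identified by their dimensions.
series = [
    {"host": "web-1", "realm": "prod"},
    {"host": "web-2", "realm": "test"},  # test realm: should never alert
    {"host": "web-3", "realm": "prod"},  # known to have been terminated
]
excluded = {"realm": {"test"}, "host": {"web-3"}}
print(apply_exclusions(series, excluded))  # [{'host': 'web-1', 'realm': 'prod'}]
```

Only the remaining time series would then be evaluated by the alert condition.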

Make sure that the Extrapolation policy is Null (the default) for all signals that influence the heartbeat metric. If it is not Null, Splunk Infrastructure Monitoring extrapolates values for missing data points, and the alert isn’t triggered as expected. Extrapolation policy is specified in the plot configuration panel for each signal. To learn more, see Set options in the plot configuration panel.
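The following Python sketch shows why a non-Null extrapolation policy can hide a missing heartbeat. It uses a simple last-value fill as a stand-in for a non-Null policy; the function names and the sample data are assumptions for illustration, and the real extrapolation logic in the product may differ.

```python
from typing import List, Optional

Points = List[Optional[float]]


def extrapolate_last_value(points: Points) -> Points:
    """Fill each missing point (None) with the most recent known value."""
    filled: Points = []
    last: Optional[float] = None
    for value in points:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def trailing_gap(points: Points) -> int:
    """Count the consecutive missing points at the end of the series."""
    gap = 0
    for value in reversed(points):
        if value is not None:
            break
        gap += 1
    return gap


# Hypothetical signal: the host stopped reporting three intervals ago.
raw = [5.0, 7.0, None, None, None]
print(trailing_gap(raw))                          # 3
print(trailing_gap(extrapolate_last_value(raw)))  # 0
```

With a Null policy the gap stays visible (three missing points). After the fill, the series looks as if it never stopped reporting, so a check for missing data has nothing to detect.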

Hasn’t reported for

To avoid flapping alerts triggered by minor, short-lived delays in sending metrics, set this parameter to a value significantly larger than the native resolution of the signal (how often the signal reports). For example, if the signal reports once a minute, setting this parameter to 10 minutes means that the alert isn’t triggered until 10 consecutive data points are missing.
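The arithmetic behind this recommendation can be worked through in Python. This is a sketch under simple assumptions: the signal reports at a fixed resolution, and the alert fires once the silent gap strictly exceeds the configured duration. Both function names are made up for the example.

```python
def points_missed_before_alert(duration_s: int, resolution_s: int) -> int:
    """Number of expected data points that must be missing before the alert can fire."""
    if resolution_s <= 0:
        raise ValueError("resolution_s must be positive")
    return duration_s // resolution_s


def heartbeat_fires(gap_s: int, duration_s: int) -> bool:
    """True when the signal has been silent for longer than the configured duration."""
    return gap_s > duration_s


# A signal that reports once a minute, with the parameter set to 10 minutes.
print(points_missed_before_alert(duration_s=600, resolution_s=60))  # 10

# A short-lived 3-minute delay in sending metrics:
print(heartbeat_fires(gap_s=180, duration_s=120))  # True  (2m setting: flappy)
print(heartbeat_fires(gap_s=180, duration_s=600))  # False (10m setting: tolerated)
```

A setting close to the native resolution turns brief delays into alerts, while the larger setting absorbs them at the cost of detecting a real outage later.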

This page was last updated on Oct 17, 2024.
