Set up alerts for the splunkd health report
The splunkd health report generates alerts for all features in the health status tree. When a feature indicator meets its threshold condition, the feature's health status changes, for example from green to red, and an alert fires. Use health status alerts to maintain visibility into the health of your deployment, whether or not you are logged in to Splunk Web.
You can configure health report alerts as follows:
- Enable/disable alerts on the global, feature, or indicator level.
- Send alert notifications via email or PagerDuty.
- Set the health status color (yellow or red) that triggers an alert.
- Set a minimum duration that must elapse between alerts.
You can configure health report alerts by directly editing health.conf or by querying the server/health-config endpoint.
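For orientation, the settings covered in this topic are all plain health.conf settings. A hypothetical global stanza combining them (the values shown are illustrative, not recommendations) might look like this:

[health_reporter]
alert.disabled = 0
alert.threshold_color = red
alert.min_duration_sec = 300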
Disable health report alerts
Alerts are enabled by default for all features in the splunkd health report. You can disable alerts at the global, feature, or indicator level. Disabling alerts at the global level overrides enabled alerts at the feature level. Likewise, disabling alerts at the feature level overrides enabled alerts at the indicator level.
Disabling alerts is useful for reducing noise from non-critical features and minimizing false positives when performing maintenance tasks.
Disable alerts using health.conf
To disable alerts for all features in the splunkd health report:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the [health_reporter] stanza, set alert.disabled = 1. For example:

[health_reporter]
full_health_log_interval = 30
suppress_status_update = 600
alert.disabled = 1
To enable alerts for all features in the splunkd health report, set alert.disabled = 0.
To disable alerts for a single feature:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the stanza for the particular feature, set alert.disabled = 1. For example:

[feature:indexers]
...
indicator:missing_peers:yellow = 1
indicator:missing_peers:red = 1
alert.disabled = 1
To enable alerts for a feature, set alert.disabled = 0.
To disable alerts for a single feature indicator:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the stanza for the particular feature, set alert:<indicator_name>.disabled = 1. For example, in the following stanza, alerting for the indicator s2s_connections is disabled:

[feature:s2s_autolb]
...
indicator:s2s_connections:yellow = 20
indicator:s2s_connections:red = 70
alert:s2s_connections.disabled = 1
To enable alerts for an indicator, set alert:<indicator_name>.disabled = 0.
Disable alerts using REST endpoint
To disable alerts for features and indicators, send a POST request to server/health-config/{feature_name}. For example, to disable alerts for the batchreader feature on the instance you are monitoring, run the following command:
curl -k -u admin:pass https://<host>:<mPort>/services/server/health-config/feature:batchreader -d alert.disabled=1
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
To access server/health-config endpoints, a role must have the edit_health capability.
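The curl command above maps onto a simple request shape. The following is a minimal Python sketch of that shape, assuming a local instance on the default management port; the helper name and its defaults are illustrative, not part of the Splunk API. Only the endpoint path and the alert.disabled setting come from this topic.

```python
from urllib.parse import urlencode

# Illustrative helper (not part of the Splunk API): build the POST
# target URL and form body for a server/health-config update. The
# endpoint path and setting names come from the documentation above.
def health_config_post(stanza, settings, host="localhost", port=8089):
    url = f"https://{host}:{port}/services/server/health-config/{stanza}"
    return url, urlencode(settings)

# Same request as the curl example: disable alerts for the
# batchreader feature.
url, body = health_config_post("feature:batchreader", {"alert.disabled": 1})
```

The returned url and body correspond to the curl arguments; you still send them with the HTTP client of your choice, authenticating as a user that holds the edit_health capability.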
Set up health report alert actions
You can set up alert actions that run when an alert fires, such as sending alert notifications via email, mobile device, or PagerDuty.
Alert actions apply on the global level only. Multiple alert actions of the same type are not supported. For example, you cannot have multiple email actions or multiple PagerDuty actions.
Before you can send health email alert notifications, you must configure email notification settings in Splunk Web. For instructions, see Email notification action in the Alerting Manual.
Set up email notifications in health.conf
To set up email alert notifications:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the [alert_action:email] stanza and specify the recipients. For example:

[alert_action:email]
disabled = 0
action.to = <recipient@example.com>
action.cc = <recipient_2@example.com>
action.bcc = <other_recipients@example.com>
Set up PagerDuty notifications in health.conf
Before you can send alert notifications to PagerDuty, you must install the PagerDuty App from Splunkbase. You must also add a new service to your PagerDuty integration, and copy the integration key. For more information, see PagerDuty documentation.
To set up PagerDuty alert notifications:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the [alert_action:pagerduty] stanza and specify the integration key. For example:

[alert_action:pagerduty]
disabled = 0
action.integration_url_override = <integration key>
For more information, see health.conf.example.
Set up alert notifications using REST
To set up alert notifications, send a POST request to server/health-config/{alert_action}. For example, to set up an email alert notification:
curl -k -u admin:pass https://localhost:8089/services/server/health-config/alert_action:email -d action.to=admin@example.com -d action.cc=admin2@example.com
For endpoint details, see server/health-config/{alert_action} in the REST API Reference Manual.
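The example above covers the email action. Because server/health-config/{alert_action} takes the stanza name and its settings, the PagerDuty action can presumably be configured the same way, reusing the stanza name and setting shown earlier in this topic. A sketch, with the integration key as a placeholder you copy from PagerDuty:

curl -k -u admin:pass https://localhost:8089/services/server/health-config/alert_action:pagerduty -d action.integration_url_override=<integration key>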
Set the alert threshold color
You can set the threshold color that triggers an alert. Possible alert threshold values are yellow or red. If the threshold value is yellow, an alert fires for both yellow and red. If the value is red, an alert fires for red only. The default alert threshold value is red.
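The threshold rule above can be captured in a toy model (illustrative Python, not Splunk source code): a status color triggers an alert when it is at least as severe as the configured threshold color.

```python
# Toy model of the alert.threshold_color rule described above
# (illustrative only, not Splunk source code).
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def alert_fires(status_color, threshold_color):
    # threshold_color is "yellow" or "red" per the documentation
    return SEVERITY[status_color] >= SEVERITY[threshold_color]

# Threshold yellow: fires for yellow and red. Threshold red: red only.
assert alert_fires("yellow", "yellow") and alert_fires("red", "yellow")
assert alert_fires("red", "red") and not alert_fires("yellow", "red")
```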
Set the alert threshold color in health.conf
To set the alert threshold color on the global or feature level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert.threshold_color setting to the [health_reporter] or [feature:<feature_name>] stanza. For example:

[feature:replication_failures]
...
alert.threshold_color = yellow
indicator:replication_failures:red = 10
indicator:replication_failures:yellow = 5
To set the alert threshold color on the indicator level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert:<indicator_name>.threshold_color setting to the feature stanza. For example:

[feature:replication_failures]
...
indicator:replication_failures:red = 10
indicator:replication_failures:yellow = 5
alert:replication_failures.threshold_color = yellow
Alert threshold color settings at the indicator level override alert threshold color settings at the feature level.
Set the alert threshold color using REST
To set the alert threshold color for a feature or indicator, send a POST request to server/health-config/{feature_name}. For example, to set the alert threshold color for the Replication Failures feature:
curl -k -u admin:pass https://localhost:8089/services/server/health-config/feature:replication_failures -d alert.threshold_color=yellow
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
Set minimum duration between alerts
You can set the amount of time an unhealthy health status persists before an alert fires using the alert.min_duration_sec setting. You can use this setting to help reduce noise from feature health status changes that might be rapidly flipping between states, for example, between green and yellow or yellow and red.
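Conceptually, the setting acts as a debounce: an alert fires only once the unhealthy status has persisted for the configured number of seconds. A toy Python model of that rule (illustrative only, not Splunk source code):

```python
# Toy model (not Splunk source code): with alert.min_duration_sec,
# an alert fires only after the unhealthy color has persisted for
# at least that many seconds.
def should_alert(unhealthy_since, now, min_duration_sec):
    return (now - unhealthy_since) >= min_duration_sec

# With alert.min_duration_sec = 600, a status that flips back to
# green after 5 minutes never produces an alert.
assert not should_alert(unhealthy_since=0, now=300, min_duration_sec=600)
assert should_alert(unhealthy_since=0, now=600, min_duration_sec=600)
```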
Set minimum duration between alerts in health.conf
To set the minimum duration between alerts on the global or feature level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert.min_duration_sec setting to the [health_reporter] or [feature:<feature_name>] stanza. For example:
[feature:replication_failures]
...
alert.min_duration_sec = 600
indicator:replication_failures:red = 10
indicator:replication_failures:yellow = 5
To set the minimum duration between alerts on the indicator level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert:<indicator_name>.min_duration_sec setting to the [feature:<feature_name>] stanza. For example:
[feature:replication_failures]
...
indicator:replication_failures:red = 10
indicator:replication_failures:yellow = 5
alert:replication_failures.min_duration_sec = 600
Minimum duration between alerts settings on the indicator level override settings on the feature level.
Set minimum duration between alerts using REST
To set the minimum duration between alerts for a feature or indicator, send a POST request to server/health-config/{feature_name}. For example, to set the minimum duration between alerts for the Replication Failures feature:
curl -k -u admin:pass https://localhost:8089/services/server/health-config/feature:replication_failures -d alert.min_duration_sec=600
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
This documentation applies to the following versions of Splunk® Enterprise: 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9