Set up alerts for the splunkd health report
The splunkd health report generates alerts for all features in the health status tree. When a feature indicator meets the threshold condition, the feature's health status changes, for example from green to red, and an alert fires. Use health status alerts to maintain visibility into the health of your deployment, whether or not you are logged into Splunk Web.
You can configure health report alerts as follows:
- Enable/disable alerts on the global, feature, or indicator level.
- Send alert notifications via email, mobile, VictorOps, or PagerDuty.
- Set the health status color (yellow or red) that triggers an alert.
- Set a minimum duration that must elapse between alerts.
You can configure health report alerts by directly editing health.conf or querying the server/health-config endpoint.
Disable health report alerts
Alerts are enabled by default for all features in the splunkd health report. You can disable alerts at the global, feature, or indicator level. Disabling alerts at the global level overrides enabled alerts at the feature level. Likewise, disabling alerts at the feature level overrides enabled alerts at the indicator level.
Disabling alerts is useful for reducing noise from non-critical features and minimizing false positives when performing maintenance tasks.
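The override order described above can be sketched as a small Python function. This is an illustrative model only, not splunkd's implementation; the function name and boolean parameters are hypothetical and stand in for the alert.disabled values at each level.

```python
def alerts_enabled(global_disabled: bool,
                   feature_disabled: bool = False,
                   indicator_disabled: bool = False) -> bool:
    """Return True only if no level in the chain disables alerting.

    Hypothetical model of the precedence described in the text:
    a disabled higher level overrides anything enabled below it.
    """
    return not (global_disabled or feature_disabled or indicator_disabled)


# Global disable wins even if the feature and indicator are enabled.
print(alerts_enabled(global_disabled=True))
# Feature disable wins over an enabled indicator.
print(alerts_enabled(global_disabled=False, feature_disabled=True))
# Alerts fire only when every level is enabled.
print(alerts_enabled(global_disabled=False))
```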
Disable alerts using health.conf
To disable alerts for all features in the splunkd health report:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the health_reporter stanza, set alert.disabled = 1. For example:
    [health_reporter]
    full_health_log_interval = 30
    suppress_status_update = 600
    alert.disabled = 1
To enable alerts for all features in the splunkd health report, set alert.disabled = 0.
To disable alerts for a single feature:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the stanza for the particular feature, set alert.disabled = 1. For example:
    [feature:indexers]
    ...
    indicator:missing_peers:yellow = 1
    indicator:missing_peers:red = 1
    alert.disabled = 1
To enable alerts for a feature, set alert.disabled = 0.
To disable alerts for a single feature indicator:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the stanza for a particular feature, set alert:<indicator_name>.disabled = 1. For example, in the following stanza, alerting for the indicator s2s_connections is disabled:
    [feature:s2s_autolb]
    ...
    indicator:s2s_connections:yellow = 20
    indicator:s2s_connections:red = 70
    alert:s2s_connections.disabled = 1
To enable alerts for an indicator, set alert:<indicator_name>.disabled = 0.
Disable alerts using REST endpoint
To disable alerts for features and indicators, send a POST request to server/health-config/{feature_name}. For example, to disable alerts for the batchreader feature on the instance you are monitoring, run the following command:
curl -k -u admin:pass https://<host>:<mPort>/services/server/health-config/feature:batchreader -d alert.disabled=1
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
To access server/health-config endpoints, a role must have the edit_health capability.
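For scripted use, the same POST can be assembled with Python's standard library. The sketch below only builds the request object and does not send it; the helper name build_health_config_request is hypothetical, and authentication and TLS certificate handling (the -u and -k options in the curl example) are deliberately left out and would need to be added before calling urllib.request.urlopen.

```python
import urllib.parse
import urllib.request


def build_health_config_request(base_url: str, stanza: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to server/health-config/{stanza}.

    Hypothetical helper: base_url is the management URL, for example
    https://localhost:8089, and stanza is a name such as feature:batchreader.
    """
    # Keep the colon in the stanza name unescaped, as in the curl example.
    path = urllib.parse.quote(stanza, safe=":")
    url = f"{base_url.rstrip('/')}/services/server/health-config/{path}"
    # Form-encode the settings, equivalent to curl's -d key=value.
    body = urllib.parse.urlencode(settings).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


req = build_health_config_request(
    "https://localhost:8089", "feature:batchreader", {"alert.disabled": 1}
)
print(req.get_method(), req.full_url)
print(req.data)
```

The same helper could target an alert action stanza such as alert_action:email by passing a different stanza name and settings.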
Set up health report alert actions
You can set up alert actions that run when feature health status changes trigger an alert. Alert actions include sending notifications via email, mobile, VictorOps, or PagerDuty. Alert actions apply on the global level only. Multiple alert actions for the same action type are not supported. For example, you cannot configure two email actions or two PagerDuty actions.
When you set up health report alert actions, you must specify the alert actions in the [health_reporter] stanza in $SPLUNK_HOME/etc/system/local/health.conf. You can specify alert actions in a comma-separated list. For example:
    [health_reporter]
    alert_actions=email, mobile, VictorOps, PagerDuty
You must also set up the specific alert action using the appropriate alert action stanza, as shown in the following sections.
Set up email alert notifications in health.conf
Before you can send health email alert notifications, you must configure email notification settings in Splunk Web. For instructions on how to configure email notifications, see Email notification action in the Alerting Manual.
To set up email alert notifications:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the [alert_action:email] stanza and specify the recipients. For example:
    [alert_action:email]
    disabled = 0
    action.to = <recipient@example.com>
    action.cc = <recipient_2@example.com>
    action.bcc = <other_recipients@example.com>
Set up mobile alert notifications in health.conf
Before you configure mobile alert notifications for the health report, you must download the Splunk Mobile iOS app from the App Store to your phone, and download the Splunk Cloud Gateway app from Splunkbase to your Splunk instance. For details on how to install and set up the Splunk Cloud Gateway app, see Install Splunk Cloud Gateway.
To set up mobile alert notifications:
- In Splunk Web, click Settings > Alert actions. In the Send to mobile row, confirm that splunk_app_cloudgateway is enabled.
- In Splunk Web, click Apps > Splunk Cloud Gateway > Register. Enter the activation code displayed in the Splunk Mobile app on your mobile device.
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the [alert_action:mobile] stanza and specify the recipients. For example:
    [alert_action:mobile]
    disabled = 0
    action.alert_recipients = admin
Set up VictorOps alert notifications in health.conf
Before you can send alert notifications to VictorOps, you must install the VictorOps App from Splunkbase. You must also create an API key and optionally a routing key in VictorOps. For instructions on how to create an API key for VictorOps, see the Splunk Integration Guide for VictorOps.
To set up VictorOps alert notifications:
- In Splunk Web, click Settings > Alert actions. In the VictorOps row, confirm that victorops_app is enabled and click Setup VictorOps Notifications.
- Enter the API key and optional Routing Key.
- Click Save.
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the [alert_action:victorops] stanza and specify the alert message type. For example:
    [alert_action:victorops]
    disabled = 0
    action.message_type = CRITICAL
For more information on VictorOps alert settings, including valid alert message types and optional settings, see health.conf.example in the Admin Manual.
Set up PagerDuty alert notifications in health.conf
Before you can send alert notifications to PagerDuty, you must install the PagerDuty App from Splunkbase. You must also add a new service to your PagerDuty integration, and copy the integration key. For more information, see the PagerDuty documentation.
To set up PagerDuty alert notifications:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the [alert_action:pagerduty] stanza and specify the integration key. For example:
    [alert_action:pagerduty]
    disabled = 0
    action.integration_url_override = <integration key>
For more information, see health.conf.example.
Set up alert actions using REST
To set up alert actions, send a POST request to server/health-config/{alert_action}. For example, to set up an email alert notification:
curl -k -u admin:pass https://localhost:8089/services/server/health-config/alert_action:email -d action.to=admin@example.com -d action.cc=admin2@example.com
For endpoint details, see server/health-config/{alert_action} in the REST API Reference Manual.
Set up distributed health report alerts
If you are monitoring a distributed deployment using the distributed health report, you can set up alert actions directly on the distributed health report's central instance. For more information, see Set up distributed health report alert actions.
Set the alert threshold color
You can set the threshold color that triggers an alert. Possible alert threshold values are yellow or red. If the threshold value is yellow, an alert fires for both yellow and red. If the value is red, an alert fires for red only. The default alert threshold value is red.
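The threshold rule above amounts to a severity comparison. The following Python sketch illustrates it; the function name and the SEVERITY mapping are hypothetical and only model the behavior described in the text.

```python
# Hypothetical severity ranking used only for this illustration.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def should_alert(status: str, threshold_color: str = "red") -> bool:
    """Return True if a health status meets or exceeds the alert threshold.

    threshold_color mirrors alert.threshold_color and defaults to red,
    the default named in the text.
    """
    if threshold_color not in ("yellow", "red"):
        raise ValueError("threshold_color must be 'yellow' or 'red'")
    return SEVERITY[status] >= SEVERITY[threshold_color]


# With a yellow threshold, both yellow and red fire.
print(should_alert("yellow", "yellow"), should_alert("red", "yellow"))
# With the default red threshold, only red fires.
print(should_alert("yellow"), should_alert("red"))
```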
Set the alert threshold color in health.conf
To set the alert threshold color on the global or feature level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert.threshold_color setting to the [health_reporter] or [feature:<feature_name>] stanza. For example:
    [feature:replication_failures]
    ...
    alert.threshold_color = yellow
    indicator:replication_failures:red = 10
    indicator:replication_failures:yellow = 5
To set the alert threshold color on the indicator level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert:<indicator_name>.threshold_color setting to the feature stanza. For example:
    [feature:replication_failures]
    ...
    indicator:replication_failures:red = 10
    indicator:replication_failures:yellow = 5
    alert:replication_failures.threshold_color = yellow
Alert threshold color settings at the indicator level override alert threshold color settings at the feature level.
Set the alert threshold color using REST
To set the alert threshold color for a feature or indicator, send a POST request to server/health-config/{feature_name}. For example, to set the alert threshold color for the Replication Failures feature:
curl -k -u admin:pass https://localhost:8089/services/server/health-config/feature:replication_failures -d alert.threshold_color=yellow
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
Set minimum duration between alerts
You can set the amount of time an unhealthy health status persists before an alert fires using the alert.min_duration_sec setting. You can use this setting to help reduce noise from feature health status changes that might be rapidly flipping between states, for example, between green and yellow or yellow and red.
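To see why a minimum duration suppresses noise from flapping, consider the simplified Python model below. It is a sketch under stated assumptions, not splunkd's actual logic: it assumes status samples arrive as (timestamp, status) pairs, treats yellow and red as one continuous unhealthy episode, and fires once per episode after the status has stayed unhealthy for at least min_duration_sec. The function name alert_times is hypothetical.

```python
from typing import Iterable, List, Tuple


def alert_times(samples: Iterable[Tuple[float, str]],
                min_duration_sec: float) -> List[float]:
    """Return the timestamps at which this simplified model fires an alert."""
    fired: List[float] = []
    unhealthy_since = None  # start of the current unhealthy episode
    alerted = False         # whether this episode has already alerted
    for ts, status in samples:
        if status == "green":
            # Returning to green ends the episode and resets the timer.
            unhealthy_since = None
            alerted = False
            continue
        if unhealthy_since is None:
            unhealthy_since = ts
        if not alerted and ts - unhealthy_since >= min_duration_sec:
            fired.append(ts)
            alerted = True
    return fired


# A brief yellow flap is ignored; a sustained unhealthy status alerts once.
flapping = [(0, "green"), (100, "yellow"), (200, "green"),
            (300, "yellow"), (900, "yellow")]
print(alert_times(flapping, min_duration_sec=600))
```

In this model the flap at 100 seconds never alerts because the status returns to green before 600 seconds have elapsed.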
Set minimum duration between alerts in health.conf
To set the minimum duration between alerts on the global or feature level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert.min_duration_sec setting to the [health_reporter] or [feature:<feature_name>] stanza. For example:
    [feature:replication_failures]
    ...
    alert.min_duration_sec = 600
    indicator:replication_failures:red = 10
    indicator:replication_failures:yellow = 5
To set the minimum duration between alerts on the indicator level:
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- Add the alert:<indicator_name>.min_duration_sec setting to the [feature:<feature_name>] stanza. For example:
    [feature:replication_failures]
    ...
    indicator:replication_failures:red = 10
    indicator:replication_failures:yellow = 5
    alert:replication_failures.min_duration_sec = 600
Minimum duration between alerts settings on the feature level override settings on the indicator level.
Set minimum duration between alerts using REST
To set the minimum duration between alerts for a feature or indicator, send a POST request to server/health-config/{feature_name}. For example, to set the minimum duration between alerts for the Replication Failures feature:
curl -k -u admin:pass https://localhost:8089/services/server/health-config/feature:replication_failures -d alert.min_duration_sec=600
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
This documentation applies to the following versions of Splunk® Enterprise: 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12