Configure proactive Splunk component monitoring
Proactive Splunk component monitoring reports on a pre-defined set of Splunk Enterprise features. You can modify the configuration of some feature attributes, including feature health status thresholds, in both Splunk Web and $SPLUNK_HOME/etc/system/local/health.conf.
For more information on health.conf configuration settings, see health.conf.spec in the Admin Manual.
Set access controls
To access proactive Splunk component monitoring, a user's role must have the list_health capability. The list_health capability is enabled for role_admin by default. You can add this capability to any role in Splunk Web or in authorize.conf.
To add a capability to a role in Splunk Web, see Add and edit roles with Splunk Web.
To add a capability to a role in authorize.conf, see Add and edit roles with authorize.conf.
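To check which roles in a local authorize.conf explicitly grant list_health, you could parse the file with Python's configparser. This is a minimal sketch under stated assumptions: the role name role_health_viewer in the sample is hypothetical, and the function only sees capabilities set directly in the text you pass it. It does not follow importRoles inheritance and does not account for capabilities granted by default, such as list_health on role_admin.

```python
import configparser


def roles_with_capability(authorize_conf_text: str, capability: str) -> list[str]:
    """Return role stanzas that explicitly set `<capability> = enabled`.

    Sketch only: ignores importRoles inheritance and default role grants.
    """
    # Only "=" separates keys from values; keep key case exactly as written.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str
    parser.read_string(authorize_conf_text)
    return [
        section
        for section in parser.sections()
        if section.startswith("role_")
        and parser.get(section, capability, fallback="").strip().lower() == "enabled"
    ]


# Hypothetical local authorize.conf content.
sample = """\
[role_health_viewer]
importRoles = user
list_health = enabled

[role_power]
schedule_search = enabled
"""
print(roles_with_capability(sample, "list_health"))  # ['role_health_viewer']
```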
Set feature thresholds
Each feature in the health status tree has one or more indicators. Each indicator reports a value against a pre-set threshold, which determines the status of the feature. When the indicator value meets the threshold condition, the health status of the feature changes, for example, from green to yellow, or yellow to red.
There are two valid thresholds for each indicator: yellow and red. You can modify the value of these thresholds for any feature in health.conf.
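The threshold logic can be modeled as a small function that maps an indicator value to a color. This sketch assumes a "higher is worse" indicator, where reaching the yellow or red value triggers that color; the comparison Splunk applies may differ by indicator, and the function name indicator_color is illustrative only. The sample thresholds (yellow = 5, red = 10) match the batchreader example in this section.

```python
def indicator_color(value: float, yellow: float, red: float) -> str:
    """Map an indicator value to a health color.

    Assumption: higher values are worse, so reaching a threshold changes
    the color. Splunk's per-indicator comparison rules may differ.
    """
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


# Thresholds from the batchreader example: yellow = 5, red = 10.
for v in (3, 5, 12):
    print(v, indicator_color(v, yellow=5, red=10))
```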
To set indicator threshold values:
- Log in to the instance you are monitoring.
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the feature stanza, set new indicator threshold values. For example, to modify indicator threshold values for the batchreader feature, set new values for the data_out_rate:yellow and data_out_rate:red thresholds in the following stanza:
    [feature:batchreader]
    indicator:data_out_rate:red = 10
    indicator:data_out_rate:yellow = 5
Indicator thresholds are pre-set to values that apply to most use cases. When you modify threshold values, make changes in small increments. Setting threshold values too high can mask serious problems or failures.
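If you script threshold changes instead of editing health.conf by hand, Python's configparser can read and rewrite the stanza. This sketch assumes the local health.conf holds only simple key = value lines: configparser drops comments when it writes the file, and the delimiter must be limited to = because keys such as indicator:data_out_rate:red contain colons. The set_thresholds helper is hypothetical; the example call raises each batchreader threshold by one, but the helper itself does not enforce small increments.

```python
import configparser
import io


def set_thresholds(conf_text: str, feature: str, indicator: str,
                   yellow: float, red: float) -> str:
    """Return health.conf text with new yellow/red thresholds for one indicator."""
    # Keys like indicator:data_out_rate:red contain colons, so only "=" delimits.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str  # do not lowercase keys
    parser.read_string(conf_text)

    stanza = f"feature:{feature}"
    if not parser.has_section(stanza):
        parser.add_section(stanza)
    parser.set(stanza, f"indicator:{indicator}:yellow", str(yellow))
    parser.set(stanza, f"indicator:{indicator}:red", str(red))

    # Serialize back to text; comments in the original are not preserved.
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


original = """\
[feature:batchreader]
indicator:data_out_rate:red = 10
indicator:data_out_rate:yellow = 5
"""
print(set_thresholds(original, "batchreader", "data_out_rate", yellow=6, red=11))
```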
Disable a feature
You can disable any feature in health.conf. Disabling a feature removes that feature from the splunkd health status tree. This is useful, for example, if you want to exclude a feature's status from the health report while you troubleshoot a problem with that feature. All supported features are enabled by default in health.conf.
There are two ways to disable a feature:
- Edit the feature stanza in health.conf.
- Use the /server/health-config endpoint.
Disable a feature in health.conf
- Log in to the instance you are monitoring.
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the feature stanza, add disabled = 1. For example, to disable the Data Durability feature:
    [feature:data_durability]
    indicator:cluster_replication_factor:red = 1
    indicator:cluster_search_factor:red = 1
    disabled = 1
  The feature is disabled. The feature's status no longer impacts the overall status of splunkd. To enable a feature, set disabled = 0.
- Reload health.conf:
    curl -k -u admin:pass https://<host>:<mPort>/services/configs/conf-health.conf/_reload
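To confirm which features a local health.conf disables, a short configparser sketch can list every feature stanza that sets disabled = 1. It treats only the literal value 1 as disabled, matching the Data Durability example, and it reads a single file's text rather than Splunk's merged configuration, so settings from other configuration layers are not considered. The disabled_features name is illustrative.

```python
import configparser


def disabled_features(conf_text: str) -> list[str]:
    """Return feature names whose [feature:<name>] stanza sets disabled = 1."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str  # keep key case as written
    parser.read_string(conf_text)

    names = []
    for section in parser.sections():
        if not section.startswith("feature:"):
            continue  # skip non-feature stanzas such as [health_reporter]
        if parser.get(section, "disabled", fallback="0").strip() == "1":
            names.append(section.split(":", 1)[1])
    return names


local_conf = """\
[feature:data_durability]
indicator:cluster_replication_factor:red = 1
indicator:cluster_search_factor:red = 1
disabled = 1

[feature:batchreader]
disabled = 0
"""
print(disabled_features(local_conf))  # ['data_durability']
```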
Disable a feature using REST endpoint
- Log in to the instance you are monitoring.
- Run the following command against the server/health-config/{feature_name} endpoint. For example, to disable the batchreader feature:
    curl -k -u admin:pass \
        https://<host>:<mPort>/services/server/health-config/batchreader -d disabled=1
- Validate that the feature no longer appears in the splunkd status report in Splunk Web.
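The curl request can also be reproduced with Python's standard library, for example from a script that disables several features in turn. This is a sketch, not an official client: the host splunk.example.com, port 8089 (Splunk's default management port), and the admin/pass credentials are placeholders, and send() skips certificate verification to mirror curl -k, which is only appropriate for test instances with self-signed certificates. The calling role needs the edit_health capability.

```python
import base64
import ssl
import urllib.parse
import urllib.request


def build_disable_request(host: str, port: int, feature: str,
                          user: str, password: str) -> urllib.request.Request:
    """Build a POST equivalent to:
    curl -k -u user:pass https://host:port/services/server/health-config/<feature> -d disabled=1
    """
    url = f"https://{host}:{port}/services/server/health-config/{urllib.parse.quote(feature)}"
    body = urllib.parse.urlencode({"disabled": 1}).encode()  # b"disabled=1"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Basic {token}"},  # same as curl -u
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request without certificate checks (like curl -k); return the HTTP status."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.status


# Placeholder host and credentials; call send(req) only against a real test instance.
req = build_disable_request("splunk.example.com", 8089, "batchreader", "admin", "pass")
print(req.get_method(), req.full_url)
```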
To access the server/health-config/{feature_name} endpoint, a role must have the edit_health capability.
For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
Suppress health status updates
Features in the health status tree update their status at predetermined intervals. A feature whose health status changes frequently can cause excessive undesirable changes to the overall status of the splunkd health report. To prevent this, use the suppress_status_update_ms attribute in health.conf to reduce the frequency with which a particular feature can update its health status.
Use the suppress_status_update_ms attribute to:
- Limit excessive changes to the internal state by individual features.
- Reduce the number of log entries that arise from rapid feature status changes.
- Help quiet "noisy" features.
For example, an indexer clustering feature such as data_durability can experience frequent status changes during operations that impact its indicators, cluster_replication_factor and cluster_search_factor. To avoid frequent changes to the overall splunkd health report, you might set suppress_status_update_ms = 60000 to limit health status updates to at most once every minute.
To suppress health status updates:
- Log in to the instance you are monitoring.
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the appropriate feature stanza, add the suppress_status_update_ms attribute. For example:
    [feature:data_durability]
    indicator:cluster_replication_factor:red = 1
    indicator:cluster_search_factor:red = 1
    suppress_status_update_ms = 60000
By default, the minimum amount of time that must elapse between status updates is 300ms.
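One way to picture suppress_status_update_ms is as a minimum gap between accepted status updates. This is a hypothetical model, not Splunk's implementation: it simply discards any update that arrives less than suppress_ms after the last accepted one, whereas Splunk may defer or coalesce such updates. The StatusSuppressor class name is illustrative. With suppress_ms = 60000, a feature that changes status every 10 seconds reports at most one update per minute.

```python
from typing import Optional


class StatusSuppressor:
    """Hypothetical model of suppress_status_update_ms.

    Accepts a status update only if at least suppress_ms milliseconds have
    passed since the last accepted update; other updates are dropped.
    """

    def __init__(self, suppress_ms: int = 300):  # 300 ms matches the stated default
        self.suppress_ms = suppress_ms
        self.last_accepted_ms: Optional[int] = None

    def accept(self, now_ms: int) -> bool:
        if self.last_accepted_ms is None or now_ms - self.last_accepted_ms >= self.suppress_ms:
            self.last_accepted_ms = now_ms
            return True
        return False


# A flapping data_durability feature that changes status every 10 seconds for 3 minutes.
suppressor = StatusSuppressor(suppress_ms=60000)
updates = [t for t in range(0, 180000, 10000) if suppressor.accept(t)]
print(updates)  # [0, 60000, 120000]
```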
For more information, see health.conf.spec in the Admin Manual.
Configure health status logs
Each feature in the splunkd health status tree generates log entries in health.log. These log entries record information about feature indicator status changes over time. health.log is located in $SPLUNK_HOME/var/log/splunk/.
There are two types of health.log entries:
HealthChangeReporter: This log entry records specific health status changes for a feature indicator. Each entry includes a timestamp, feature name, indicator name, previous color, new color, and a possible reason for the status change. This log entry appears only if a feature's status changes, for example, from green to red:
02-28-2018 20:26:52.775 +0000 INFO HealthChangeReporter - feature="Data Durability" indicator="cluster_replication_factor" previous_color=green color=red reason="Replication Factor is not met"
PeriodicHealthReporter: This log entry keeps an ongoing record of the status of each feature in the health status tree. Each entry includes a timestamp, the feature name, and the current color. Log entries are made at a user-configurable interval. For example:
02-28-2018 20:27:06.826 +0000 INFO PeriodicHealthReporter - feature="Data Durability" color=red
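Both entry types share a timestamp, log level, component name, and a list of key=value pairs, so a regular expression can split them into fields for ad hoc analysis. The patterns here are inferred from the two sample lines in this section only; real health.log entries may contain formats they do not match, so health_changes() skips unmatched lines. The function names are illustrative, and health_changes() expects the path to health.log under $SPLUNK_HOME/var/log/splunk/.

```python
import re

# Inferred from the sample lines: "<timestamp> <LEVEL> <Component> - key=value ..."
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>\w+) (?P<component>\w+) - (?P<fields>.*)$"
)
# A value is either a double-quoted string or a run of non-space characters.
FIELD_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_health_line(line: str) -> dict:
    """Split one health.log line into a dict of fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized health.log line: {line!r}")
    record = {k: match.group(k) for k in ("timestamp", "level", "component")}
    for key, raw, quoted in FIELD_RE.findall(match.group("fields")):
        record[key] = quoted if raw.startswith('"') else raw  # strip surrounding quotes
    return record


def health_changes(path: str):
    """Yield parsed HealthChangeReporter entries from a health.log file."""
    with open(path, encoding="utf-8") as log:
        for line in log:
            try:
                record = parse_health_line(line)
            except ValueError:
                continue  # skip lines that do not match the sample format
            if record["component"] == "HealthChangeReporter":
                yield record


print(parse_health_line(
    '02-28-2018 20:27:06.826 +0000 INFO PeriodicHealthReporter - feature="Data Durability" color=red'
))
```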
Set health.log entry intervals
You can set the interval at which PeriodicHealthReporter log entries are added to health.log. This is useful if you want to increase or decrease the overall number of log entries that appear in health.log.
To adjust the frequency of PeriodicHealthReporter log entries in health.log:
- Log in to the instance you are monitoring.
- Edit $SPLUNK_HOME/etc/system/local/health.conf.
- In the [health_reporter] stanza, set the full_health_log_interval attribute to an appropriate value in seconds. For example:
    [health_reporter]
    full_health_log_interval = 60
By default, each feature generates a PeriodicHealthReporter log entry every 30 seconds.
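Because each feature writes one PeriodicHealthReporter entry per interval, the daily volume is roughly the number of features times 86,400 seconds divided by full_health_log_interval. For one feature, the default 30-second interval gives 2,880 entries per day, and a 60-second interval halves that to 1,440. The feature count you pass in is an assumption; it depends on which features are enabled on your instance, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def periodic_entries_per_day(num_features: int, interval_s: int = 30) -> int:
    """Approximate PeriodicHealthReporter entries per day.

    Assumes one entry per feature for every full interval in a day.
    """
    return num_features * (SECONDS_PER_DAY // interval_s)


print(periodic_entries_per_day(1))      # 2880 (default 30-second interval)
print(periodic_entries_per_day(1, 60))  # 1440
```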
This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10