Splunk® Enterprise

Monitoring Splunk Enterprise

Configure proactive Splunk component monitoring

Proactive Splunk component monitoring reports on a pre-defined set of Splunk Enterprise features. You can modify the configuration of some feature attributes, including feature health status thresholds, in both Splunk Web and $SPLUNK_HOME/etc/system/local/health.conf.

For more information on health.conf configuration settings, see health.conf.spec in the Admin Manual.

Set access controls

To access proactive Splunk component monitoring, a user's role must have the list_health capability. The list_health capability is enabled for role_admin by default. You can add this capability to any role in Splunk Web or in authorize.conf.

To add a capability to a role in Splunk Web, see Add and edit roles with Splunk Web.

To add a capability to a role in authorize.conf, see Add and edit roles with authorize.conf.

Set feature thresholds

Each feature in the health status tree has one or more indicators. Each indicator reports a value against a pre-set threshold, which determines the status of the feature. When the indicator value meets the threshold condition, the health status of the feature changes, for example, from green to yellow, or yellow to red.

There are two valid thresholds for each indicator: yellow and red. You can modify the value of these thresholds for any feature in health.conf.
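
As a rough illustration of how two thresholds yield three colors, the following Python sketch classifies an indicator value. The comparison direction is an assumption made for illustration (a threshold counts as "met" when the value is greater than or equal to it); the documentation does not state the exact condition, and the function name `indicator_color` is hypothetical.

```python
def indicator_color(value, yellow, red):
    """Return 'green', 'yellow', or 'red' for an indicator value.

    Assumption (not from the documentation): a threshold is "met" when the
    value is greater than or equal to it, and red takes precedence over yellow.
    """
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


# Illustrative thresholds only: yellow = 5, red = 10.
for sample in (2, 5, 12):
    print(sample, indicator_color(sample, yellow=5, red=10))
```

Checking red before yellow means a value that meets both thresholds is reported with the more severe color.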

To set indicator threshold values:

  1. Log in to the instance you are monitoring.
  2. Edit $SPLUNK_HOME/etc/system/local/health.conf.
  3. In the feature stanza, set new indicator threshold values. For example, to modify indicator threshold values for the batchreader feature, set new values for data_out_rate:yellow and data_out_rate:red thresholds in the following stanza:
    [feature:batchreader]
    indicator:data_out_rate:red = 10
    indicator:data_out_rate:yellow = 5
    

    Indicator thresholds are pre-set to values that apply to most use cases. When you modify threshold values, make changes in small increments. Setting threshold values too high can mask serious problems or failures.
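
To review the thresholds you have set, a stanza like the one above can be read with a short script. The following is a minimal Python sketch, assuming the stanza text is available as a string; `read_thresholds` is a hypothetical helper, and Python's `configparser` only approximates the .conf format, so treat it as a convenience rather than a full parser.

```python
import configparser

SAMPLE_CONF = """\
[feature:batchreader]
indicator:data_out_rate:red = 10
indicator:data_out_rate:yellow = 5
"""


def read_thresholds(conf_text, feature):
    """Return {indicator_name: {color: value}} for one feature stanza."""
    # Keys contain ':' so only '=' may act as the key/value delimiter.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(conf_text)
    thresholds = {}
    for key, raw in parser[f"feature:{feature}"].items():
        parts = key.split(":")
        # Keep only keys shaped like indicator:<name>:<color>.
        if len(parts) == 3 and parts[0] == "indicator":
            _, name, color = parts
            thresholds.setdefault(name, {})[color] = int(raw)
    return thresholds


print(read_thresholds(SAMPLE_CONF, "batchreader"))
```

Restricting the delimiter to `=` matters here: by default `configparser` also splits on `:`, which would cut the attribute name at the first colon.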

Disable a feature

You can disable any feature in health.conf. Disabling a feature removes that feature from the splunkd health status tree. This is useful, for example, if you want to exclude a feature's status from the health report while you troubleshoot a problem with that feature. All supported features are enabled by default in health.conf.

There are two ways to disable a feature:

  • Edit the feature stanza in health.conf.
  • Use the /server/health-config endpoint.

Disable a feature in health.conf

  1. Log in to the instance you are monitoring.
  2. Edit $SPLUNK_HOME/etc/system/local/health.conf.
  3. In the feature stanza, add disabled = 1. For example, to disable the Data Durability feature:
    [feature:data_durability]
    indicator:cluster_replication_factor:red = 1
    indicator:cluster_search_factor:red = 1
    disabled = 1
    

    The feature is disabled. The feature's status no longer impacts the overall status of splunkd.

    To re-enable the feature, set disabled = 0.

  4. Reload health.conf:
    curl -k -u admin:pass https://<host>:<mPort>/services/configs/conf-health.conf/_reload
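
If you script the edit in step 3, the change amounts to setting one key in one stanza. The following Python sketch shows the idea on a string; `set_feature_disabled` is a hypothetical helper, and it assumes the stanza is simple enough for `configparser` to read.

```python
import configparser
import io


def set_feature_disabled(conf_text, feature, disabled):
    """Return conf_text with 'disabled' set to 1 or 0 in [feature:<feature>]."""
    # Attribute names contain ':', so split keys from values on '=' only.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(conf_text)
    section = f"feature:{feature}"
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "disabled", "1" if disabled else "0")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


before = """\
[feature:data_durability]
indicator:cluster_replication_factor:red = 1
indicator:cluster_search_factor:red = 1
"""
print(set_feature_disabled(before, "data_durability", disabled=True))
```

Note that `configparser` drops comments when it writes the text back, so this approach suits generated files better than hand-maintained ones. The reload in step 4 is still required after the file changes.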
    

Disable a feature using the REST endpoint

  1. Log in to the instance you are monitoring.
  2. Run the following command against the server/health-config/{feature_name} endpoint. For example, to disable the batchreader feature:
    curl -k -u admin:pass \
    https://<host>:<mPort>/services/server/health-config/batchreader -d disabled=1
    
  3. Verify that the feature no longer appears in the splunkd status report in Splunk Web.

To access the server/health-config/{feature_name} endpoint, a role must have the edit_health capability.

For endpoint details, see server/health-config/{feature_name} in the REST API Reference Manual.
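
For scripted use, the curl command above corresponds to an HTTP POST with basic authentication and a form-encoded body. The following Python sketch builds that request without sending it. The host, port 8089, and the admin:pass credentials are placeholder assumptions, and `build_disable_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request


def build_disable_request(host, port, feature, user, password, disabled=True):
    """Build (but do not send) the POST request that the curl example issues."""
    url = f"https://{host}:{port}/services/server/health-config/{feature}"
    body = urllib.parse.urlencode({"disabled": 1 if disabled else 0}).encode()
    request = urllib.request.Request(url, data=body)  # data present => POST
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical values: localhost, port 8089, and admin:pass are placeholders.
req = build_disable_request("localhost", 8089, "batchreader", "admin", "pass")
print(req.get_method(), req.full_url, req.data)
```

Sending the request with `urllib.request.urlopen` against an instance that uses a self-signed certificate would also need an SSL context that relaxes verification, which is what the `-k` flag does for curl.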

Suppress health status updates

Features in the health status tree update their status at predetermined intervals. A feature whose health status changes frequently can cause an excessive number of changes to the overall status in the splunkd health report. To prevent this, use the suppress_status_update_ms attribute in health.conf to limit how often a particular feature can update its health status.

Use the suppress_status_update_ms attribute to:

  • Limit excessive changes to the internal state by individual features.
  • Reduce the number of log entries that arise from rapid feature status changes.
  • Help quiet "noisy" features.

For example, an indexer clustering feature, such as data_durability, can experience frequent status changes during operations that impact its indicators: cluster_replication_factor and cluster_search_factor. To avoid frequent changes to the overall splunkd health report, you might set suppress_status_update_ms = 60000 to limit health status updates to at most once every minute.

To suppress health status updates:

  1. Log in to the instance you are monitoring.
  2. Edit $SPLUNK_HOME/etc/system/local/health.conf.
  3. In the appropriate feature stanza, add the suppress_status_update_ms attribute. For example:
    [feature:data_durability]
    indicator:cluster_replication_factor:red = 1
    indicator:cluster_search_factor:red = 1
    suppress_status_update_ms = 60000
    

    By default, the minimum amount of time that must elapse between status updates is 300 milliseconds.
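
To get a feel for the effect of a 60000 ms setting, the following Python sketch applies a minimum interval to a series of status changes. It is a simplified model and an assumption, not a description of how splunkd implements suppression: a change is accepted only if enough time has passed since the last accepted change. The sample data is invented.

```python
def suppress_updates(changes, min_interval_ms):
    """Filter (timestamp_ms, color) changes with a minimum gap between updates.

    Simplified model (an assumption, not Splunk's implementation): a change is
    accepted only if at least min_interval_ms has elapsed since the last
    accepted change; other changes are dropped.
    """
    accepted = []
    last_time = None
    for timestamp_ms, color in changes:
        if last_time is None or timestamp_ms - last_time >= min_interval_ms:
            accepted.append((timestamp_ms, color))
            last_time = timestamp_ms
    return accepted


# Hypothetical flapping feature: the color changes roughly every 20 seconds.
flapping = [(0, "red"), (20_000, "green"), (40_000, "red"),
            (60_000, "green"), (80_000, "red"), (120_000, "green")]
print(suppress_updates(flapping, min_interval_ms=60_000))
```

In this model, six changes within two minutes collapse to three reported updates, which illustrates why a larger interval quiets a "noisy" feature.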

For more information, see health.conf.spec in the Admin Manual.

Configure health status logs

Each feature in the splunkd health status tree generates log entries in health.log. These log entries record information about feature indicator status changes over time. health.log is located in $SPLUNK_HOME/var/log/splunk/.

There are two types of health.log log entries:

HealthChangeReporter: This log entry records specific health status changes for a feature indicator. Each entry includes a timestamp, feature name, indicator name, previous color, new color, and a possible reason for the status change. This log entry appears only if a feature's status changes, for example, from green to red:

02-28-2018 20:26:52.775 +0000 INFO  HealthChangeReporter - feature="Data Durability" indicator="cluster_replication_factor" previous_color=green color=red reason="Replication Factor is not met"

PeriodicHealthReporter: This log entry keeps an ongoing record of the status of each feature in the health status tree. Each entry includes a timestamp, the feature name, and current color. Log entries are made at a user-configurable interval. For example:

02-28-2018 20:27:06.826 +0000 INFO  PeriodicHealthReporter - feature="Data Durability" color=red
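
Both entry types share the same layout: a timestamp, a log level, the reporter name, and a series of key=value pairs. The following Python sketch parses that layout into a dictionary. The regular expressions are derived only from the two sample lines above, so other line shapes may need adjustments; `parse_health_line` is a hypothetical helper name.

```python
import re

# Timestamp, log level, reporter name, then key=value pairs (quoted or bare).
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+ \S+)\s+(?P<level>\w+)\s+"
    r"(?P<reporter>\w+) - (?P<fields>.*)$"
)
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_health_line(line):
    """Parse one health.log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = {k: match.group(k) for k in ("timestamp", "level", "reporter")}
    for key, quoted, bare in FIELD_RE.findall(match.group("fields")):
        entry[key] = quoted if quoted else bare
    return entry


change = parse_health_line(
    '02-28-2018 20:26:52.775 +0000 INFO  HealthChangeReporter - '
    'feature="Data Durability" indicator="cluster_replication_factor" '
    'previous_color=green color=red reason="Replication Factor is not met"'
)
print(change["feature"], change["previous_color"], "->", change["color"])
```

Filtering parsed entries on the `reporter` field separates status changes (HealthChangeReporter) from the periodic snapshots (PeriodicHealthReporter).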

Set health.log entry intervals

You can set the interval at which PeriodicHealthReporter log entries are added to health.log. This is useful if you want to increase or decrease the overall number of log entries that appear in health.log.

To adjust the frequency of PeriodicHealthReporter log entries in health.log:

  1. Log in to the instance you are monitoring.
  2. Edit $SPLUNK_HOME/etc/system/local/health.conf.
  3. In the [health_reporter] stanza, set the full_health_log_interval attribute to an appropriate value in seconds. For example:
    [health_reporter]
    full_health_log_interval = 60
    

    By default, each feature generates a PeriodicHealthReporter log entry every 30 seconds.
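
To estimate how the interval affects log volume, the arithmetic can be sketched in Python. This assumes one PeriodicHealthReporter entry per feature per interval, as described above; the `feature_count` parameter is a hypothetical convenience for scaling the estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def periodic_entries_per_day(interval_seconds, feature_count=1):
    """Estimate PeriodicHealthReporter entries written per day.

    Assumes one entry per feature per interval, as the text describes.
    """
    return feature_count * (SECONDS_PER_DAY // interval_seconds)


print(periodic_entries_per_day(30))  # default interval: 2880 per feature
print(periodic_entries_per_day(60))  # interval from the example: 1440 per feature
```

Doubling the interval from 30 to 60 seconds halves the number of periodic entries for each feature.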

This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9