Splunk® Enterprise

Distributed Search

Use the monitoring console to view search head cluster status and troubleshoot issues

You can use the monitoring console to monitor most aspects of your deployment. This topic discusses the console dashboards that provide insight into search head clusters.

The primary documentation for the monitoring console is located in Monitoring Splunk Enterprise.

Search head clustering dashboards in the monitoring console

The monitoring console provides five search head clustering dashboards under the Search menu:

  • Search Head Clustering: Status and Configuration
  • Search Head Clustering: Configuration Replication
  • Search Head Clustering: Artifact Replication
  • Search Head Clustering: Scheduler Delegation
  • Search Head Clustering: App Deployment

These dashboards report on many aspects of your search head cluster, such as:

  • Cluster member instance names and status
  • Identification of current captain and captain election activity
  • Configuration replication performance
  • Artifact replication details
  • Scheduler activity
  • Deployer activity

View the dashboards themselves for more information. In addition, see Search head clustering dashboards in Monitoring Splunk Enterprise.

Note: You can also use the CLI to get basic information about the cluster. See Use the CLI to view information about a search head cluster.

Troubleshoot the search head cluster

As part of its continuous monitoring of the search head cluster, the monitoring console provides a variety of information useful for troubleshooting. For example:

  • The Search Head Clustering: Status and Configuration dashboard shows:
    • Search concurrency for various types of searches, with details on running versus limit
    • Status, including captaincy and state
    • Heartbeat information (see Troubleshoot heartbeat issues in this topic)
    • Configuration baseline consistency (see Troubleshoot configuration baseline consistency in this topic)
    • Artifact count
    • Election activity
  • The Search Head Clustering: Configuration Replication dashboard shows:
    • Warning and error patterns
    • Configuration replication activity
  • The Search Head Clustering: Artifact Replication dashboard shows:
    • Warning and error patterns
    • Artifact replication activity
  • The Search Head Clustering: Scheduler Delegation dashboard shows:
    • Scheduler delegation activity
  • The Search Head Clustering: App Deployment dashboard shows:
    • Status of app deployments

Troubleshoot heartbeat issues

The Search Head Clustering: Status and Configuration dashboard provides insight into the heartbeats that the cluster members send to the captain. Specifically, it shows, for each member:

  • The time that the member last sent a heartbeat to the captain
  • The time that the captain last received a heartbeat from the member

These times should be the same or nearly the same. Significant differences in the sent and received times indicate likely problems.

You can also access heartbeat information through the REST API. See the REST API documentation for shcluster/captain/members/{name}.
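
The comparison of sent and received times can be sketched in a few lines of Python. This is only an illustration: the member names, the sample timestamps, and the five-second tolerance are assumptions chosen for the example, not values taken from the dashboard or the REST API.

```python
from datetime import datetime, timedelta

def heartbeat_skew(sent: datetime, received: datetime) -> timedelta:
    """Absolute difference between the member-sent and captain-received times."""
    return abs(received - sent)

def flag_skewed_members(heartbeats, tolerance=timedelta(seconds=5)):
    """Return the names of members whose two times differ by more than tolerance.

    heartbeats maps a member name to a (sent, received) pair of datetimes.
    The tolerance is an arbitrary value picked for this sketch.
    """
    return sorted(
        name
        for name, (sent, received) in heartbeats.items()
        if heartbeat_skew(sent, received) > tolerance
    )

# Hypothetical sample values, not real dashboard output.
sample = {
    "sh1": (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 0, 1)),
    "sh2": (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 11, 58, 30)),
}
print(flag_skewed_members(sample))  # ['sh2']
```

In this sample, sh1 differs by one second and is not flagged, while sh2 differs by 90 seconds and is.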

The role of the heartbeat

Each member sends a heartbeat to the captain at a regular interval. By default, the interval is five seconds.

The interval is defined by the heartbeat_period attribute in the [shclustering] stanza of server.conf on each member. All members must set this attribute to the same value.
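
One way to check that every member uses the same heartbeat_period is to compare the server.conf contents collected from each member. The following Python sketch assumes you have already gathered each file's text yourself. The member names and file contents are hypothetical, and Python's configparser handles only simple .conf files, so treat this as an approximation rather than a full Splunk configuration parser.

```python
import configparser

def heartbeat_period(conf_text: str, default: int = 5) -> int:
    """Read heartbeat_period from [shclustering], or use the five-second default."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.getint("shclustering", "heartbeat_period", fallback=default)

def mismatched_periods(confs: dict[str, str]) -> dict[str, int]:
    """Return every member's period if the members disagree, else an empty dict."""
    periods = {name: heartbeat_period(text) for name, text in confs.items()}
    if len(set(periods.values())) <= 1:
        return {}
    return periods

# Hypothetical server.conf contents for three members.
confs = {
    "sh1": "[shclustering]\nheartbeat_period = 5\n",
    "sh2": "[shclustering]\nheartbeat_period = 10\n",
    "sh3": "[general]\nserverName = sh3\n",  # attribute not set, so the default applies
}
print(mismatched_periods(confs))  # {'sh1': 5, 'sh2': 10, 'sh3': 5}
```

An empty result means that all members resolve to the same value.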

The heartbeat is the fundamental communication from the member to the captain. It indicates that the member is alive and part of the cluster. The heartbeat also contains a variety of information, such as:

  • Search artifacts
  • Dispatched searches
  • Alerts and suppressions
  • Completed summarization jobs
  • Member load information

When the captain receives the heartbeat, it notes that the member is in the "up" state.

After the captain receives a heartbeat from every member, it consolidates all the transmitted information and, in turn, sends members information such as:

  • Search artifact logs
  • List of overall alerts and suppressions
  • Dispatched searches

Impact of heartbeat failure

The captain expects to receive a heartbeat from each member within the period specified by the heartbeat_timeout attribute in the [shclustering] stanza of server.conf.

By default, the timeout is set to 60 seconds.

The captain knows about the existence of a member only through its heartbeat. If it never receives a heartbeat from a member, it does not know that the member exists.

If, within the specified timeout period, the captain does not get a heartbeat from a member that has previously sent a heartbeat, the captain marks the member as "down". The captain does not dispatch new searches to members in the "down" state.
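
The timeout rule described above can be modeled roughly as follows. This Python sketch is a simplification, not the captain's actual implementation: the function name, the "unknown" label for a member that has never sent a heartbeat, and the exact behavior at the timeout boundary are all assumptions.

```python
from typing import Optional

def member_state(last_heartbeat: Optional[float], now: float, timeout: float = 60.0) -> str:
    """Classify a member from the captain's point of view.

    last_heartbeat and now are timestamps in seconds. None means that the
    captain has never received a heartbeat from the member. "unknown" is a
    label invented for this sketch.
    """
    if last_heartbeat is None:
        return "unknown"  # the captain does not know that the member exists
    if now - last_heartbeat > timeout:
        return "down"     # no heartbeat within the timeout period
    return "up"

print(member_state(None, 100.0))  # unknown
print(member_state(50.0, 100.0))  # up
print(member_state(30.0, 100.0))  # down
```

With the default 60-second timeout, a member last heard from 50 seconds ago is still "up", while one last heard from 70 seconds ago is "down".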

Causes of heartbeat failure

If the captain does not receive a heartbeat from a member, it usually indicates one of the following situations:

  • Member is down or unavailable.
  • Network partition between captain and member.
  • HTTP request failures. These are visible in splunkd_access.log on the captain.

Note: By default, Splunk Enterprise logs only heartbeat failures in splunkd_access.log. To enable logging for heartbeat successes as well, configure access_logging_for_heartbeats=true in the [shclustering] stanza of server.conf on the captain. If you want this configuration change to persist across captaincy transfer, make the change on all members, not just the current captain.
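
Because the setting should be applied on all members, you might script the change. The following Python sketch adds access_logging_for_heartbeats to the text of a server.conf file. It is an illustration only: the function name is hypothetical, configparser handles only simple .conf files, and it discards any comments in the original text.

```python
import configparser
import io

def enable_heartbeat_access_logging(conf_text: str) -> str:
    """Return conf_text with access_logging_for_heartbeats = true in [shclustering]."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # preserve the case of attribute names
    parser.read_string(conf_text)
    if not parser.has_section("shclustering"):
        parser.add_section("shclustering")
    parser.set("shclustering", "access_logging_for_heartbeats", "true")
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()

# Hypothetical starting content for one member's server.conf.
updated = enable_heartbeat_access_logging("[shclustering]\nheartbeat_period = 5\n")
print(updated)
```

The existing heartbeat_period line is kept, and the new attribute is added to the same stanza.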

Troubleshoot configuration baseline consistency

The Search Head Clustering: Status and Configuration dashboard includes information on the consistency of the configuration baseline. This information helps to determine whether configuration changes are being properly replicated across the set of cluster members.

To find this information, go to the Snapshots section of the dashboard and view the Status table. There is one row for each member. The table includes two columns that pertain to baseline consistency:

  • Configuration Baseline Consistency. This column contains a ratio that compares the consistency of each member's baseline to the baselines for all other members. For more details, click the ratio. A table to the right then compares the member's baseline consistency against each individual member.
  • Number of Unpublished Changes. This column indicates whether there are any sets of configuration changes on the member that have not yet been replicated to the captain. In particular, it notes whether a member is out of sync with the captain.

When a baseline mismatch is detected, at least one member requires manual intervention to regain baseline consistency. Examine the consistency comparison table to identify the member that is not in sync with a majority of the other members. To restore consistency, perform a manual resync on the member, using the splunk resync shcluster-replicated-config command. See Perform a manual resync.
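
The idea of finding the member that disagrees with the majority can be sketched in Python. The baseline identifiers below are hypothetical strings standing in for whatever the dashboard compares, and the function is an illustration, not the logic that the monitoring console uses.

```python
from collections import Counter

def members_to_resync(baselines: dict[str, str]) -> list[str]:
    """Return members whose baseline differs from the one held by a strict majority.

    baselines maps a member name to an identifier for its configuration
    baseline. If no baseline is held by more than half of the members, the
    function returns an empty list, because choosing a member then needs
    human judgment.
    """
    if not baselines:
        return []
    majority, count = Counter(baselines.values()).most_common(1)[0]
    if count * 2 <= len(baselines):
        return []
    return sorted(name for name, base in baselines.items() if base != majority)

# Hypothetical baseline identifiers for three members.
print(members_to_resync({"sh1": "a1", "sh2": "a1", "sh3": "b7"}))  # ['sh3']
```

In this sample, sh3 is the candidate for a manual resync.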

For a discussion of configuration replication, see Configuration updates that the cluster replicates.

Last modified on 15 December, 2016

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.2.0, 9.2.1