Use the monitoring console to view search head cluster status and troubleshoot issues

You can use the monitoring console to monitor most aspects of your deployment. This topic discusses the console dashboards that provide insight into search head clusters.

The primary documentation for the monitoring console is located in Monitoring Splunk Enterprise.

Search head clustering dashboards in the monitoring console

There are several search head clustering dashboards under the Search menu:

Search Head Clustering: Status and Configuration
Search Head Clustering: Configuration Replication
Search Head Clustering: Artifact Replication
Search Head Clustering: Scheduler Delegation
Search Head Clustering: App Deployment

These dashboards provide a wealth of information about your search head cluster, such as:

Cluster member instance names and status
Identification of current captain and captain election activity
Configuration replication performance
Artifact replication details
Scheduler activity
Deployer activity

View the dashboards themselves for more information. In addition, see Search head clustering dashboards in Monitoring Splunk Enterprise.

Note: You can also use the CLI to get basic information about the cluster. See Use the CLI to view information about a search head cluster.

Troubleshoot the search head cluster

As part of its continuous monitoring of the search head cluster, the monitoring console provides a variety of information useful for troubleshooting. For example:

The Search Head Clustering: Status and Configuration dashboard shows:
- Search concurrency for various types of searches, with details on running versus limit
- Status, including captaincy and state
- Heartbeat information (see "Troubleshoot heartbeat issues" later in this topic)
- Configuration baseline consistency (see "Troubleshoot configuration baseline consistency" later in this topic)
- Artifact count
- Election activity
The Search Head Clustering: Configuration Replication dashboard shows:
- Warning and error patterns
- Configuration replication activity
The Search Head Clustering: Artifact Replication dashboard shows:
- Warning and error patterns
- Artifact replication activity
The Search Head Clustering: Scheduler Delegation dashboard shows:
- Scheduler delegation activity
The Search Head Clustering: App Deployment dashboard shows:
- Status of app deployments

Troubleshoot heartbeat issues

The Search Head Clustering: Status and Configuration dashboard provides insight into the heartbeats that the cluster members send to the captain. Specifically, it shows, for each member:

The time that the member last sent a heartbeat to the captain
The time that the captain last received a heartbeat from the member

These times should be the same or nearly the same. Significant differences in the sent and received times indicate likely problems.
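The comparison described above can be sketched in a few lines of Python. This is only an illustration: the member names, the `last_sent` and `last_received` keys, and the five-second threshold are assumptions made for the example, not fields or limits defined by the dashboard or by Splunk Enterprise.

```python
from datetime import datetime, timedelta


def find_heartbeat_gaps(members, max_gap=timedelta(seconds=5)):
    """Return (name, gap) pairs for members whose sent and received
    heartbeat times differ by more than max_gap.

    The dictionary keys and the default threshold are hypothetical.
    """
    flagged = []
    for member in members:
        gap = abs(member["last_received"] - member["last_sent"])
        if gap > max_gap:
            flagged.append((member["name"], gap))
    return flagged


# Hypothetical values copied by hand from the dashboard.
members = [
    {
        "name": "sh1",
        "last_sent": datetime(2024, 1, 1, 12, 0, 0),
        "last_received": datetime(2024, 1, 1, 12, 0, 1),
    },
    {
        "name": "sh2",
        "last_sent": datetime(2024, 1, 1, 12, 0, 0),
        "last_received": datetime(2024, 1, 1, 12, 0, 42),
    },
]

for name, gap in find_heartbeat_gaps(members):
    print(f"{name}: sent/received gap of {gap.total_seconds():.0f} seconds")
```

Choose a threshold that suits your environment. Clock skew between members can also produce a gap, so a flagged member is a prompt for investigation rather than proof of a fault.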

You can also access heartbeat information through the REST API. See the REST API documentation for shcluster/captain/members/{name}.

The role of the heartbeat

Members send a heartbeat to the captain on a regular basis. By default, the member sends a heartbeat every five seconds.

The frequency is defined by the heartbeat_period attribute in the [shclustering] stanza of server.conf on each member. All members must set this attribute to the same value.
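One way to check that all members agree on heartbeat_period is to compare the value in each member's server.conf. The following Python sketch assumes that you have already collected the text of each file. The member names and file contents are hypothetical, and Python's configparser only approximates how Splunk Enterprise reads .conf files: it does not account for configuration layering across directories.

```python
import configparser

DEFAULT_HEARTBEAT_PERIOD = 5  # seconds; the default stated in this topic


def heartbeat_period(conf_text, default=DEFAULT_HEARTBEAT_PERIOD):
    """Read heartbeat_period from the [shclustering] stanza of one
    server.conf, falling back to the default when it is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if parser.has_option("shclustering", "heartbeat_period"):
        return parser.getint("shclustering", "heartbeat_period")
    return default


def inconsistent_periods(confs):
    """Return {member: period} when members disagree, or an empty dict
    when every member uses the same value."""
    periods = {name: heartbeat_period(text) for name, text in confs.items()}
    if len(set(periods.values())) <= 1:
        return {}
    return periods


# Hypothetical server.conf contents, one entry per member.
confs = {
    "sh1": "[shclustering]\nheartbeat_period = 5\n",
    "sh2": "[shclustering]\nheartbeat_period = 10\n",
    "sh3": "[shclustering]\ndisabled = 0\n",
}

print(inconsistent_periods(confs))
```

In this example, sh2 uses a different value from the other two members, so the function returns the value found for every member for comparison.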

The heartbeat is the fundamental communication from the member to the captain. It indicates that the member is alive and part of the cluster. The heartbeat also contains a variety of information, such as:

Search artifacts
Dispatched searches
Alerts and suppressions
Completed summarization jobs
Member load information

When the captain receives the heartbeat, it notes that the member is in the "up" state.

After the captain receives a heartbeat from every member, it consolidates all the transmitted information and, in turn, sends members information such as:

Search artifact logs
List of overall alerts and suppressions
Dispatched searches

Impact of heartbeat failure

The captain expects to receive a heartbeat from each member within the interval specified by the heartbeat_timeout attribute in the [shclustering] stanza of server.conf.

By default, the timeout is set to 60 seconds.

The captain knows about the existence of a member only through its heartbeat. If it never receives a heartbeat from a member, it does not know that the member exists.

If, within the specified timeout period, the captain does not get a heartbeat from a member that has previously sent a heartbeat, the captain marks the member as "down". The captain does not dispatch new searches to members in the "down" state.
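The timeout behavior can be modeled with a short Python sketch. This is a simplified model of the rules described above, not the captain's actual implementation. The "unknown" label for a member that has never sent a heartbeat, the function names, and the exact boundary comparison are assumptions made for illustration.

```python
HEARTBEAT_TIMEOUT = 60  # seconds; the default stated in this topic


def member_state(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Classify a member from the time of its last heartbeat.

    last_heartbeat is a timestamp in seconds, or None if the captain has
    never received a heartbeat from the member.
    """
    if last_heartbeat is None:
        return "unknown"  # hypothetical label: the captain has no record
    if now - last_heartbeat > timeout:
        return "down"
    return "up"


def dispatch_targets(last_heartbeats, now):
    """Return the members that are eligible for new searches."""
    return sorted(
        name
        for name, timestamp in last_heartbeats.items()
        if member_state(timestamp, now) == "up"
    )


# Hypothetical timestamps, in seconds.
last_heartbeats = {"sh1": 100.0, "sh2": 30.0, "sh3": None}
now = 110.0

for name, timestamp in last_heartbeats.items():
    print(name, member_state(timestamp, now))
print("eligible for new searches:", dispatch_targets(last_heartbeats, now))
```

Here sh1 sent a heartbeat 10 seconds ago and is "up", sh2 last sent one 80 seconds ago and is "down", and sh3 has never been seen, so only sh1 is eligible for new searches.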

Causes of heartbeat failure

If the captain does not receive a heartbeat from a member, it usually indicates one of the following situations:

Member is down or unavailable.
Network partition between captain and member.
HTTP request failures. These are visible in splunkd_access.log on the captain.

Note: By default, Splunk Enterprise logs only heartbeat failures in splunkd_access.log. To enable logging for heartbeat successes as well, configure access_logging_for_heartbeats=true in the [shclustering] stanza of server.conf on the captain. If you want this configuration change to persist across captaincy transfer, make the change on all members, not just the current captain.

Troubleshoot configuration baseline consistency

The Search Head Clustering: Status and Configuration dashboard includes information on the consistency of the configuration baseline. This information helps to determine whether configuration changes are being properly replicated across the set of cluster members.

To find this information, go to the Snapshots section of the dashboard and view the Status table. There is one row for each member. The table includes two columns that pertain to baseline consistency:

Configuration Baseline Consistency. This column contains a ratio that compares the consistency of each member's baseline to the baselines for all other members. For more details, click the ratio. A table to the right then compares the member's baseline consistency against each individual member.

Number of Unpublished Changes. This column indicates whether there are any sets of configuration changes on the member that have not yet been replicated to the captain. In particular, it notes whether a member is out of sync with the captain.

When a baseline mismatch is detected, at least one member requires manual intervention to regain baseline consistency. Examine the consistency comparison table to identify the member that is not in sync with a majority of the other members. To restore consistency, perform a manual resync on the member, using the splunk resync shcluster-replicated-config command. See Perform a manual resync.
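The "not in sync with a majority" rule can be illustrated with a short Python sketch. The dashboard presents consistency as ratios and a comparison table. The per-member baseline identifiers used here are a hypothetical stand-in for that information, and the function name is invented for the example.

```python
from collections import Counter


def members_needing_resync(baselines):
    """Return the members whose baseline differs from the one shared by a
    strict majority of members.

    baselines maps member name to an opaque baseline identifier
    (hypothetical). Returns None when no baseline has a strict majority,
    because that case calls for human judgment.
    """
    if not baselines:
        return []
    counts = Counter(baselines.values())
    majority_baseline, majority_count = counts.most_common(1)[0]
    if majority_count <= len(baselines) / 2:
        return None
    return sorted(
        name
        for name, baseline in baselines.items()
        if baseline != majority_baseline
    )


# Hypothetical baseline identifiers, one per member.
baselines = {"sh1": "a1", "sh2": "a1", "sh3": "b7"}
print(members_needing_resync(baselines))
```

In this example, sh3 is the member that differs from the majority, so it is the candidate for the splunk resync shcluster-replicated-config command.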

For a discussion of configuration replication, see Configuration updates that the cluster replicates.
