Search: Search Head Clustering

This topic is a reference for all of the Monitoring Console dashboards related to search head clustering. See About the Monitoring Console.

Status and Configuration

The Status and Configuration dashboard provides a high-level overview of your search head cluster.

Configuration Replication

The Configuration Replication dashboard provides insight into configuration changes that a user makes on any search head cluster member (for example, a new event type), and how these changes propagate through the cluster. Use this dashboard if you notice a significant lag in this propagation.

Action reference: The following are low-level actions exposed in the Count of Actions Over Time and Time Spent on Actions Over Time panels. These panels can be helpful for troubleshooting.

Action	Description
accept_push	On the captain, accept replicated changes from a member.
acquire_mutex	Acquire a mutex (mutual exclusion) that "protects" the configuration system.
add_commit	On a member, record a change.
base_initialize	Initialize a configuration "root" (e.g. $SPLUNK_HOME/etc).
check_range	Compare two ranges of configuration changes.
compute_common	Find the latest common change between a member and the captain.
pull_from	On a member, pull changes from the captain.
purge_eligible	On a member, purge sufficiently old changes from the repo.
push_to	On a member, push changes to the captain.
release_and_reacquire_mutex	Release, then re-acquire a mutex that "protects" the configuration system. This is similar to acquire_mutex.
reply_pull	On the captain, reply to a member's pull_from request.
repo_initialize	Initialize a configuration repo (from disk).

This information is intended primarily for Splunk Support. If you have issues with configuration replication, you can look at this dashboard for clues, but expect to use it more for gathering information after you file a Support case than for gaining insight on your own.

Artifact Replication

The Artifact Replication dashboard contains several panels describing the cluster's backlog of search artifacts to replicate. See Search head clustering architecture in the Distributed Search Manual.

The Warnings and Errors Patterns panel groups warning and error events based on text within the messages. The grouping functionality uses the cluster command.

If your search head cluster is replicating artifacts on time, its Count of Artifacts Awaiting Replication is at or near zero. A few artifacts awaiting replication are likely not a warning sign. A consistently high, and especially a growing, number of artifacts awaiting replication could indicate a problem with replication. If many artifacts are waiting, someone using another search head might not get a local cache and will experience slowness in search availability.

Median Count of Artifacts to Replicate is, as its name says, a median. This means that narrow spikes in the backlog do not show up when you view larger time ranges.
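The effect of the median on narrow spikes can be illustrated with a small sketch. The sample counts and the bucket_medians helper below are hypothetical and are not how the Monitoring Console computes the panel; they only show why a median over a wide time bucket hides a short spike.

```python
from statistics import median


def bucket_medians(samples, bucket_size):
    """Split samples into consecutive buckets and return the median of each."""
    return [
        median(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Hypothetical per-minute counts of artifacts awaiting replication:
# mostly zero, with one narrow spike of 50.
counts = [0, 0, 1, 0, 0, 0, 0, 50, 0, 0, 1, 0]

fine = bucket_medians(counts, 1)    # one sample per bucket: the spike is visible
coarse = bucket_medians(counts, 6)  # six samples per bucket: the spike is hidden

print(max(fine))  # 50
print(coarse)     # both bucket medians are 0
```

If you suspect short spikes, narrowing the time range gives each bucket fewer samples, so a spike is more likely to survive the median.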

The Artifact Replication Job Activity panel shows the backlog change, that is, the rate at which the backlog of replication jobs changes. The backlog change can be negative when your cluster is catching up with its backlog. The red flag to look for in this panel is a backlog that grows consistently, that is, a backlog change that is always positive. If this happens, the Median Count of Artifacts to Replicate panel shows a continually growing backlog.
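As a rough illustration of the red flag described above, the following sketch computes the backlog change between consecutive samples and checks whether it is always positive. The function names and sample values are hypothetical; the dashboard derives its numbers from Splunk logs, not from code like this.

```python
def backlog_changes(backlog):
    """Return the change in backlog between consecutive samples."""
    return [later - earlier for earlier, later in zip(backlog, backlog[1:])]


def grows_consistently(backlog):
    """True if every backlog change is positive (the red flag)."""
    changes = backlog_changes(backlog)
    return bool(changes) and all(change > 0 for change in changes)


# Hypothetical backlog samples (artifacts awaiting replication).
catching_up = [40, 32, 25, 25, 10]     # changes are negative or zero
falling_behind = [5, 9, 14, 22, 31]    # changes are always positive

print(backlog_changes(catching_up))        # [-8, -7, 0, -15]
print(grows_consistently(catching_up))     # False
print(grows_consistently(falling_behind))  # True
```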

Scheduler Delegation

See Search head clustering architecture in the Distributed Search Manual.

In the Scheduler Status panel, max_pending and max_running are high-water marks over a 30-second period. That is, they are the highest number of jobs that were pending or running in a 30-second span. You can select one of several functions in this panel. The "maximum" function works in a straightforward manner with these statistics, but take a moment to think through what "average," "median," or "90th percentile" mean. For example, say max_pending is 4 for one 30-second period. If you then average the values of max_pending across periods, you get the average of the high values, not the average of all values. So if the number of pending jobs fluctuates a lot, the average max_pending might not be close to a straight average of the number of pending jobs.
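The difference between an average of high-water marks and a straight average can be seen in a short sketch. The per-second sample values and the highwater_marks helper are assumptions made for illustration; only the 30-second period comes from the text above.

```python
from statistics import mean


def highwater_marks(samples, window):
    """Return the maximum of each consecutive window of samples."""
    return [
        max(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


# Hypothetical per-second counts of pending jobs over two 30-second periods:
# the queue is empty except for one brief burst in each period.
pending = [0] * 29 + [4] + [0] * 29 + [6]

marks = highwater_marks(pending, 30)  # one max_pending-style value per period

print(marks)          # [4, 6]
print(mean(marks))    # 5, the average of the high values
print(mean(pending))  # about 0.17, the straight average of all samples
```

In this made-up case the averaged high-water mark is roughly 30 times the straight average, which is why a fluctuating queue can look busier in the panel than it is on average.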

App Deployment

The App Deployment dashboard monitors apps as they are deployed from a deployer to search head cluster members.

See About deployment server and forwarder management in the Updating Splunk Enterprise Instances Manual.

In the Apps status panel, a persistent discrepancy indicates that the deployer has not finished deploying apps to its members.

Troubleshoot these dashboards

The search head clustering dashboards require the monitored instances to be running Splunk Enterprise 6.2.0 or greater.

Make sure you have completed all of the Monitoring Console setup steps.

In particular:

Forward logs from search heads and deployers to indexers. See Monitoring Console prerequisites.
For all search head clustering dashboards, search heads need to be set as search peers of the Monitoring Console.
All search head clustering dashboards need the search heads to be labeled as members of a search head cluster. See Set cluster labels. Note that the deployer also needs a label.

For the App Deployment dashboard:

The deployer needs to be a search peer of the Monitoring Console, or the Monitoring Console can be hosted on the deployer. See Add instances as search peers.
The deployer needs to have the deployer role (the Monitoring Console might detect this automatically). Check this in Monitoring Console > Settings > General Setup.
The deployer needs to be manually labeled as a member of the search head cluster (the Monitoring Console does not detect this automatically). Set this in Monitoring Console > Settings > General Setup.
The deployer must forward logs, as above. See Monitoring Console prerequisites.
