Search head clustering dashboards
Status and Configuration
The Status and Configuration dashboard provides a high-level overview of your search head cluster.
The Configuration Replication dashboard provides insight into configurations that a user changes on any search head cluster member (for example a new event type), and how these changes propagate through the cluster. Use this dashboard if you notice a significant lag in this propagation.
Action reference: The following are low-level actions exposed in the Count of Actions Over Time and Time Spent on Actions Over Time panels. These panels can be helpful for troubleshooting.
|Action|Description|
|accept_push|On the captain, accept replicated changes from a member.|
|acquire_mutex|Acquire a mutex (mutual exclusion) that "protects" the configuration system.|
|add_commit|On a member, record a change.|
|base_initialize|Initialize a configuration "root" (e.g. $SPLUNK_HOME/etc).|
|check_range|Compare two ranges of configuration changes.|
|compute_common|Find the latest common change between a member and the captain.|
|pull_from|On a member, pull changes from the captain.|
|purge_eligible|On a member, purge sufficiently old changes from the repo.|
|push_to|On a member, push changes to the captain.|
|release_and_reacquire_mutex|Release, then re-acquire a mutex that "protects" the configuration system. This is similar to acquire_mutex.|
|reply_pull|On the captain, reply to a member's pull_from request.|
|repo_initialize|Initialize a configuration repo (from disk).|
This information is intended primarily for Splunk Support. If you have issues with configuration replication, you can look at this dashboard for clues. However, expect to use it mainly to gather information after you file a Support case, rather than to diagnose the problem on your own.
The Artifact Replication dashboard contains several panels describing the cluster's backlog of search artifacts to replicate. See "Search head clustering architecture" in the Distributed Search Manual.
The Warnings and Errors Patterns panel groups warning and error events based on text within the messages. The grouping functionality uses the cluster command.
If your search head cluster is replicating artifacts on time, its Count of Artifacts Awaiting Replication is at or near zero. A few artifacts awaiting replication is likely not a warning sign. A consistently high number of artifacts, and especially a growing one, could indicate a problem with replication. If many artifacts are waiting, a user on another search head might not have a local copy of an artifact and will experience slower access to search results.
Median Count of Artifacts to Replicate is (as advertised) a median. This means that if you have narrow spikes, you won't see them at larger time ranges.
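The effect of the median on narrow spikes can be illustrated with a minimal sketch. The sample counts and the bucket sizes below are made up for illustration and are not taken from the dashboard; the sketch only shows how a median over a wider time bucket hides a short spike.

```python
from statistics import median

# Hypothetical per-minute counts of artifacts awaiting replication:
# mostly zero, with one narrow spike.
counts = [0, 0, 1, 0, 250, 0, 0, 1, 0, 0, 0, 0]

def median_per_bucket(values, bucket_size):
    """Return the median of each consecutive bucket of `bucket_size` samples."""
    return [
        median(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]

# Narrow buckets (one sample each): the spike is still visible.
fine = median_per_bucket(counts, 1)

# Wide buckets (six samples each): the spike disappears from the medians.
coarse = median_per_bucket(counts, 6)

print("largest median, narrow buckets:", max(fine))
print("largest median, wide buckets:", max(coarse))
```

With narrow buckets the largest median equals the spike itself. With wide buckets every median is zero, so the spike does not show up at all.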
The Artifact Replication Job Activity panel shows the rate at which the backlog of replication jobs changes (the backlog change). The backlog change can be negative if your cluster is catching up with its backlog. The red flag to look for in this panel is a backlog that grows consistently, that is, a backlog change that is always positive. When this happens, the Median Count of Artifacts to Replicate panel above shows a continually growing backlog.
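As a rough sketch of the relationship between the backlog and the backlog change, assume the backlog is sampled at regular intervals and the backlog change is the difference between consecutive samples. The function names and the sample values are hypothetical and are not part of Splunk Enterprise.

```python
def backlog_changes(backlog):
    """Return the difference between each pair of consecutive backlog samples."""
    return [later - earlier for earlier, later in zip(backlog, backlog[1:])]

def grows_consistently(backlog):
    """Return True if every backlog change is positive (the red flag)."""
    changes = backlog_changes(backlog)
    return bool(changes) and all(change > 0 for change in changes)

# Hypothetical backlog samples.
catching_up = [40, 25, 12, 3, 0]      # negative changes: the cluster is catching up
falling_behind = [5, 9, 16, 30, 55]   # always-positive changes: the backlog keeps growing

print(backlog_changes(catching_up))
print(grows_consistently(catching_up))
print(grows_consistently(falling_behind))
```

A series whose changes are all negative corresponds to a cluster that is working through its backlog, while a series whose changes are all positive corresponds to the continually growing backlog described above.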
See "Search head clustering architecture" in the Distributed Search Manual.
In the Scheduler Status panel, note that max_pending and max_running are "highwater marks" over a 30-second period. That is, they are the highest number of jobs that were pending or running in a 30-second span. You can select one of several functions in this panel. The "maximum" function works in a straightforward manner with these statistics. But take a moment to think through what "average," "median," or "90th percentile" mean. For example, say max_pending is recorded as 4 for one 30-second period, and you then average the values of max_pending over many periods. You end up with the average of the high values, not the average of all values. So if the number of pending jobs fluctuates a lot, the average max_pending might not be close to a straight average of the number of pending jobs.
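The difference between an average of highwater marks and a straight average can be shown with a small worked example. The per-second sampling and the job counts below are assumptions made for illustration only; they do not describe how the scheduler actually records these statistics.

```python
from statistics import mean

WINDOW = 30  # seconds per highwater-mark period

# Hypothetical pending-job counts, one sample per second for 90 seconds:
# the queue is empty except for one brief burst in each 30-second period.
pending = ([0] * 29 + [4]) + ([0] * 29 + [8]) + ([0] * 29 + [6])

# Highwater mark per period, analogous to max_pending.
max_pending = [
    max(pending[i:i + WINDOW]) for i in range(0, len(pending), WINDOW)
]

average_of_highs = mean(max_pending)  # average of the three highwater marks
average_of_all = mean(pending)        # straight average of every sample

print("highwater marks:", max_pending)
print("average of highwater marks:", average_of_highs)
print("straight average of pending jobs:", average_of_all)
```

Here the average of the highwater marks is 6, while the straight average of the pending counts is 0.2, because the bursts are brief and the queue is otherwise empty.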
The App Deployment dashboard monitors apps as they are deployed from a deployer to search head cluster members.
See About deployment server and forwarder management in the Updating Splunk Enterprise Instances Manual.
In the Apps status panel, a persistent discrepancy indicates that the deployer has not finished deploying apps to its members.
Troubleshoot these views
The search head clustering dashboards require the monitored instances to be running Splunk Enterprise 6.2.0 or greater.
Make sure you have completed all of the DMC setup steps.
- Forward logs from search heads and deployers to indexers. See "DMC prerequisites."
- For all search head clustering dashboards, search heads need to be set as search peers of the DMC.
- All search head clustering dashboards need the members of a search head cluster to have a cluster label. See "Set cluster labels." Note that the app deployer also needs a label.
For the App Deployment dashboard:
- The deployer needs to be a search peer of the DMC, or the DMC can be hosted on the deployer. See "Add instances as search peers."
- The deployer needs to have the deployer role. (The DMC might detect this role automatically.) Check this in Distributed Management Console > Settings > General Setup.
- The deployer needs to be manually labeled as a member of the search head cluster. (The DMC does not detect this automatically.) Set this in Distributed Management Console > Settings > General Setup.
- The deployer must forward logs, as above. See "DMC prerequisites."
This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11