Handle Raft issues
If the Raft metadata that underlies search head clustering gets into a bad state on a member, you can often correct the problem by cleaning the member's var/run/splunk/_raft folder. See Fix Raft issues on a member.
If the cluster is unable to elect a captain and maintain a healthy state due to Raft issues, you can clean the Raft folder on all members and then bootstrap the cluster. See Fix the entire cluster.
Fix Raft issues on a member
The primary symptom of a Raft issue is that the member's status appears as "down" when you run splunk show shcluster-status on the captain. To confirm the Raft issue, look in the member's splunkd.log file for an error message that starts with the string "ERROR SHCRaftConsensus".
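The log check described above can be scripted. The following is a minimal sketch, not an official Splunk tool: the function names are hypothetical, and the pattern allows for variable spacing between the log level and the component name.

```shell
#!/bin/sh
# Sketch: find Raft consensus errors in a splunkd.log file.
# raft_errors and raft_error_count are hypothetical helper names.

# Print every line that contains the Raft consensus error marker.
# $1: path to a splunkd.log file
raft_errors() {
    grep -E 'ERROR +SHCRaftConsensus' "$1"
}

# Print only the number of matching lines (0 if there are none).
# $1: path to a splunkd.log file
raft_error_count() {
    grep -cE 'ERROR +SHCRaftConsensus' "$1"
}

# Example call (assumes the usual log location under $SPLUNK_HOME):
#   raft_errors "$SPLUNK_HOME/var/log/splunk/splunkd.log" | tail -n 5
```

A nonzero count on the affected member supports the diagnosis. A count of zero suggests that the "down" status has another cause.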
File corruption in a member's _raft folder is a common cause of Raft issues. You can fix the problem by cleaning the folder on the member. The folder then repopulates from the captain.
To fix a Raft issue, clean the member's _raft folder. Run the splunk clean raft command on the member:
- Stop the member:
  splunk stop
- Clean the member's _raft folder:
  splunk clean raft
- Start the member:
  splunk start
The _raft folder will be repopulated from the captain.
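The stop, clean, and start sequence can be wrapped in a small script so that the steps always run in order and the sequence halts if one fails. This is a sketch under stated assumptions: SPLUNK_BIN, DRY_RUN, run_step, and clean_member_raft are hypothetical names introduced here, and the script defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch: run the stop / clean / start sequence on one member.
# SPLUNK_BIN and DRY_RUN are hypothetical variables for this example.

SPLUNK_BIN="${SPLUNK_BIN:-splunk}"

# Print the command when DRY_RUN is 1 (the default); otherwise run it.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the three steps in order; stop at the first step that fails.
clean_member_raft() {
    run_step "$SPLUNK_BIN" stop &&
    run_step "$SPLUNK_BIN" clean raft &&
    run_step "$SPLUNK_BIN" start
}

# Example call, printing the commands only:
#   DRY_RUN=1 clean_member_raft
```

Review the printed commands first, then set DRY_RUN=0 on the affected member to run them.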
Fix the entire cluster
If captain election fails even though a majority of members are available, Raft metadata corruption is a likely cause. To confirm, examine the members' splunkd.log files for errors that start with the string "ERROR SHCRaftConsensus".
You can resolve the issue by cleaning the _raft folder on all members and then bootstrapping the cluster:
- Stop all members.
- Run splunk clean raft on each member:
  splunk clean raft
- Start all members.
- Select one member to be captain and bootstrap it:
  splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
The -servers_list parameter contains a comma-separated list of the cluster members, including the member that you are running the command on. The members are identified by URI and management port.
- If you are using search peer replication, you must re-add the search peers to one member. See Replicate the search peers across the cluster.
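When the cluster has many members, building the -servers_list value by hand invites typos. The helper below is a sketch of one way to assemble it. build_servers_list is a hypothetical function name, and the hostnames in the usage comment are placeholders, not values from this documentation.

```shell
#!/bin/sh
# Sketch: join member URIs into the comma-separated value for -servers_list.
# build_servers_list is a hypothetical helper for this example.

# $@: one "<URI>:<management_port>" entry per cluster member,
#     including the member that will run the bootstrap command
build_servers_list() {
    # "$*" joins the arguments with the first character of IFS.
    # The subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s\n' "$*" )
}

# Example call (placeholder hostnames):
#   servers=$(build_servers_list \
#       https://sh1.example.com:8089 \
#       https://sh2.example.com:8089 \
#       https://sh3.example.com:8089)
#   splunk bootstrap shcluster-captain -servers_list "$servers" -auth <username>:<password>
```

Quoting "$servers" in the bootstrap command keeps the list as a single argument.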
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2, 9.4.0