Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Download manual as PDF

This documentation does not apply to the most recent version of Splunk. Click here for the latest version.
Download topic as PDF

What happens when a master node goes down

The master is essential to the proper running of a cluster, with its role as coordinator for much of the cluster activity. However, if the master goes down, the peers and search head have default behaviors that allow them to function fairly normally, at least for a while. Nevertheless, you should treat a downed master as a serious failure.

To deal with the possibility of a downed master, you can configure a stand-by master that can take over if needed. For details, see "Configure a stand-by master".

When a master goes down

If a master goes down, the cluster can continue to run as usual, as long as there are no other failures. Peers can continue to ingest data, stream copies to other peers, replicate buckets, and respond to search requests from the search head.

When a peer rolls a hot bucket, it normally contacts the master to get a list of target peers to stream its next hot bucket to. However, if a peer rolls a hot bucket while the master is down, it will just start streaming its next hot bucket to the same set of peers that it used as targets for the previous hot bucket.

Eventually, problems will begin to arise. For example, if a peer goes down and the master is still down, there will be no way to coordinate the necessary remedial bucket-fixing activity. Or if, for some reason, a peer is unable to connect with one of its target peers, it has no way of getting another target.

The search head can also continue to function without a master, although eventually the searches will be accessing incomplete sets of data. (For example, if a peer with primary bucket copies goes down, there's no way to transfer primacy to copies on other peers, so those buckets will no longer get searched.) The search head will use the last generation ID that it got before the master went down. It will display a warning if one or more peers in the last generation are down.

When the master comes back up

Peers continue to send heartbeats indefinitely, so that, when the master comes back up, they will be able to detect it and reconnect.

When the master comes back up, it waits for a quiet period of 60 seconds, so that all peers have an opportunity to re-register with it. Once the quiet period ends, the master has a complete view of the state of the cluster, including the state of peer nodes and buckets. Assuming that at least the replication factor number of peers have registered with it, the master will initiate any necessary bucket-fixing activities to ensure that the cluster is valid and complete. In addition, it will rebalance the cluster and update the generation ID as needed.

Bucket fixing can take some time to complete, because it involves copying buckets and making non-searchable copies searchable. For help estimating the time needed to complete the bucket-fixing activity, look here.

After the 60 second quiet period is over, you can view the master dashboard for accurate information on the status of the cluster.

Note: You must make sure there are at least replication factor number of peers running when you restart the master.

PREVIOUS
What happens when a peer node comes back up
  NEXT
Bucket replication issues

This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15


Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters