Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Download manual as PDF

Download topic as PDF

What happens when the master node goes down

The master is essential to the proper running of an indexer cluster, with its role as coordinator for much of the cluster activity. However, if the master goes down, the peers and search head have default behaviors that allow them to function fairly normally, at least for a while. Nevertheless, you should treat a downed master as a serious failure.

To deal with the possibility of a downed master, you can configure a stand-by master that can take over if needed. For details, see "Replace the master node on the indexer cluster".

When a master goes down

If a master goes down, the cluster can continue to run as usual, as long as there are no other failures. Peers can continue to ingest data, stream copies to other peers, replicate buckets, and respond to search requests from the search head.

When a peer rolls a hot bucket, it normally contacts the master to get a list of target peers to stream its next hot bucket to. However, if a peer rolls a hot bucket while the master is down, it will just start streaming its next hot bucket to the same set of peers that it used as targets for the previous hot bucket.

Eventually, problems will begin to arise. For example, if a peer goes down and the master is still down, there will be no way to coordinate the necessary remedial bucket-fixing activity. Or if, for some reason, a peer is unable to connect with one of its target peers, it has no way of getting another target.

The search head can also continue to function without a master, although eventually the searches will be accessing incomplete sets of data. (For example, if a peer with primary bucket copies goes down, there's no way to transfer primacy to copies on other peers, so those buckets will no longer get searched.) The search head will use the last generation ID that it got before the master went down. It will display a warning if one or more peers in the last generation are down.

When the master comes back up

Peers continue to send heartbeats indefinitely, so that, when the master comes back up, they will be able to detect it and reconnect.

When the master comes back up, it waits for a quiet period of 60 seconds, so that all peers have an opportunity to re-register with it. Once the quiet period ends, the master has a complete view of the state of the cluster, including the state of peer nodes and buckets. Assuming that at least the replication factor number of peers have registered with it, the master initiates any necessary bucket-fixing activities to ensure that the cluster is valid and complete. In addition, it rebalances the cluster and updates the generation ID as needed.

Bucket fixing can take some time to complete, because it involves copying buckets and making non-searchable copies searchable. For help estimating the time needed to complete the bucket-fixing activity, look here.

After the 60 second quiet period is over, you can view the master dashboard for accurate information on the status of the cluster.

Note: You must make sure that at least replication factor number of peers are running when you restart the master.

PREVIOUS
What happens when a peer node comes back up
  NEXT
About SmartStore

This documentation applies to the following versions of Splunk® Enterprise: 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.3.0, 7.3.1, 7.3.2


Comments

Neerajgarg5 - <br /><br />Just restart it and wait about 60 seconds until the peers register with the master. They will provide the master with all up-to-date state information. If you need to bring up a backup master, see this topic: <br /><br />http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Handlemasternodefailure

Sgoodman
October 21, 2014

Neerajgarg5 - <br /><br />Just restart it and wait about 60 seconds until the peers register with the master. They will provide the master with all up-to-date state information. If you need to bring up a backup master, see this topic: <br /><br />http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Handlemasternodefailure

Sgoodman
October 21, 2014

How to bring back the master node once it goes down. Is there a series of steps need to be performed to bring it back to previous state.

Neerajgarg5
October 21, 2014

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters