Handle master site failure

If the site that hosts the master node fails, the cluster loses the master's functionality. You must immediately start a new master on one of the remaining sites.

Until the new master starts up, the cluster continues to function as best it can. Each peer continues to stream data to other peers based on the list of target peers that it was using when the master went down. If some of a peer's targets go down (as is likely in a site failure), the peer removes them from its list of streaming targets and continues to stream data to the peers that remain on the list.
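The following toy model illustrates the behavior described above. It is a minimal sketch, not Splunk code: the Peer class, the fail_site helper, and the peer and site names are all hypothetical and exist only to show how a peer's target list shrinks when a site fails.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy stand-in for a peer node (hypothetical, not a Splunk API)."""
    name: str
    site: str
    up: bool = True
    # Streaming targets the peer was using when the master went down.
    targets: list = field(default_factory=list)

    def prune_targets(self):
        """Drop targets that are down and return the ones still usable."""
        self.targets = [t for t in self.targets if t.up]
        return self.targets


def fail_site(peers, site):
    """Mark every peer on the given site as down."""
    for p in peers:
        if p.site == site:
            p.up = False


a = Peer("peer-a", "site1")
b = Peer("peer-b", "site1")
c = Peer("peer-c", "site2")
d = Peer("peer-d", "site2")

# peer-c was streaming to targets on both sites before the failure.
c.targets = [a, b, d]

# site1 (the master site in this scenario) fails.
fail_site([a, b, c, d], "site1")

# peer-c keeps streaming, but only to targets that are still up.
remaining = [t.name for t in c.prune_targets()]
print(remaining)
```

In this model, no new targets are ever added to a peer's list, which mirrors the point that the cluster only degrades gracefully until a new master is available.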

To deal with master site failure, do the following:

1. Configure a stand-by master on at least one of the sites not hosting the current master. This is a preparatory step that you must complete before any failure occurs. See Replace the master node on the indexer cluster.

2. When the master site goes down, bring up a stand-by master on one of the remaining sites. See Replace the master node on the indexer cluster.

3. Restart indexing on the cluster, following the instructions in Restart indexing in multisite cluster after master restart or site failure.

The new master now fully replaces the old master.

Note: If the failed site later comes back up, you need to point the peers on that site to the new master. See Ensure that the peer and search head nodes can find the new master.
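As an illustration of what pointing a peer to the new master can involve, the sketch below rewrites the master_uri setting in the [clustering] stanza of a peer's server.conf text. It assumes the peer references the master through master_uri, that the stanza is simple enough for Python's configparser to parse, and that the host names shown are placeholders. The repoint_to_new_master helper is hypothetical, and the linked topic remains the authoritative procedure.

```python
import configparser
import io


def repoint_to_new_master(conf_text, new_master_uri):
    """Return conf_text with [clustering] master_uri set to new_master_uri.

    Hypothetical helper for illustration. Comments in the original
    text are not preserved by configparser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)
    if not parser.has_section("clustering"):
        raise ValueError("no [clustering] stanza found")
    parser.set("clustering", "master_uri", new_master_uri)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Placeholder configuration for a peer on the recovered site.
old_conf = """[general]
site = site1

[clustering]
mode = slave
master_uri = https://old-master.example.com:8089
"""

new_conf = repoint_to_new_master(
    old_conf, "https://standby-master.example.com:8089"
)
print(new_conf)
```

The sketch only transforms text. Applying the result to a real peer, and restarting the peer if required, is left to the procedure in the linked topic.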
