Handle manager site failure

If the site holding the manager node fails, the cluster loses the manager's functionality. Start a new manager on one of the remaining sites immediately.

Until the new manager starts, the cluster continues to function as best it can. The peers continue to stream data to other peers, based on the lists of target peers that they were using when the manager went down. If some of a peer's targets go down (as is likely in a site failure), the peer removes them from its list of streaming targets and continues to stream data to any targets that remain on the list.
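The peer behavior described above can be illustrated with a small toy model. This is only a sketch for reasoning about the failure window, not Splunk's implementation. The `Peer` class, the `simulate_site_failure` function, and the peer and site names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Peer:
    """Toy model of a peer node (hypothetical, not Splunk code)."""
    name: str
    site: str
    # Streaming targets as they stood when the manager went down.
    targets: List[str] = field(default_factory=list)

    def prune_targets(self, live_peers: Set[str]) -> None:
        """Drop unreachable targets. With no manager, no new targets are added."""
        self.targets = [t for t in self.targets if t in live_peers]


def simulate_site_failure(peers: List[Peer], failed_site: str) -> Dict[str, List[str]]:
    """Return the remaining streaming targets of each surviving peer."""
    survivors = [p for p in peers if p.site != failed_site]
    live = {p.name for p in survivors}
    for peer in survivors:
        peer.prune_targets(live)
    return {p.name: p.targets for p in survivors}


peers = [
    Peer("peer1", "site1", ["peer2", "peer3"]),
    Peer("peer2", "site1", ["peer1", "peer4"]),
    Peer("peer3", "site2", ["peer1", "peer4"]),
    Peer("peer4", "site2", ["peer2", "peer3"]),
]
result = simulate_site_failure(peers, failed_site="site1")
print(result)  # {'peer3': ['peer4'], 'peer4': ['peer3']}
```

In this model the surviving peers only ever shrink their target lists. That is why the cluster needs a new manager promptly: nothing in the model assigns fresh targets while the manager is down.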

To deal with manager site failure, do the following:

1. Configure a stand-by manager on at least one of the sites not hosting the current manager. See Replace the manager node on the indexer cluster. This is a preparatory step that you must complete before the need arises.

2. When the manager site goes down, bring up a stand-by manager on one of the remaining sites. See Replace the manager node on the indexer cluster.

3. Restart indexing on the cluster, following the instructions in Restart indexing in multisite cluster after manager restart or site failure.

The new manager now fully replaces the old manager.

Note: If the failed site later comes back up, you need to point the peers on that site to the new manager. See Ensure that the peer and search head nodes can find the new manager.
