Splunk® Enterprise

Distributed Search

Download manual as PDF

Download topic as PDF

Handle failure of a search head cluster member

When a member fails, the cluster can usually absorb the failure and continue to function normally.

When a failed member restarts and rejoins the cluster, the cluster can frequently complete the process automatically. In some cases, however, your intervention is necessary.

When a member fails

If a search head cluster member fails for any reason and leaves the cluster unexpectedly, the cluster can usually continue to function without interruption:

  • The cluster's high availability features ensure that the cluster can continue to function as long as a majority (at least 51%) of the members are still running. For example, if you have a cluster configured with seven members, the cluster will function as long as four or more members remain up. If a majority of members fail, the cluster cannot successfully elect a new captain, which results in failure of the entire cluster. See Search head cluster captain.
  • All search artifacts resident on the failed member remain available through other search heads, as long as the number of machines that fail is less than the replication factor. If the number of failed members equals or exceeds the replication factor, it is likely that some search artifacts will no longer be available to the remaining members.
  • If the failed member was serving as captain, the remaining nodes elect another member as captain. Since members share configurations, the new captain is immediately fully functional.
  • If you are employing a load balancer in front of the search heads, the load balancer should automatically reroute users on the failed member to an available search head.

When the member rejoins the cluster

A failed member automatically rejoins the cluster, if its instance successfully restarts. When this occurs, its configurations require immediate updating so that they match those of the other cluster members. The member needs updates for two sets of configurations:

See How configuration changes propagate across the search head cluster for information on how configurations are shared among cluster members.

Updating the replicated changes

When the member rejoins the cluster, it contacts the captain to request the set of intervening replicated changes. In some cases, the recovering member can automatically resync with the captain. However, if the member has been disconnected from the cluster for a long time, the resync process might require manual intervention.

See Replication synchronization issues for details on the recovery synchronization process, including how to perform a manual resync.

Updating the deployed changes

When the member rejoins the cluster, it automatically contacts the deployer for the latest configuration bundle. The member then applies any changes or additions that have been made since it last downloaded the bundle.

See Use the deployer to distribute apps and configuration updates.

PREVIOUS
Control captaincy
  NEXT
Use static captain to recover from loss of majority

This documentation applies to the following versions of Splunk® Enterprise: 6.5.1612 (Splunk Cloud only), 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.3.0


Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters