Splunk® Enterprise

Distributed Search

Download manual as PDF

This documentation does not apply to the most recent version of Splunk. Click here for the latest version.
Download topic as PDF

Restart the search head cluster

You can restart the entire cluster with the splunk rolling-restart command. The command performs a phased restart of all cluster members, so that the cluster as a whole can continue to perform its functions during the restart process.

The deployer also automatically initiates a rolling restart, when necessary, after distributing a configuration bundle to the members. For details on this process, see "Push the configuration bundle".

Caution: In most cases, when changing configuration settings in the [shclustering] stanza of server.conf, you must restart all members at approximately the same time, in order to maintain identical settings across all members. For this reason, do not use the splunk rolling-restart command to restart the members after such configuration changes, except when configuring the captain_is_adhoc_searchhead attribute. Instead, run the splunk restart command on each member. See "Configure the search head cluster".

Initiate a rolling restart

Invoke the splunk rolling-restart command from the captain:

splunk rolling-restart shcluster-members

Monitor the restart process

To check the progress of the rolling restart, run this variant of the splunk rolling-restart command from the captain:

splunk rolling-restart shcluster-members -status 1

The command returns the status of any members that have started or completed the restart process. For example:

Peer | Status | Start Time | End Time | GUID 
1. server-centos65x64-4 | RESTARTING | Mon Apr 20 11:52:21 2015 | N/A | 7F10190D-F00A-47AF-8688-8DD26F1A8A4D
2. server-centos65x64-3 | RESTART-COMPLETE | Mon Apr 20 11:51:54 2015 | Mon Apr 20 11:52:16 2015 | E78F5ECF-1EC0-4E51-9EF7-5939B793763C

You can run the splunk rolling-restart command multiple times during the restart process to view the current status of the restart. When the captain restarts at the end of the process, the command will fail. After the captain restarts, you can view the final restart status in the captain's audit.log file:

index=_audit rolling-restart

How rolling restart works

The rolling restart works like this: The captain issues a restart message to approximately 10%, by default, of the members at a time. Once those members restart and contact the captain, the captain then issues a restart message to another 10% of the members, and so on, until all the members, including the captain, have restarted.

Note: If there are fewer than 10 members in the cluster, the captain issues the restart to one member at a time.

The captain is the final member to restart. After the captain member restarts, it continues to function as the captain.

After all members have restarted, it requires approximately 60 seconds for the cluster to stabilize. During this interval, error messages might appear. You can ignore these messages. They should desist within 60 seconds.

Note: During a rolling restart, there is no guarantee that all knowledge objects will be available to all members.

Configure the number of members that restart simultaneously

By default, the captain issues the restart command to 10% of the members at a time. However, the percentage is configurable through the percent_peers_to_restart attribute in the [shclustering] stanza of server.conf. For convenience, you can configure this attribute with the CLI splunk edit shcluster-config command. For example, to change the restart behavior so that the captain restarts 20% of the peers at a time, use this command:

splunk edit shcluster-config -percent_peers_to_restart 20

Caution: Do not set the value to greater than 20%. Otherwise, issues can arise during the captain election process.

After changing the percent_peers_to_restart attribute, you still need to run the splunk rolling-restart command to initiate the actual restart.

Restart fails if cluster cannot maintain a majority

A cluster with a dynamic captain requires that a majority of members be running at all times. See "Captain election." This requirement extends to the rolling restart process.

If restarting the next set of members (governed by the percent_peers_to_restart attribute) would cause the number of active members to fall below 51% (for example, because some other members have failed), the restart process halts, in order to maintain an active majority of members. The captain then makes repeated attempts to restart the process, in case another member has rejoined the cluster in the interim. These attempts continue until the restart_timeout period elapses (by default, 10 minutes). At that point, the captain makes no more attempts, and the remaining members do not go through the rolling-restart process.

The restart_timeout attribute is settable in server.conf.

PREVIOUS
Use static captain to recover from loss of majority
  NEXT
Use the CLI to view information about a search head cluster

This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10


Comments

Interesting question came up in the comments of https://answers.splunk.com/answers/425988/what-is-the-best-practice-for-bringing-down-a-sear.html#comment-426456: Does a rolling restart of SHC members kill running searches or wait for them to finish running? Would it be possible to capture that answer here or a link to the general restart behavior (if its no different)?

SloshBurch
July 13, 2016

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters