Perform a rolling upgrade of an indexer cluster
Splunk Enterprise version 7.1.0 and higher supports rolling upgrades for indexer clusters. A rolling upgrade performs a phased upgrade of the indexer peers, so you can move peer nodes to a new version of Splunk Enterprise with minimal interruption to your ongoing searches.
Requirements and considerations
Review the following requirements and considerations before you initiate a rolling upgrade:
- Rolling upgrade only applies to upgrades from version 7.1.x to a higher version of Splunk Enterprise.
- The cluster master and all peer nodes must be running version 7.1.0 or higher. For upgrade instructions, see Upgrade an indexer cluster.
- All search heads and search head clusters must be running version 7.1.0 or higher.
- Do not attempt any clustering maintenance operations, such as rolling restarts, bundle pushes, or node additions, while the upgrade is in progress.
Hardware or network failures that prevent node shutdown or restart might require manual intervention.
How a rolling upgrade works
When you initiate a rolling upgrade, you select a peer and take it offline. During the offline process, the master reassigns bucket primaries to other peers to retain the searchable state, and the peer completes any in-progress searches within a configurable timeout. See The fast offline process.
After the master shuts down the peer, you perform the software upgrade and bring the peer back online, at which point the peer rejoins the cluster. You repeat this process for each peer node until the rolling upgrade is complete.
A rolling upgrade behaves in the following ways:
- Peer upgrades occur one at a time under the default search factor of SF=2. With SF=3 or greater, you can upgrade the search factor minus one peers at a time; for example, with SF=3 you can upgrade two peers at a time. The number of peers you can upgrade simultaneously is the same for single-site and multisite clusters, because the guidance for multisite clusters is to upgrade one site at a time. So for a multisite cluster with SF=3, you can upgrade two peers at a time within the same site.
- The peer waits for any in-progress searches to complete, up to a maximum time period determined by the decommission_search_jobs_wait_secs attribute in server.conf. The default of 180 seconds is enough time for the majority of searches to complete in most cases.
- Rolling upgrades apply to both historical searches and real-time searches.
In-progress searches that take longer than the default 180 seconds might generate incomplete results and a corresponding error message. If you have a scheduled search that must complete, either increase the decommission_search_jobs_wait_secs value or do not perform a rolling upgrade within the search's timeframe.
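For example, to give long-running searches more time to complete before a peer shuts down, you can raise the timeout on each peer node. The following is a sketch; the value of 600 seconds is illustrative, not a recommendation, and the setting must be in place before you initiate the rolling upgrade:

In $SPLUNK_HOME/etc/system/local/server.conf on each peer node:

[clustering]
# Allow in-progress searches up to 10 minutes to finish
# before the peer shuts down during the offline process.
decommission_search_jobs_wait_secs = 600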
Before you perform a rolling upgrade, make sure the search_retry attribute in the [search] stanza of limits.conf is set to false (the default). Setting this attribute to true might cause searches that take longer than the decommission_search_jobs_wait_secs value to generate duplicate or partial results without an error message.
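One way to confirm the effective setting on a search head is with btool. This is a sketch that assumes a default installation path:

$SPLUNK_HOME/bin/splunk btool limits list search | grep search_retry

If the output shows search_retry = true, set it back to false in the [search] stanza of $SPLUNK_HOME/etc/system/local/limits.conf before you proceed.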
Disable deferred scheduled searches
By default, during a rolling upgrade, continuous scheduled searches are deferred until after the upgrade is complete, based on the default value of the defer_scheduled_searchable_idxc attribute in savedsearches.conf. Real-time scheduled searches are deferred regardless of this setting.
You can disable this default behavior so that continuous scheduled searches are not deferred, as follows:
- On the search head, edit $SPLUNK_HOME/etc/system/local/savedsearches.conf.
- Set defer_scheduled_searchable_idxc to false:

[default]
defer_scheduled_searchable_idxc = false

- Restart Splunk.
When defer_scheduled_searchable_idxc is disabled, scheduled saved searches might return partial results.
For more information on defer_scheduled_searchable_idxc, see savedsearches.conf in the Admin Manual.
For information on real-time and continuous scheduled searches, see Real-time scheduling and continuous scheduling.
Perform a rolling upgrade
To upgrade an indexer cluster with minimal search interruption, perform the following steps:
1. Run preliminary health checks
On the master, run the splunk show cluster-status command with the verbose option to confirm the cluster is in a searchable state:
splunk show cluster-status --verbose
This command shows information about the cluster state. Review the command output to confirm that the search factor is met and all data is searchable before you initiate the rolling upgrade.
The cluster must have two searchable copies of each bucket to be in a searchable state for a rolling upgrade.
Here is an example of the output from the splunk show cluster-status --verbose command:

splunk@master1:~/bin$ ./splunk show cluster-status --verbose

Pre-flight check successful .................. YES
 ├────── Replication factor met ............... YES
 ├────── Search factor met .................... YES
 ├────── All data is searchable ............... YES
 ├────── All peers are up ..................... YES
 ├────── CM version is compatible ............. YES
 ├────── No fixup tasks in progress ........... YES
 └────── Splunk version peer count { 7.1.0: 3 }

Indexing Ready YES

 idx1 0026D1C6-4DDB-429E-8EC6-772C5B4F1DB5 default
	Searchable YES
	Status Up
	Bucket Count=14
	Splunk Version=7.1.0

 idx3 31E6BE71-20E1-4F1C-8693-BEF482375A3F default
	Searchable YES
	Status Up
	Bucket Count=14
	Splunk Version=7.1.0

 idx2 81E52D67-6AC6-4C5B-A528-4CD5FEF08009 default
	Searchable YES
	Status Up
	Bucket Count=14
	Splunk Version=7.1.0
The output shows that the health check is successful, which indicates the cluster is in a searchable state to perform a rolling upgrade.
For information on health check criteria, see Health check output details.
Health checks do not cover all potential cluster health issues. The checks apply only to the criteria listed.
Or, send a GET request to the following endpoint to monitor cluster health:
cluster/master/health
If the endpoint output shows pre_flight_check: 1, the health check is successful.
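As a sketch, the REST call might look like the following. It assumes the default management port 8089 and admin credentials; substitute your own master host and credentials:

curl -k -u admin:yourpassword "https://<master>:8089/services/cluster/master/health?output_mode=json"

Check the response for pre_flight_check with a value of 1.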
For endpoint details, see cluster/master/health in the REST API Reference Manual.
2. Upgrade the cluster master
- Stop the cluster master.
- Upgrade the cluster master, following the standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
- Start the cluster master, if it is not already running, and accept all prompts.
You can use the cluster master dashboard to verify that all cluster nodes are up and running. See View the master dashboard.
3. Upgrade the search head tier
If the search head tier consists of independent search heads, follow this procedure:
- Stop all the search heads.
- Upgrade the search heads, following the normal procedure for any Splunk Enterprise upgrade, as described in How to upgrade Splunk Enterprise in the Installation Manual.
- Start the search heads, if they are not already running.
If the search head tier consists of a search head cluster, follow the procedure in Upgrade a search head cluster.
4. Initialize rolling upgrade
Run the following CLI command on the cluster master:
splunk upgrade-init cluster-peers
Or, send a POST request to the following endpoint:
cluster/master/control/control/rolling_upgrade_init
This initializes the rolling upgrade and puts the cluster in maintenance mode.
For endpoint details, see cluster/master/control/control/rolling_upgrade_init in the REST API Reference Manual.
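As a sketch, the equivalent REST call, assuming the default management port 8089 and admin credentials:

curl -k -u admin:yourpassword -X POST https://<master>:8089/services/cluster/master/control/control/rolling_upgrade_init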
5. Take the peer offline
Taking multiple peers offline simultaneously can impact searches.
Run the following CLI command on the peer node:
splunk offline
Or, send a POST request to the following endpoint:
cluster/slave/control/control/decommission
The master reassigns bucket primaries, completes any ongoing searches, and then shuts down the peer.
For endpoint details, see cluster/slave/control/control/decommission in the REST API Reference Manual.
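As a sketch, the equivalent REST call runs against the peer's own management port, assuming the default port 8089 and admin credentials:

curl -k -u admin:yourpassword -X POST https://<peer>:8089/services/cluster/slave/control/control/decommission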
(Optional) Monitor peer status
To monitor the status of the offline process, send a GET request to the following endpoint:
cluster/master/peers/<peer-GUID>
If the response shows "ReassigningPrimaries", the peer is not yet shut down.
For endpoint details, see cluster/master/peers/{name} in the REST API Reference Manual.
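As a sketch, assuming the default management port 8089 and admin credentials, and substituting the peer's GUID:

curl -k -u admin:yourpassword "https://<master>:8089/services/cluster/master/peers/<peer-GUID>?output_mode=json"

While the peer is still decommissioning, the peer's status in the response shows ReassigningPrimaries.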
6. Upgrade the peer node
Upgrade the peer node, following standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
7. Bring the peer online
Run the following command on the peer node:
splunk start
The peer node starts and automatically rejoins the cluster.
8. Validate version upgrade
Validate the version upgrade using the following endpoint:
cluster/master/peers/<peer-GUID>
For endpoint details, see cluster/master/peers/{name} in the REST API Reference Manual.
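As with the monitoring check in step 5, this is a sketch of the REST call; confirm that the version reported for the peer entry matches the upgraded Splunk Enterprise version:

curl -k -u admin:yourpassword "https://<master>:8089/services/cluster/master/peers/<peer-GUID>?output_mode=json"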
9. Repeat steps 5-8
Repeat steps 5-8 until upgrade of all peer nodes is complete.
10. Finalize rolling upgrade
Run the following CLI command on the cluster master:
splunk upgrade-finalize cluster-peers
Or, send a POST request to the following endpoint:
cluster/master/control/control/rolling_upgrade_finalize
This completes the upgrade process and takes the cluster out of maintenance mode.
For endpoint details, see cluster/master/control/control/rolling_upgrade_finalize in the REST API Reference Manual.
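As a sketch, the equivalent REST call, assuming the default management port 8089 and admin credentials:

curl -k -u admin:yourpassword -X POST https://<master>:8089/services/cluster/master/control/control/rolling_upgrade_finalize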
Conflicting operations
You cannot run certain operations simultaneously:
- Data rebalance
- Excess bucket removal
- Rolling restart
- Rolling upgrade
If you trigger one of these operations while another is already running, splunkd.log, the CLI, and Splunk Web all surface an error showing that a conflicting operation is in progress.