Perform a rolling upgrade of an indexer cluster

Splunk Enterprise version 7.1.0 and higher supports rolling upgrade for indexer clusters. Rolling upgrade lets you perform a phased upgrade of indexer peers with minimal interruption to your ongoing searches. You can use rolling upgrade to minimize search disruption when upgrading peer nodes to a new version of Splunk Enterprise.

Requirements and considerations

Review the following requirements and considerations before you initiate a rolling upgrade:

Rolling upgrade only applies to upgrades from version 7.1.x to a higher version of Splunk Enterprise.
The manager node and all peer nodes must be running version 7.1.0 or higher. For upgrade instructions, see Upgrade an indexer cluster.
All search heads and search head clusters must be running version 7.1.0 or higher.
Do not attempt any clustering maintenance operations, such as rolling restart, bundle pushes, or node additions, during upgrade.

Hardware or network failures that prevent node shutdown or restart might require manual intervention.

How a rolling upgrade works

When you initiate a rolling upgrade, you select a peer and take it offline. During the offline process, the manager node reassigns bucket primaries to other peers to retain the searchable state, and the peer completes any in-progress searches within a configurable timeout. See The fast offline process.

After the manager node shuts down the peer, you perform the software upgrade and bring the peer back online, at which point the peer rejoins the cluster. You repeat this process for each peer node until the rolling upgrade is complete.

A rolling upgrade behaves in the following ways:

Peer upgrades occur one at a time based on the default search factor of 2. With a search factor of 3 or greater, you can upgrade (search factor - 1) number of peers at a time. For example, with a search factor of 3 you can upgrade 2 peers at a time. The number of peers you can upgrade simultaneously is the same for both single-site and multisite clusters, as the guidance for multisite clusters is to upgrade one site at a time. So for a multisite cluster, with a search factor of 3, you can upgrade 2 peers at a time in the same site.
The peer waits for any in-progress searches to complete. It will wait up to a maximum time period determined by the decommission_search_jobs_wait_secs attribute in server.conf. The default of 180 seconds is enough time for the majority of searches to complete in most cases.
Rolling upgrades apply to both historical searches and real-time searches.

In-progress searches that take longer than the default 180 seconds might generate incomplete results and a corresponding error message. If you have a scheduled search that must complete, either increase the decommission_search_jobs_wait_secs value or do not perform a rolling upgrade within the search's timeframe.

Before you perform a rolling upgrade, make sure the search_retry attribute in the [search] stanza of limits.conf is set to false (the default). Setting this to true might cause searches that take longer than the decommission_search_jobs_wait_secs value to generate duplicate or partial results without an error message.

Scheduled search behavior during rolling upgrade

As of version 8.2.x, by default, during rolling upgrade, continuous scheduled searches continue to run and are not deferred, based on the defer_scheduled_searchable_idxc = <boolean> setting in savedsearches.conf, which now defaults to a value of "false".

In some cases, continuous scheduled searches can return partial results during rolling upgrade. If necessary, you can defer continuous scheduled searches until after rolling upgrade completes, as follows:

On the search head or search head cluster, edit SPLUNK_HOME/etc/system/local/savedsearches.conf.
Set defer_scheduled_searchable_idxc = true.
Restart Splunk.

Also as of version 8.2.x, by default, during rolling upgrade, real-time scheduled searches continue to run and are not deferred, based on the newly added skip_scheduled_realtime_idxc = <boolean> setting in savedsearches.conf, which defaults to a value of "false".

For more information on defer_scheduled_searchable_idxc and skip_scheduled_realtime_idxc settings, see savedsearches.conf in the Admin Manual.

For information on real-time and continuous scheduled search modes, see Real-time scheduling and continuous scheduling.

Data model acceleration searches are skipped during rolling upgrade.

Perform a rolling upgrade

To upgrade an indexer cluster with minimal search interruption, perform the following steps:

1. Run preliminary health checks

On the manager node, run the splunk show cluster-status command with the verbose option to confirm the cluster is in a searchable state:

splunk show cluster-status --verbose

This command shows information about the cluster state. Review the command output to confirm that the search factor is met and all data is searchable before you initiate the rolling upgrade.

The cluster must have two searchable copies of each bucket to be in a searchable state for a rolling upgrade.

Here is an example of the output from the splunk show cluster-status --verbose command:

splunk@manager1:~/bin$ ./splunk show cluster-status --verbose

Pre-flight check successful .................. YES
 ├────── Replication factor met ............... YES
 ├────── Search factor met .................... YES
 ├────── All data is searchable ............... YES
 ├────── All peers are up ..................... YES
 ├────── CM version is compatible ............. YES
 ├────── No fixup tasks in progress ........... YES
 └────── Splunk version peer count { 7.1.0: 3 }

 Indexing Ready YES

 idx1 	 0026D1C6-4DDB-429E-8EC6-772C5B4F1DB5	 default
	 Searchable YES
	 Status  Up
	 Bucket Count=14
	 Splunk Version=7.1.0

 idx3 	 31E6BE71-20E1-4F1C-8693-BEF482375A3F	 default
	 Searchable YES
	 Status  Up
	 Bucket Count=14
	 Splunk Version=7.1.0

 idx2 	 81E52D67-6AC6-4C5B-A528-4CD5FEF08009	 default
	 Searchable YES
	 Status  Up
	 Bucket Count=14
	 Splunk Version=7.1.0

The output shows that the health check is successful, which indicates the cluster is in a searchable state to perform a rolling upgrade.

For information on health check criteria, see Health check output details.

Health checks do not cover all potential cluster health issues. The checks apply only to the criteria listed.

Or, send a GET request to the following endpoint to monitor cluster health:

cluster/manager/health

If the endpoint output shows pre_flight_check: 1, then the health is successful.

For endpoint details, see cluster/manager/health in the REST API Reference Manual.

2. Upgrade the manager node

Stop the manager node.
Upgrade the manager node, following Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
Start the manager node and accept all prompts, if it is not already running.

You can use the manager node dashboard to verify that all cluster nodes are up and running. See View the manager node dashboard.

3. Upgrade the search head tier

If the search head tier consists of independent search heads, follow this procedure:

Stop all the search heads.
Upgrade the search heads, following the normal procedure for any Splunk Enterprise upgrade, as described in How to upgrade Splunk Enterprise in the Installation Manual.
Start the search heads, if they are not already running.

If the search head tier consists of a search head cluster, follow the procedure in Upgrade a search head cluster.

4. Initialize rolling upgrade

Run the following CLI command on the manager node:

splunk upgrade-init cluster-peers

Or, send a POST request to the following endpoint:

cluster/manager/control/control/rolling_upgrade_init

This initializes the rolling upgrade and puts the cluster in maintenance mode.

For endpoint details, see cluster/manager/control/control/rolling_upgrade_init in the REST API Reference Manual.

5. Take the peer offline

Taking multiple peers offline simultaneously can impact searches.

Run the following CLI command on the peer node:

splunk offline

Or, send a POST request to the following endpoint.

cluster/peer/control/control/decommission

The manager node reassigns bucket primaries, completes any ongoing searches, and then shuts down the peer.

For endpoint details, see cluster/peer/control/control/decommission in the REST API Reference Manual.

(Optional) Monitor peer status

To monitor the status of the offline process, send a GET request to the following enpoint:

cluster/manager/peers/<peer-GUID>

If the response shows "ReassigningPrimaries", the peer is not yet shut down.

For endpoint details, see cluster/manager/peers/{name} in the REST API Reference Manual.

6. Upgrade the peer node

Upgrade the peer node, following standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.

7. Bring the peer online

Run the following command on the peer node.

splunk start

The peer node starts and automatically rejoins the cluster.

8. Validate version upgrade

Validate the version upgrade using the following endpoint:

cluster/manager/peers/<peer-GUID>

For endpoint details, see cluster/manager/peers/{name} in the REST API Reference Manual.

9. Repeat steps 5-8

Repeat steps 5-8 until upgrade of all peer nodes is complete.

10. Finalize rolling upgrade

Run the following CLI command on the manager node:

splunk upgrade-finalize cluster-peers

Or, send a POST request to the following endpoint:

cluster/manager/control/control/rolling_upgrade_finalize

This completes the upgrade process and takes the cluster out of maintenance mode.

For endpoint details, see cluster/manager/control/control/rolling_upgrade_finalize in the REST API Reference Manual.

Conflicting operations

You cannot run certain operations simultaneously:

Data rebalance
Excess bucket removal
Rolling restart
Rolling upgrade

If you trigger one of these operations while another one is already running, splunkd.log, the CLI, and Splunk Web all surface an error that shows a conflicting operation is in progress.

Related answers from Splunk Community

Perform a rolling upgrade of an indexer cluster

Requirements and considerations

How a rolling upgrade works

Scheduled search behavior during rolling upgrade

Perform a rolling upgrade

1. Run preliminary health checks

2. Upgrade the manager node

3. Upgrade the search head tier

4. Initialize rolling upgrade

5. Take the peer offline

(Optional) Monitor peer status

6. Upgrade the peer node

7. Bring the peer online

8. Validate version upgrade

9. Repeat steps 5-8

10. Finalize rolling upgrade

Conflicting operations

Comments

Perform a rolling upgrade of an indexer cluster

Was this topic useful?