Perform a rolling upgrade of an indexer cluster
Splunk Enterprise version 7.1.0 and higher supports rolling upgrades for indexer clusters. A rolling upgrade performs a phased upgrade of the indexer peers, so you can move peer nodes to a new version of Splunk Enterprise with minimal interruption to your ongoing searches.
Requirements and considerations
Review the following requirements and considerations before you initiate a rolling upgrade:
- Rolling upgrade only applies to upgrades from version 7.1.x to a higher version of Splunk Enterprise.
- The cluster master and all peer nodes must be running version 7.1.0 or higher. For upgrade instructions, see Upgrade an indexer cluster.
- All search heads and search head clusters must be running version 7.1.0 or higher.
- Do not attempt any clustering maintenance operations, such as rolling restarts, bundle pushes, or node additions, while the upgrade is in progress.
Hardware or network failures that prevent node shutdown or restart might require manual intervention.
How a rolling upgrade works
When you initiate a rolling upgrade, you select a peer and take it offline. During the offline process, the master reassigns bucket primaries to other peers to retain the searchable state, and the peer completes any in-progress searches within a configurable timeout. See The fast offline process.
After the master shuts down the peer, you perform the software upgrade and bring the peer back online, at which point the peer rejoins the cluster. You repeat this process for each peer node until the rolling upgrade is complete.
A rolling upgrade behaves in the following ways:
- Peer upgrades occur one at a time under the default search factor of SF=2. With SF=3 or greater, you can upgrade the search factor minus one peers at a time; for example, with SF=3 you can upgrade two peers at a time. The number of peers you can upgrade simultaneously is the same for single-site and multisite clusters, because the guidance for multisite clusters is to upgrade one site at a time. So for a multisite cluster with SF=3, you can upgrade two peers at a time within the same site.
- The peer waits for any in-progress searches to complete, up to a maximum time period determined by the decommission_search_jobs_wait_secs attribute in server.conf. The default of 180 seconds is enough time for the majority of searches to complete in most cases.
- Rolling upgrades apply to both historical searches and real-time searches.
In-progress searches that take longer than the default 180 seconds might generate incomplete results and a corresponding error message. If you have a scheduled search that must complete, either increase the decommission_search_jobs_wait_secs value or do not perform a rolling upgrade within the search's timeframe.
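For example, to give long-running searches more time to complete before a peer shuts down, you can raise the timeout on each peer node. The following is a sketch; the value of 600 seconds is illustrative, not a recommendation, and the setting must be in place before you initiate the rolling upgrade:

In $SPLUNK_HOME/etc/system/local/server.conf on each peer node:

[clustering]
# Allow in-progress searches up to 10 minutes to finish
# before the peer shuts down during the offline process.
decommission_search_jobs_wait_secs = 600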
Before you perform a rolling upgrade, make sure the search_retry attribute in the [search] stanza of limits.conf is set to false (the default). Setting this attribute to true might cause searches that take longer than the decommission_search_jobs_wait_secs value to generate duplicate or partial results without an error message.
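One way to confirm the effective setting on a search head is with btool. This is a sketch that assumes a default installation path:

$SPLUNK_HOME/bin/splunk btool limits list search | grep search_retry

If the output shows search_retry = true, set it back to false in the [search] stanza of $SPLUNK_HOME/etc/system/local/limits.conf before you proceed.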
Disable deferred scheduled searches
By default, during a rolling upgrade, continuous scheduled searches are deferred until after the upgrade is complete, based on the default value of the defer_scheduled_searchable_idxc attribute in savedsearches.conf. Real-time scheduled searches are deferred regardless of this setting.
You can disable this default behavior so that continuous scheduled searches are not deferred, as follows:
- On the search head, edit $SPLUNK_HOME/etc/system/local/savedsearches.conf.
- Set defer_scheduled_searchable_idxc to false:

[default]
defer_scheduled_searchable_idxc = false

- Restart Splunk.
When defer_scheduled_searchable_idxc is disabled, scheduled saved searches might return partial results.
For more information on defer_scheduled_searchable_idxc, see savedsearches.conf in the Admin Manual.
For information on real-time and continuous scheduled searches, see Real-time scheduling and continuous scheduling.
Perform a rolling upgrade
To upgrade an indexer cluster with minimal search interruption, perform the following steps:
1. Run preliminary health checks
On the master, run the splunk show cluster-status command with the verbose option to confirm the cluster is in a searchable state:
splunk show cluster-status --verbose
This command shows information about the cluster state. Review the command output to confirm that the search factor is met and all data is searchable before you initiate the rolling upgrade.
The cluster must have two searchable copies of each bucket to be in a searchable state for a rolling upgrade.
Here is an example of the output from the splunk show cluster-status --verbose command:

splunk@master1:~/bin$ ./splunk show cluster-status --verbose

Pre-flight check successful .................. YES
 ├────── Replication factor met ............... YES
 ├────── Search factor met .................... YES
 ├────── All data is searchable ............... YES
 ├────── All peers are up ..................... YES
 ├────── CM version is compatible ............. YES
 ├────── No fixup tasks in progress ........... YES
 └────── Splunk version peer count { 7.1.0: 3 }

Indexing Ready YES

 idx1 0026D1C6-4DDB-429E-8EC6-772C5B4F1DB5 default
	Searchable YES
	Status Up
	Bucket Count=14
	Splunk Version=7.1.0

 idx3 31E6BE71-20E1-4F1C-8693-BEF482375A3F default
	Searchable YES
	Status Up
	Bucket Count=14
	Splunk Version=7.1.0

 idx2 81E52D67-6AC6-4C5B-A528-4CD5FEF08009 default
	Searchable YES
	Status Up
	Bucket Count=14
	Splunk Version=7.1.0
The output shows that the health check is successful, which indicates the cluster is in a searchable state to perform a rolling upgrade.
For information on health check criteria, see Health check output details.
Health checks do not cover all potential cluster health issues. The checks apply only to the criteria listed.
Or, send a GET request to the following endpoint to monitor cluster health:
cluster/master/health
If the endpoint output shows pre_flight_check: 1, the health check is successful.
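As a sketch, the REST call might look like the following. It assumes the default management port 8089 and admin credentials; substitute your own master host and credentials:

curl -k -u admin:yourpassword "https://<master>:8089/services/cluster/master/health?output_mode=json"

Check the response for pre_flight_check with a value of 1.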
For endpoint details, see cluster/master/health in the REST API Reference Manual.
2. Upgrade the cluster master
- Stop the cluster master.
- Upgrade the cluster master, following the standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
- Start the cluster master, if it is not already running, and accept all prompts.
You can use the cluster master dashboard to verify that all cluster nodes are up and running. See View the master dashboard.
3. Upgrade the search head tier
If the search head tier consists of independent search heads, follow this procedure:
- Stop all the search heads.
- Upgrade the search heads, following the normal procedure for any Splunk Enterprise upgrade, as described in How to upgrade Splunk Enterprise in the Installation Manual.
- Start the search heads, if they are not already running.
If the search head tier consists of a search head cluster, follow the procedure in Upgrade a search head cluster.
4. Initialize rolling upgrade
Run the following CLI command on the cluster master:
splunk upgrade-init cluster-peers
Or, send a POST request to the following endpoint:
cluster/master/control/control/rolling_upgrade_init
This initializes the rolling upgrade and puts the cluster in maintenance mode.
For endpoint details, see cluster/master/control/control/rolling_upgrade_init in the REST API Reference Manual.
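As a sketch, the equivalent REST call, assuming the default management port 8089 and admin credentials:

curl -k -u admin:yourpassword -X POST https://<master>:8089/services/cluster/master/control/control/rolling_upgrade_init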
5. Take the peer offline
Taking multiple peers offline simultaneously can impact searches.
Run the following CLI command on the peer node:
splunk offline
Or, send a POST request to the following endpoint:
cluster/slave/control/control/decommission
The master reassigns bucket primaries, completes any ongoing searches, and then shuts down the peer.
For endpoint details, see cluster/slave/control/control/decommission in the REST API Reference Manual.
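As a sketch, the equivalent REST call runs against the peer's own management port, assuming the default port 8089 and admin credentials:

curl -k -u admin:yourpassword -X POST https://<peer>:8089/services/cluster/slave/control/control/decommission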
(Optional) Monitor peer status
To monitor the status of the offline process, send a GET request to the following endpoint:
cluster/master/peers/<peer-GUID>
If the response shows "ReassigningPrimaries", the peer is not yet shut down.
For endpoint details, see cluster/master/peers/{name} in the REST API Reference Manual.
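As a sketch, assuming the default management port 8089 and admin credentials, and substituting the peer's GUID:

curl -k -u admin:yourpassword "https://<master>:8089/services/cluster/master/peers/<peer-GUID>?output_mode=json"

While the peer is still decommissioning, the peer's status in the response shows ReassigningPrimaries.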
6. Upgrade the peer node
Upgrade the peer node, following standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
7. Bring the peer online
Run the following command on the peer node:
splunk start
The peer node starts and automatically rejoins the cluster.
8. Validate version upgrade
Validate the version upgrade using the following endpoint:
cluster/master/peers/<peer-GUID>
For endpoint details, see cluster/master/peers/{name} in the REST API Reference Manual.
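As with the monitoring check in step 5, this is a sketch of the REST call; confirm that the version reported for the peer entry matches the upgraded Splunk Enterprise version:

curl -k -u admin:yourpassword "https://<master>:8089/services/cluster/master/peers/<peer-GUID>?output_mode=json"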
9. Repeat steps 5-8
Repeat steps 5-8 until upgrade of all peer nodes is complete.
10. Finalize rolling upgrade
Run the following CLI command on the cluster master:
splunk upgrade-finalize cluster-peers
Or, send a POST request to the following endpoint:
cluster/master/control/control/rolling_upgrade_finalize
This completes the upgrade process and takes the cluster out of maintenance mode.
For endpoint details, see cluster/master/control/control/rolling_upgrade_finalize in the REST API Reference Manual.
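As a sketch, the equivalent REST call, assuming the default management port 8089 and admin credentials:

curl -k -u admin:yourpassword -X POST https://<master>:8089/services/cluster/master/control/control/rolling_upgrade_finalize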
Conflicting operations
You cannot run certain operations simultaneously:
- Data rebalance
- Excess bucket removal
- Rolling restart
- Rolling upgrade
If you trigger one of these operations while another is already running, splunkd.log, the CLI, and Splunk Web all surface an error showing that a conflicting operation is in progress.