Perform an automated rolling upgrade of a search head cluster
Splunk Enterprise version 9.1.0 and higher supports automated rolling upgrade of search head clusters using the custom splunk-rolling-upgrade app, which comes with the Splunk Enterprise product. A rolling upgrade performs a phased upgrade of cluster members to a new version of Splunk Enterprise with minimal interruption of ongoing searches. The splunk-rolling-upgrade app automates the manual rolling upgrade steps described in Perform a rolling upgrade of a search head cluster.
Requirements and considerations
Review the following requirements and considerations before you configure and initiate an automated rolling upgrade.
- The splunk-rolling-upgrade app requires Linux OS. Mac OS and Windows are not supported.
- Automated rolling upgrade applies only to upgrades from version 9.1.x and higher to subsequent versions of Splunk Enterprise. To determine your upgrade path and confirm the compatibility of the upgraded search head cluster version with existing Splunk Enterprise components and applications, see the Splunk products version compatibility matrix.
- Automated upgrade of Splunk Enterprise instances running under systemd is not supported.
- Automated search head cluster rolling upgrade supports the following installation package formats:
  - .tgz: default file format
  - .deb and .rpm: require a custom script that can run with elevated privileges. See Create a custom installation hook.
- To use the splunk-rolling-upgrade app, you must create the splunk_system_upgrader role and assign it to your user account. The role must contain these capabilities:
  - upgrade_splunk_shc
  - list_search_head_clustering
  - list_settings
  - use_remote_proxy
  The admin role contains all of the required capabilities by default. However, to limit access, it is a best practice to create a dedicated splunk_system_upgrader role and user with only the capabilities required to run the rolling upgrade.
How an automated rolling upgrade works
Use the splunk-rolling-upgrade app to perform an automated rolling upgrade of a search head cluster. You initiate the rolling upgrade with a single request to a REST endpoint or by running the corresponding CLI command. The app then downloads or otherwise retrieves the new Splunk Enterprise installation package and installs it on each cluster member, one member at a time. To learn more about the package_path setting, see Configure the rolling upgrade app. By default, the app handles only .tgz packages by unpacking the contents in the $SPLUNK_HOME directory, which is typically /opt/splunk.
For more flexibility with installation, the splunk-rolling-upgrade app does the following:
- Implements the package installation process as a custom hook (shell script). You can write and plug in the installation logic, which is required for deb and rpm package types.
- Provides additional separate endpoints for monitoring the upgrade process and remediating failures.
The splunk-rolling-upgrade app provides the following REST endpoints and corresponding CLI commands to perform an automated search head cluster rolling upgrade.
- For cluster upgrade, you can run these operations on any cluster member.
- For deployer upgrade, you must run these operations on the deployer.
- For non-clustered upgrade, which means upgrading search heads that are not part of a search head cluster, you must run these operations on each search head.
REST endpoint | CLI command | Description |
---|---|---|
upgrade/shc/upgrade | splunk rolling-upgrade shc-upgrade | Initiate the automated rolling upgrade process. |
upgrade/shc/status | splunk rolling-upgrade shc-status | Monitor automated rolling upgrade status. |
upgrade/shc/recovery | splunk rolling-upgrade shc-recovery | Return the cluster to a ready state after automated rolling upgrade failure. |
Perform an automated rolling upgrade
This section shows you how to configure and use the splunk-rolling-upgrade app to run an automated rolling upgrade of a search head cluster.
Configure the rolling upgrade app
Before you can run an automated rolling upgrade, create and configure the splunk-rolling-upgrade app for search head upgrades and distribute it to search head peers. For non-clustered search heads, create and configure the app for each non-clustered search head without distributing it.
The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.
To configure the splunk-rolling-upgrade app, take the following steps:
For a non-clustered environment, perform steps 1 to 4 on the search head. The remaining steps are about distributing the app, which applies to search head clusters.
- On the deployer, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
- In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following contents, where package_path points to the installation package for the version to which you are upgrading:

      [downloader]
      package_path = <path to a package>

  The package_path setting supports URI paths to local files, for example file://path/to/package.tgz, and remote links that require no authentication.
- In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create an authorize.conf file containing the following:

      [role_splunk_system_upgrader]
      upgrade_splunk_shc = enabled
      list_search_head_clustering = enabled
      list_settings = enabled
      use_remote_proxy = enabled

- On the deployer, in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new file called inputs.conf, containing the following scripted input, where <splunk_user> is the name of the user the app uses to send requests to REST endpoints.

      [script://$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/bin/complete.py]
      passAuth=<splunk_user>

- (Optional) If you plan to use rpm or deb packages, follow these steps:
  - Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
  - Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/default directory.
  - Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/default directory.
  - In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the hook stanza, set the install_script_path value to the location of the hook. For example:

        [hook]
        install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>

    Note: The install_script_path setting supports only local paths and environment variable expansions.
- On the deployer, copy $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config to the configuration bundle under $SPLUNK_HOME/etc/shcluster/apps.
- On the deployer, distribute the configuration bundle to all search head cluster members using the following command:
splunk apply shcluster-bundle -target <uri-to-shc-peer>:<management port> -auth admin:<password>
For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.
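The file-creation steps above (the directory plus rolling_upgrade.conf, authorize.conf, and inputs.conf) can be sketched as a small shell function. This is a minimal sketch, not a supported tool: the function name create_rolling_upgrade_config and its three arguments are illustrative assumptions, and the file contents simply mirror the stanzas shown in the steps.

```shell
#!/bin/bash
# Sketch only: writes the three .conf files described in the steps above.
# The function name and argument order are hypothetical.
#   $1  Splunk home directory (for example /opt/splunk)
#   $2  value for package_path
#   $3  Splunk user name for passAuth
create_rolling_upgrade_config() {
    local splunk_home="$1" package_path="$2" splunk_user="$3"
    local conf_dir="$splunk_home/etc/apps/splunk-rolling-upgrade-config/default"

    mkdir -p "$conf_dir" || return 1

    # rolling_upgrade.conf: tells the app where the installation package is.
    printf '[downloader]\npackage_path = %s\n' "$package_path" \
        > "$conf_dir/rolling_upgrade.conf"

    # authorize.conf: dedicated role with the four required capabilities.
    cat > "$conf_dir/authorize.conf" <<'EOF'
[role_splunk_system_upgrader]
upgrade_splunk_shc = enabled
list_search_head_clustering = enabled
list_settings = enabled
use_remote_proxy = enabled
EOF

    # inputs.conf: $SPLUNK_HOME must stay literal in the stanza name,
    # so the stanza is passed in single quotes and never expanded here.
    printf '%s\npassAuth=%s\n' \
        '[script://$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/bin/complete.py]' \
        "$splunk_user" > "$conf_dir/inputs.conf"
}
```

For example, `create_rolling_upgrade_config /opt/splunk file://path/to/package.tgz <splunk_user>` would produce the app directory, which you then copy to the configuration bundle and distribute as described in the remaining steps.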
Upgrade the license manager
The license manager is the first node to be upgraded in a Splunk deployment. From the deployment perspective, upgrading the license manager is similar to upgrading a non-clustered search head. After configuring the splunk-rolling-upgrade app, follow these steps to upgrade the license manager:
- To initiate the upgrade, send a request to the REST endpoint upgrade/standalone or run the CLI command splunk rolling-upgrade standalone.
- Monitor the status by sending requests to the REST endpoint upgrade/cluster/status or by running the CLI command splunk rolling-upgrade cluster-status.
Run the automated rolling upgrade
After you configure the splunk-rolling-upgrade app, follow these steps to run the automated rolling upgrade of your search head cluster, using the REST API or corresponding CLI commands:
Note: CLI commands for automated rolling upgrade do not return error messages.
- Identify the URI and management port of any search head cluster member.
- On any cluster member, send an HTTP POST request to the upgrade/shc/upgrade endpoint to initiate the rolling upgrade process. For example:

      curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"

  First, the request triggers basic health checks to ensure the search head cluster is in a healthy state to perform the rolling upgrade. If all health checks pass, the endpoint initiates the rolling upgrade. For more information, see steps 1 and 2 in Perform a rolling upgrade.

  A successful request returns an "Upgrade initiated" message. For example:

      {
        "updated":"2022-11-24T17:25:54+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
          {
            "title":"upgrade",
            "id":"/services/upgrade/shc/upgrade",
            "updated":"2022-11-24T17:25:54+0000",
            "links":{ "alternate":{ "href":"shc/upgrade" } },
            "content":{ "message":"Upgrade initiated", "status":"succeeded" }
          }
        ]
      }

  In some cases the request can fail and return an error, for example, if health checks fail or if a rolling upgrade is already running. To troubleshoot the cause of a failure, review the HTTP return codes and check log files for details. The upgrade/shc/upgrade endpoint returns the following HTTP status codes:

  Code | Description |
  ---|---|
  200 | Upgrade operation successfully initiated. |
  400 | Configuration error. |
  403 | One of the following: an upgrade is already running; upgrade is not required; the search head cluster is not ready (wait for the cluster to fully initialize). |
  500 | Internal Server Error. Something went wrong with the upgrade. Check log files for more information. Possible reasons include: the upgrade could not be triggered on a given member. |
  501 | Attempted to upgrade an unsupported deployment. (Rolling upgrade supports search head clusters, search heads and deployers only.) |
  503 | KV store is not ready. |

  For more troubleshooting information, including relevant log files, see Troubleshoot and recover from automated rolling upgrade failure.
  For endpoint details, see upgrade/shc/upgrade in the REST API Reference Manual.
  Alternatively, on any cluster member, run the splunk rolling-upgrade shc-upgrade command to initiate the automated rolling upgrade.
- Monitor the status of the rolling upgrade until all cluster members are successfully upgraded. To monitor the rolling upgrade status, send an HTTP GET request to the upgrade/shc/status endpoint. For example:

      curl -u admin:pass -k "https://localhost:8089/services/upgrade/shc/status?output_mode=json"

  The response shows the current status of the rolling upgrade, including the upgrade status of the entire cluster, the status of each individual cluster member, and the total number and percentage of members upgraded. For example:

      {
        "updated":"2022-11-24T17:33:28+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
          {
            "title":"status",
            "id":"/services/upgrade/shc/status",
            "updated":"2022-11-24T17:33:28+0000",
            "links":{ "alternate":{ "href":"shc/status" } },
            "content":{
              "message":{
                "upgrade_status":"completed",
                "statistics":{
                  "peers_to_upgrade":3,
                  "overall_peers_upgraded":3,
                  "overall_peers_upgraded_percentage":100
                },
                "peers":[
                  { "name":"sh2", "status":"upgraded", "last_modified":"Thu Nov 24 17:29:41 2022" },
                  { "name":"sh1", "status":"upgraded", "last_modified":"Thu Nov 24 17:28:07 2022" },
                  { "name":"sh3", "status":"upgraded", "last_modified":"Thu Nov 24 17:31:15 2022" }
                ]
              }
            }
          }
        ]
      }

  The upgrade/shc/status endpoint returns the following HTTP status codes:

  Code | Description |
  ---|---|
  200 | Unable to get the latest SHC status. |
  400 | Configuration error. |
  500 | Internal error. Check log files for more information on the error. |
  501 | Attempted to get the status of an unsupported deployment. |
  503 | Unable to access KV store. KV store probably still initializing. |

  For endpoint details, see upgrade/shc/status in the REST API Reference Manual.
  Alternatively, run the splunk rolling-upgrade shc-status command to monitor the automated rolling upgrade.

  If you get a "Couldn't connect to server" response, such as the following, when monitoring the rolling upgrade status, it means that this cluster member is being restarted as part of the upgrade process:

      % curl -u admin:pass -k https://10.225.218.144:8089/services/upgrade/shc/status
      curl: (7) Failed to connect to 10.225.218.144 port 8089 after 1212 ms: Couldn't connect to server

  You can get this response when you monitor the status of a machine that is temporarily down, because the rolling upgrade process stops splunkd, unpacks the package, and restarts splunkd. In this case, monitor the status from a different cluster member, or wait until that cluster member is up and running again.
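- Upgrade the deployer. When the upgrade/shc/status endpoint response shows "upgrade_status":"completed" for the entire cluster, repeat step 2 on the deployer: send the HTTP POST request to the upgrade/shc/upgrade endpoint, or run the splunk rolling-upgrade shc-upgrade command.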
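The monitoring step can be scripted as a polling loop around the upgrade/shc/status request shown above. The following is a minimal sketch under stated assumptions: the function names, the poll interval, and the poll limit are illustrative, and the status is extracted with plain grep and sed text matching rather than a real JSON parser. The values "completed" and "failed" are the upgrade_status values this page describes; any other value, or no response at all while a member restarts, is treated as "poll again".

```shell
#!/bin/bash
# Print the value of the first "upgrade_status" field in JSON read from stdin.
# Text matching only; adequate for this flat string field.
extract_upgrade_status() {
    grep -o '"upgrade_status" *: *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)"/\1/'
}

# Poll upgrade/shc/status until the cluster reports "completed" or "failed".
# Hypothetical helper; arguments:
#   $1  base URL of a cluster member, for example https://localhost:8089
#   $2  credentials in user:password form
#   $3  seconds between polls (default 30)
#   $4  maximum number of polls (default 120)
wait_for_upgrade() {
    local base_url="$1" auth="$2" interval="${3:-30}" max_polls="${4:-120}"
    local polls=0 status=""
    while [ "$polls" -lt "$max_polls" ]; do
        # curl fails while the queried member restarts; an empty status
        # simply leads to another poll.
        status="$(curl -s -k -u "$auth" \
            "$base_url/services/upgrade/shc/status?output_mode=json" \
            | extract_upgrade_status)" || status=""
        case "$status" in
            completed) echo "upgrade completed"; return 0 ;;
            failed)    echo "upgrade failed" >&2; return 1 ;;
        esac
        polls=$((polls + 1))
        sleep "$interval"
    done
    echo "gave up after $max_polls polls; last status: ${status:-unknown}" >&2
    return 2
}
```

The poll limit matters because a crashed upgrade can keep reporting "in_progress"; without a limit the loop would never end.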
Create a custom installation hook
An installation hook is a custom binary or script that installs the Splunk package on every machine. The splunk-rolling-upgrade app downloads the package specified in package_path in rolling_upgrade.conf, then sends a request to the hook to install the package on the cluster member.
The app passes the package path to the hook as the first parameter, and $SPLUNK_HOME as the second parameter. The hook must contain installation instructions for the package, and must have executable permissions, which you can set using the chmod +x command. For example, the following shows the default installation hook for .tgz packages:
    #!/bin/bash
    set -e
    splunk_tar="$1"
    dest_dir="$2"
    tar -zvxf "$splunk_tar" --strip-components 1 -C "$dest_dir"
Custom hooks for deb and rpm package installation
Installation of deb and rpm packages requires sudo permissions, while the Splunk instance typically runs under the 'splunk' user without those privileges.
To perform an automated rolling upgrade using deb or rpm packages, create a custom installation hook. Before you run installation commands, such as sudo rpm --upgrade for rpm packages, take these steps:
- Acquire elevated privileges for the installation hook for deb and rpm packages.
- Install the correct package manager on your machine:
  - dpkg for deb packages
  - rpm for rpm packages
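A custom hook for deb and rpm packages might look like the following. This is a hedged sketch, not the hook that ships with the app: the install_package function name is illustrative, and the script assumes that the user running Splunk is allowed to run the rpm and dpkg commands through sudo without a password prompt, which you would have to arrange separately. It relies only on the two parameters the app passes to a hook: the package path and $SPLUNK_HOME.

```shell
#!/bin/bash
# Hypothetical custom installation hook.
#   $1  path to the downloaded package (passed by the app)
#   $2  $SPLUNK_HOME (passed by the app)
install_package() {
    local package="$1" dest_dir="$2"
    case "$package" in
        # Assumes passwordless sudo for these two commands.
        *.rpm) sudo rpm --upgrade "$package" ;;
        *.deb) sudo dpkg --install "$package" ;;
        # Same behavior as the default .tgz hook, minus verbose output.
        *.tgz) tar -zxf "$package" --strip-components 1 -C "$dest_dir" ;;
        *)     echo "unsupported package type: $package" >&2; return 1 ;;
    esac
}

# Install only when the app supplies both parameters.
if [ "$#" -ge 2 ]; then
    set -e
    install_package "$1" "$2"
fi
```

Dispatching on the file extension keeps one hook usable for every supported package format, so the same install_script_path value works if you later switch package types.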
Troubleshoot and recover from automated rolling upgrade failure
If a rolling upgrade fails, you can use the splunk-rolling-upgrade app to return the search head cluster to a ready state, from which you can run the automated rolling upgrade again. Before you initiate the recovery process, make sure that the rolling upgrade has failed or crashed.
When a rolling upgrade fails, the "upgrade_status" field in the upgrade/shc/status endpoint response shows one of the following values:
- "failed", in most cases
- "in_progress", in some cases, for example, if the upgrade crashes while the Splunk instance is stopped.
To investigate the cause of the rolling upgrade failure, take these steps:
- Find the last instance that was being upgraded at the time of failure. To do so, check the upgrade/shc/status endpoint response for the member whose "status" field is set to a value other than "READY" or "UPGRADED".
- Check the logs for errors.
The splunk-rolling-upgrade app writes to three log files under $SPLUNK_HOME/var/log/splunk:
- splunk_shc_upgrade_upgrader_script.log
- splunk_shc_upgrade_rest_endpoints.log
- splunk_shc_upgrade_completion_script.log
If the request response shows "no_upgrade", look for errors in the splunk_shc_upgrade_rest_endpoints.log file on the member where you ran the request. Address the issues that you find in the logs. Make sure the issues do not repeat on other cluster members during future rolling upgrade attempts.
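Checking the three log files can be sketched as a short shell function. The function name scan_upgrade_logs is illustrative, and matching the word "error" case-insensitively is an assumption about how the app labels problems in its logs; adjust the pattern to what you actually see in the files.

```shell
#!/bin/bash
# Print every line containing "error" (any case) from the three
# splunk-rolling-upgrade log files, prefixed with the file name.
#   $1  log directory, for example /opt/splunk/var/log/splunk
scan_upgrade_logs() {
    local log_dir="$1" name
    for name in splunk_shc_upgrade_upgrader_script.log \
                splunk_shc_upgrade_rest_endpoints.log \
                splunk_shc_upgrade_completion_script.log; do
        # A member may not have written every log file yet.
        [ -f "$log_dir/$name" ] || continue
        # grep exits non-zero when nothing matches; that is not a failure here.
        grep -i -H 'error' "$log_dir/$name" || true
    done
    return 0
}
```

Run it on the member identified in the first step, and on the member where you sent the request if the response shows "no_upgrade".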
After you address the issues that caused the failure, prepare the cluster for another rolling upgrade attempt, as follows:
- If the cluster member where the issue occurred is down, manually install the package on that machine. Remove $SPLUNK_HOME/var/run/splunk/trigger-rolling-upgrade (if it exists), and start Splunk on that member.
- Send an HTTP POST request to the upgrade/shc/recovery endpoint. For example:

      curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/recovery"

  This operation returns the cluster to the ready state, where you can run the automated rolling upgrade again after failure. It also sets the current upgrade status to "failed". Note that it can take some time for the KV store to initialize after startup.

  The upgrade/shc/recovery endpoint returns the following HTTP status codes:

  Code | Description |
  ---|---|
  200 | Recovery was executed successfully. |
  400 | Configuration error. |
  500 | Internal error. Check log files for more information on the error. |
  501 | Attempted to run a recovery on an unsupported deployment. |

  For endpoint details, see upgrade/shc/recovery in the REST API Reference Manual.
  Alternatively, run the splunk rolling-upgrade shc-recovery command to initiate the recovery process.
- If the upgrade/shc/recovery endpoint response contains a message such as the following:

      { "message":"SHC partially recovered. Please turn off manual detention mode on the following peers: ['sh1']", "status":"succeeded" }

  then send an HTTP POST request to the /shcluster/member/control/control/set_manual_detention endpoint, turning off manual detention on the search head specified in the response. For example:

      curl -u admin:pass -k "https://localhost:8089/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention" -d manual_detention=off

  For endpoint details, see shcluster/member/control/control/set_manual_detention in the REST API Reference Manual.
- Resume the upgrade by sending an HTTP POST request to the upgrade/shc/upgrade endpoint. For example:

      curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"

  For details on how to run the automated rolling upgrade, see Run the automated rolling upgrade.
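The recovery and resume requests can be chained in a small wrapper. This is a sketch under assumptions, not an official procedure: the recover_and_resume name and its return codes are illustrative, and the check for a partial recovery matches the "partially recovered" wording of the example message on this page. When that message appears, the wrapper stops so that you can turn off manual detention on the listed peers before resuming.

```shell
#!/bin/bash
# Hypothetical wrapper: run recovery, then resume the rolling upgrade.
#   $1  base URL of a cluster member, for example https://localhost:8089
#   $2  credentials in user:password form
# Returns 1 if the recovery request fails, 2 if recovery was only partial.
recover_and_resume() {
    local base_url="$1" auth="$2" response
    response="$(curl -s -k -u "$auth" -X POST \
        "$base_url/services/upgrade/shc/recovery")" || return 1
    case "$response" in
        *"partially recovered"*)
            # Manual detention must be turned off by hand first.
            echo "$response" >&2
            return 2 ;;
    esac
    # Recovery finished cleanly; start the rolling upgrade again.
    curl -s -k -u "$auth" -X POST \
        "$base_url/services/upgrade/shc/upgrade?output_mode=json"
}
```

Stopping on a partial recovery, rather than parsing the peer names out of the message, keeps the sketch independent of the exact message format.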
This documentation applies to the following versions of Splunk® Enterprise: 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.2.0, 9.2.1, 9.2.2