Perform an automated rolling upgrade of an indexer cluster
Splunk Enterprise version 9.3.0 and higher supports the following upgrades using the default splunk-rolling-upgrade app:
- Automated rolling upgrade of an indexer cluster
- Upgrade of a non-clustered indexer
- Upgrade of a cluster manager (CM)
The splunk-rolling-upgrade app comes with the Splunk Enterprise product. A rolling upgrade performs a phased upgrade of cluster peers to a new version of Splunk Enterprise with minimal interruption of ongoing searches and data ingestion. The splunk-rolling-upgrade app automates the manual rolling upgrade steps described in Perform a rolling upgrade of an indexer cluster.
Requirements and considerations
General requirements
Review the following requirements and considerations before you configure and initiate an automated rolling upgrade:
- The splunk-rolling-upgrade app requires a Linux or Unix operating system. macOS and Windows are not supported.
- Automated rolling upgrade applies only to upgrades from version 9.3.x and higher to subsequent versions of Splunk Enterprise. To determine your upgrade path and confirm the compatibility of the upgraded CM and cluster peers with existing Splunk Enterprise components and applications, see Splunk products version compatibility matrix.
- Automated rolling upgrade supports the following installation package formats:
- .tgz - the default file format
- .deb and .rpm - these file formats require a custom script that can run with elevated privileges. See Create a custom installation hook.
- To use the splunk-rolling-upgrade app, you must hold a splunk_system_upgrader role.
- To use the splunk-rolling-upgrade app with Splunk Enterprise instances that are managed by systemd, you need to be able to run a custom control script with elevated privileges. See Create a custom control hook.
Additional requirements for clustered environments
For clustered Splunk deployments with at least one cluster manager that manages indexers, the following requirements apply:
- An automated CM upgrade requires turning off the CM redundancy feature. Otherwise, you must manually upgrade CM nodes. You can still use the app to perform automated upgrades of cluster peers later. To learn about redundancy, see Implement cluster manager redundancy.
- For multisite deployments, the automated rolling upgrade app upgrades site by site automatically. After upgrading all the indexers of a site, the app starts upgrading indexers in the next site.
How an automated rolling upgrade works
To ensure a successful automated rolling upgrade, you must upgrade your Splunk deployment in the following order:
1. Upgrade the license manager (LM).
2. Upgrade the cluster manager (CM).
3. Upgrade the search head tier.
4. Upgrade the indexer tier.
Changing this order can cause issues with your Splunk deployment due to version incompatibility.
Upgrade the license manager (LM)
The LM role can be colocated on an instance that performs other tasks. To learn about instances where you can colocate the LM, see Choose the instance to serve as the license manager. To upgrade the LM, identify the instance that serves as the LM and follow the appropriate workflow:
- If the LM is colocated on an instance other than a search head or cluster manager, follow these steps:
  1. Configure the app using the steps for non-clustered deployments. See Configure the rolling upgrade app for non-clustered deployments.
  2. Run the upgrade using the steps for non-clustered deployments. See Run the automated rolling upgrade for non-clustered deployments.
- If the LM is colocated on a non-clustered search head, upgrade the LM first by following the steps for non-clustered deployments. See Run the automated rolling upgrade for non-clustered deployments.
- In all other cases, you do not need to upgrade the LM separately. It is upgraded automatically when you upgrade a search head cluster (SHC) or the CM.
Upgrade the cluster manager (CM)
The splunk-rolling-upgrade app provides the functionality to upgrade a CM. To initiate the upgrade, send a single request to a REST endpoint or run the corresponding CLI command. For the REST endpoints and CLI commands, refer to the table in this section. The app then stops the CM, downloads the new Splunk Enterprise installation package, installs it, and starts the CM.
You must upgrade each CM separately.
By default, the app supports only .tgz packages. The app unpacks the package content into the $SPLUNK_HOME directory, typically /opt/splunk. To learn how to customize the installation step by using custom hooks, for example shell scripts, see Custom hooks for deb and rpm package installation.
To upgrade a CM, the splunk-rolling-upgrade app provides the following REST endpoints and corresponding CLI commands:
If the CM redundancy feature is turned on, upgrade and back up the CM manually. Don't use the commands and endpoints in this table.
| REST endpoint | CLI command | Description |
|---|---|---|
| upgrade/cluster/manager | splunk rolling-upgrade cluster-manager | Initiate the upgrade process. |
| upgrade/cluster/status | splunk rolling-upgrade cluster-status | Monitor the automated upgrade status. This endpoint displays the statuses of the CM and cluster peer upgrades. The status endpoint is not available while the CM is down for the upgrade. |
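For example, assuming you run the commands on the CM itself, a minimal CLI workflow based on the table might look like this:

```bash
# Start the CM upgrade
splunk rolling-upgrade cluster-manager

# After the CM restarts, check progress (the status endpoint is not
# available while the CM is down for the upgrade)
splunk rolling-upgrade cluster-status
```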
Upgrade the search head tier
The automated rolling upgrade of indexers does not upgrade an SHC. If needed, upgrade the SHC separately. To learn about upgrading the SHC, see Perform an automated rolling upgrade of a search head cluster.
Upgrade the indexer tier
Upgrade the indexer cluster
To initiate the upgrade of the indexer cluster, send a request to the REST endpoint or run the corresponding CLI command on the cluster manager. This action starts an orchestrator process that upgrades the indexer cluster peers. The orchestrator process downloads and installs the new Splunk Enterprise package on all indexer peers while maintaining data searchability across all buckets.
To achieve this, the orchestrator process ensures that the number of indexer peers undergoing the upgrade at any one time does not exceed min(search_factor - 1, (cluster_size - 1)/2), where cluster_size is the total number of peers in the cluster. For example, with search_factor = 3 and an indexer tier of 10 indexers, the automated rolling upgrade app upgrades min(2, 4) = 2 indexers in parallel. Based on this formula, if search_factor == 1 or the cluster contains 2 or fewer peers, you can't perform the automated rolling upgrade.
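As an illustration of the formula only (this is not part of the app, and the values are placeholders), the parallelism limit can be computed like this:

```bash
#!/bin/bash
# Sketch: compute how many peers the orchestrator upgrades in parallel.
search_factor=3      # cluster search factor
cluster_size=10      # total number of indexer peers

a=$(( search_factor - 1 ))
b=$(( (cluster_size - 1) / 2 ))        # integer division acts as floor()
limit=$(( a < b ? a : b ))             # min(a, b)

echo "Peers upgraded in parallel: $limit"   # prints 2 for these values
```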
By default, the app supports only .tgz packages. The app unpacks the package content into the $SPLUNK_HOME directory on cluster peers, typically /opt/splunk. To learn how to customize the installation step with custom hooks (for example, shell scripts), in the same way as for a CM, see Create a custom installation hook.
| REST endpoint | CLI command | Description |
|---|---|---|
| upgrade/cluster/all_peers | splunk rolling-upgrade cluster-all-peers | Initiate the automated rolling upgrade process for cluster peers. The endpoint supports the force parameter, which lets you skip the health check before performing an upgrade. Example: https://localhost:8089/services/upgrade/cluster/all_peers?force=true |
| upgrade/cluster/status | splunk rolling-upgrade cluster-status | Monitor the automated rolling upgrade status. Call this endpoint on the CM to display the upgrade status of the CM and cluster peers. |
| upgrade/cluster/recovery | splunk rolling-upgrade cluster-recovery | Return the cluster to a ready state after the automated rolling upgrade fails. |
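For example, run from the CM's command line, the commands from the table might be used like this:

```bash
# Kick off the rolling upgrade of all cluster peers
splunk rolling-upgrade cluster-all-peers

# Check progress of the CM and peer upgrades
splunk rolling-upgrade cluster-status

# If the upgrade fails, return the cluster to a ready state
splunk rolling-upgrade cluster-recovery
```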
Upgrade non-clustered indexers
If the indexer tier consists of one or several non-clustered indexer instances, the splunk-rolling-upgrade app provides only partial automation. Because there is no CM, there is no central instance from which you can orchestrate the upgrade. To initiate the upgrade, send a request to the REST endpoint or run the corresponding CLI command on each indexer instance separately. The upgrade/cluster/status endpoint returns only the upgrade status of the single instance on which it is called.
| REST endpoint | CLI command | Description |
|---|---|---|
| upgrade/standalone | splunk rolling-upgrade standalone | Initiate the upgrade process for a single indexer in a non-clustered deployment. |
| upgrade/cluster/status | splunk rolling-upgrade cluster-status | Monitor the upgrade status of a single instance. |
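For example, on a single non-clustered indexer, the commands from the table might be run as follows:

```bash
# Upgrade this standalone instance
splunk rolling-upgrade standalone

# Check the upgrade status of this instance only
splunk rolling-upgrade cluster-status
```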
Perform an automated rolling upgrade
This section shows you how to configure and use the splunk-rolling-upgrade app to run an automated rolling upgrade.
Configure the rolling upgrade app for clustered deployments
Before you can run an automated rolling upgrade, create and configure the splunk-rolling-upgrade-config app for indexer upgrades and distribute it to the indexer peers. To do so, take the following steps:
The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.
1. On the CM, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
2. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following (for a filled-in sketch, see the example after this procedure):

       [downloader]
       package_path = <path to a package>
       md5_checksum = <md5 checksum of a package>

   Where:
   - package_path is a URI to the location of a new installation package. For the specification file, refer to $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README. Make sure this path is accessible from any Splunk Enterprise instance that you upgrade.
   - md5_checksum contains the md5 checksum of that package in hexadecimal format.
3. (Optional) If you plan to use rpm or deb packages instead of the default .tgz package, follow these steps:
   1. Create a custom installation hook. The installation hook is a script that contains installation instructions for the specific package type. To learn about creating the hook, see Create a custom installation hook.
   2. Run the chmod +x command to set execution permissions on the hook (script) that you wrote.
   3. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
   4. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
   5. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:

          [hook]
          install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>

4. If you run Splunk Enterprise as a systemd service, also configure a custom control hook by following these steps:
   1. Run the chmod +x command to set execution permissions on the control hook (script) that you wrote.
   2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory if it doesn't already exist.
   3. Copy the hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
   4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the control_script_path value to the location of the hook. For example:

          [hook]
          control_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>

   Provide your own custom commands to stop, start, and offline a Splunk Enterprise instance that runs as a systemd service. See Create a custom control hook.
5. On the CM, to create a configuration for cluster peers, copy $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config into the $SPLUNK_HOME/etc/manager-apps directory.
   If cluster peers need to use different paths, update $SPLUNK_HOME/etc/manager-apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf.
   On peers, after a bundle push, the splunk-rolling-upgrade-config app appears in the $SPLUNK_HOME/etc/peer-apps directory. Make sure that package_path and install_script_path are accessible on the peers, for example:

       [hook]
       install_script_path = $SPLUNK_HOME/etc/peer-apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>

6. Validate and push the bundle by running the following CLI commands:

       splunk validate cluster-bundle
       splunk apply cluster-bundle
For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.
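The following sketch shows one way to fill in the configuration from step 2. The package URL and the local copy used to compute the checksum are placeholders, not real download locations; adjust them for your environment.

```bash
#!/bin/bash
# Sketch: create rolling_upgrade.conf on the CM.
# Both the URL and the local package copy below are placeholders.
PACKAGE_URL="https://packages.example.com/splunk-9.4.0-linux-amd64.tgz"
LOCAL_COPY="/tmp/splunk-9.4.0-linux-amd64.tgz"

CHECKSUM=$(md5sum "$LOCAL_COPY" | awk '{print $1}')   # hexadecimal md5

CONF_DIR="$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/rolling_upgrade.conf" <<EOF
[downloader]
package_path = $PACKAGE_URL
md5_checksum = $CHECKSUM
EOF
```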
Configure the rolling upgrade app for non-clustered deployments
To configure each standalone indexer or LM, follow these steps:
The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.
1. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
2. In the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory, create a new rolling_upgrade.conf file containing the following:

       [downloader]
       package_path = <path to a package>
       md5_checksum = <md5 checksum of a package>

   Where:
   - package_path is a URI to the location of a new installation package. For the specification file, refer to $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README. Make sure this path is accessible from any Splunk Enterprise instance that you upgrade.
   - md5_checksum contains the md5 checksum of that package in hexadecimal format.
3. (Optional) If you plan to use rpm or deb packages instead of the default .tgz package, follow these steps:
   1. Create a custom installation hook. The installation hook is a script that contains installation instructions for the specific package type. To learn about creating the hook, see Create a custom installation hook.
   2. Run the chmod +x command to set execution permissions on the hook (script) that you wrote.
   3. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
   4. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
   5. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:

          [hook]
          install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>

4. If you run Splunk Enterprise as a systemd service, also configure a custom control hook by following these steps:
   1. Run the chmod +x command to set execution permissions on the control hook (script) that you wrote.
   2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory if it doesn't already exist.
   3. Copy the hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
   4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the control_script_path value to the location of the hook. For example:

          [hook]
          control_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>

   Provide your own custom commands to stop, start, and offline a Splunk Enterprise instance that runs as a systemd service. See Create a custom control hook.
5. Repeat these steps on each standalone indexer or LM. One way to distribute a single copy of the configuration app is shown in the sketch after this procedure.
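If you manage several standalone indexers or LMs, one way to avoid editing files on every host is to build the splunk-rolling-upgrade-config app once and copy it to each instance. This is only a sketch: it assumes SSH access to each host, and the hostnames and the /opt/splunk path are placeholders.

```bash
#!/bin/bash
# Sketch: distribute the configuration app to each standalone instance.
# Hostnames and the target path are placeholders for your own environment.
for host in idx1.example.com idx2.example.com lm1.example.com; do
  scp -r "$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config" \
      "$host:/opt/splunk/etc/apps/"
done
```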
Run the automated rolling upgrade for clustered deployments
After you configure the splunk-rolling-upgrade app, follow these steps to run the automated rolling upgrade of your indexer cluster. You can use the REST API or run the corresponding CLI commands.
1. Identify the URI and management port of the CM.
2. To initiate the CM upgrade process, on the CM, send an HTTP POST request to the upgrade/cluster/manager endpoint. For example:

       curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/manager?output_mode=json"

3. Monitor the upgrade status by sending an HTTP GET request to the upgrade/cluster/status endpoint. For example:

       curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"

4. Wait until the status response shows that the CM has been upgraded successfully.
5. To initiate the upgrade of all the clustered indexers, on the CM, send an HTTP POST request to the upgrade/cluster/all_peers endpoint. For example:

       curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/all_peers?output_mode=json"

6. Keep monitoring the upgrade with the same status request:

       curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
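If you prefer to watch the status from a terminal rather than re-running the request by hand, a simple polling loop works. This is only a sketch: the credentials and host are placeholders, and the jq filter assumes jq is installed and matches the response layout shown later in Troubleshoot and recover from automated rolling upgrade failure.

```bash
#!/bin/bash
# Sketch: poll the cluster upgrade status every 60 seconds.
while true; do
  curl -s -X GET -u admin:pass -k \
    "https://localhost:8089/services/upgrade/cluster/status?output_mode=json" \
    | jq -r '.message.peers_upgrade.overall_status,
             .message.peers_upgrade.statistics.overall_peers_upgraded_percentage'
  sleep 60
done
```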
Run the automated rolling upgrade for non-clustered deployments
1. Identify the URI and management port of the standalone indexer or LM that you want to upgrade.
2. To initiate the upgrade process, send an HTTP POST request to the upgrade/standalone endpoint. For example:

       curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/standalone?output_mode=json"

3. Monitor the upgrade status by sending an HTTP GET request to the upgrade/cluster/status endpoint. For example:

       curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
Create a custom installation hook
To learn how to create an installation hook, see Create a custom installation hook.
Create a custom control hook
A control hook is a custom binary or script that performs custom start, stop, and offline commands on a Splunk Enterprise instance, on each machine where Splunk Enterprise is upgraded. The splunk-rolling-upgrade app uses the control hook to stop the Splunk Enterprise instance before the package upgrade and to start it afterward.
The splunk-rolling-upgrade app passes the following arguments to the control hook, in this order:
1. The path to the splunk binary file, for example $SPLUNK_HOME/bin/splunk. The splunk-rolling-upgrade app uses this path to call the commands.
2. One of the commands: stop, start, or offline.
3. A token, if the app passes the offline command.
Make sure the control hook includes the following:
- Instructions for how to stop, start, and offline a Splunk Enterprise instance
- Executable permissions, which you can set using the chmod +x command
Example of a default control hook:

```bash
#!/bin/bash
set -e

SPLUNK_PATH="$1"
COMMAND="$2"

if [ "$COMMAND" = "start" ]; then
    "$SPLUNK_PATH" start --accept-license --answer-yes
elif [ "$COMMAND" = "offline" ]; then
    TOKEN="$3"
    "$SPLUNK_PATH" offline -token "$TOKEN"
elif [ "$COMMAND" = "stop" ]; then
    "$SPLUNK_PATH" stop
else
    echo "Invalid command"
    exit 1
fi
```
Use custom control hooks to upgrade systemd-managed Splunk Enterprise
On a Splunk Enterprise instance that is managed by systemd, you can perform the automated rolling upgrade in one of the following ways.
To continue, the control hook script requires elevated privileges. These privileges are needed to modify files of the systemd service and to stop and start the Splunkd.service unit that runs under systemd. Typically, the Splunk Enterprise instance runs under the splunk user, which does not have these privileges.
- Manually, by taking the following steps:
  1. In the /etc/systemd/system/Splunkd.service unit file, change the value of the KillMode setting to process.
     By default, the Splunkd.service unit uses the KillMode=mixed setting, which kills all child processes when stopping a Splunk Enterprise instance. However, that also kills one of the scripts that the splunk-rolling-upgrade app runs to stop and start the Splunk Enterprise instance and to perform the upgrade. Temporarily changing the KillMode value prevents that script from being killed.
  2. Reload the systemd daemon.
  3. Perform an automated rolling upgrade. See Perform an automated rolling upgrade.
  4. In the /etc/systemd/system/Splunkd.service unit file, set KillMode back to mixed.
  5. Reload the systemd daemon.
- Automatically, by using a control hook script.

Example of a control hook that updates the KillMode setting:

```bash
#!/bin/bash
set -e

SPLUNK_PATH="$1"
COMMAND="$2"
SPLUNK_SYSTEMD_DIR="/etc/systemd/system/Splunkd.service.d"

cleanup_if_exists() {
    if [ -d "$SPLUNK_SYSTEMD_DIR" ]; then
        sudo rm -rf "$SPLUNK_SYSTEMD_DIR" && sudo systemctl daemon-reload
    fi
}

handle_error() {
    cleanup_if_exists
    echo "An error occurred. splunk_control.sh exiting with status: $1."
    exit "$1"
}

override_kill_mode() {
    sudo mkdir "$SPLUNK_SYSTEMD_DIR" || handle_error "$?"
    (sudo tee "$SPLUNK_SYSTEMD_DIR/override.conf" <<EOF
[Service]
KillMode=process
EOF
    ) || handle_error "$?"
    sudo systemctl daemon-reload || handle_error "$?"
}

if [ "$COMMAND" = "start" ]; then
    cleanup_if_exists
    sudo "$SPLUNK_PATH" start --accept-license --answer-yes
elif [ "$COMMAND" = "offline" ]; then
    override_kill_mode
    TOKEN="$3"
    "$SPLUNK_PATH" offline -token "$TOKEN"
    cleanup_if_exists
elif [ "$COMMAND" = "stop" ]; then
    override_kill_mode
    sudo "$SPLUNK_PATH" stop
    cleanup_if_exists
else
    echo "Invalid command"
fi
```
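If you follow the manual steps instead of a control hook, one way to apply and later revert the KillMode change, rather than editing the unit file directly, is a temporary systemd drop-in. This sketch mirrors what the example hook above does and assumes the unit is named Splunkd.service:

```bash
# Apply the temporary override, then reload systemd
sudo mkdir -p /etc/systemd/system/Splunkd.service.d
sudo tee /etc/systemd/system/Splunkd.service.d/override.conf >/dev/null <<'EOF'
[Service]
KillMode=process
EOF
sudo systemctl daemon-reload

# After the upgrade completes, remove the override and reload again
sudo rm -rf /etc/systemd/system/Splunkd.service.d
sudo systemctl daemon-reload
```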
Troubleshoot and recover from automated rolling upgrade failure
To track the automated rolling upgrade status, check the response from the upgrade/cluster/status
endpoint.
Typically, the JSON response from the CM includes the following:
{ "message":{ "current_instance_upgrade":{ "upgrader_pid":6025, "from_version":"9.3.0", "to_version":"9.4.0", "last_modified":"Wed Jun 07 12:20:24 2023", "status":"completed" }, "peers_upgrade":{ "orchestrator_pid":6865, "from_version":"9.3.0", "to_version":"9.4.0", "overall_status":"in_progress", "peers":[ { "name":"idx3", "last_modified":"Wed Jun 07 12:24:16 2023", "status":"completed", "upgrader_pid": 1, }, ... ], "statistics":{ "peers_to_upgrade":4, "overall_peers_upgraded":2, "overall_peers_upgraded_percentage":50 } } } }
The response consists of two sections:
- "current_instance_upgrade" refers to the status of the CM upgrade. If the CM upgrade fails, the "status" field contains a "failed" value.
- "peers_upgrade" tracks the upgrade status of indexer cluster peers. If the indexer cluster upgrade fails, you typically see both of the following:
  - a "failed" value in the "overall_status" field of the "peers_upgrade" section
  - a "failed" value in the "status" field for one of the peers in the "peers" list
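To pull just these fields out of the status response, you can filter it with jq, if it is available. This is a sketch; substitute your CM address and credentials:

```bash
# List the names of peers whose upgrade failed
curl -s -u admin:pass -k \
  "https://<cm_address>:8089/services/upgrade/cluster/status?output_mode=json" \
  | jq -r '.message.peers_upgrade.peers[] | select(.status == "failed") | .name'
```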
To learn why the upgrade failed, check the logs on the CM and on the indexers where the upgrade failed. To find logs related to an indexer cluster upgrade in the splunk-rolling-upgrade app, check the following two log files under $SPLUNK_HOME/var/log/splunk:
- splunk_idxc_upgrade_upgrader_script.log
- splunk_idxc_upgrade_rest_endpoints.log
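For example, to scan both files for errors (assuming $SPLUNK_HOME is set, typically /opt/splunk):

```bash
grep -i "error" "$SPLUNK_HOME/var/log/splunk/splunk_idxc_upgrade_upgrader_script.log"
grep -i "error" "$SPLUNK_HOME/var/log/splunk/splunk_idxc_upgrade_rest_endpoints.log"
```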
Resolve issues resulting from the logs
After you detect an upgrade failure, log in to the environment of the instance where the upgrade has failed and check the logs to identify and resolve the issue.
If the cluster member where the issue occurred is down, manually install the package on that machine, and start Splunk Enterprise on that member.
1. Send an HTTP POST request to the upgrade/cluster/recovery endpoint on the CM. For example:

       curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/recovery"

   - If the previous upgrade process is still running, for example because it has become unresponsive, the endpoint returns the identifier of that process (PID). Before you retry the recovery, stop the process.
   - If the previous upgrade process is complete, the upgrade/cluster/recovery operation tries to return the cluster to the ready state, so that you can run the automated rolling upgrade again after it failed. If the previous upgrade crashed and the status is stuck in the "in_progress" state, the operation sets the current upgrade status to "failed". If the status is "failed", proceed to the next step.

   Alternatively, you can initiate the recovery process by running the splunk rolling-upgrade cluster-recovery CLI command.
2. Resume the upgrade by sending an HTTP POST request to the upgrade/cluster/all_peers endpoint on the CM. For example:

       curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/all_peers?output_mode=json"

   In rare cases, if an indexer peer fails during the upgrade, the cluster might no longer meet the replication factor. This causes the upgrade/cluster/all_peers endpoint to stop the upgrade due to a failed health check. You can skip the health check by specifying the force=true REST argument:

       curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/all_peers?force=true&output_mode=json"

   In this case, some data on the cluster might not be searchable during the upgrade.