Perform an automated rolling upgrade of an indexer cluster

Splunk Enterprise version 9.3.0 and higher supports the following upgrades using the default splunk-rolling-upgrade app:

Automated rolling upgrade of an indexer cluster
Upgrade of a non-clustered indexer
Upgrade of a cluster manager (CM)

The splunk-rolling-upgrade app comes with the Splunk Enterprise product. A rolling upgrade performs a phased upgrade of cluster peers to a new version of Splunk Enterprise with minimal interruption of ongoing searches and data ingestion. The splunk-rolling-upgrade app automates the manual rolling upgrade steps described in Perform a rolling upgrade of an indexer cluster.

Requirements and considerations

General requirements

Review the following requirements and considerations before you configure and initiate an automated rolling upgrade:

Automated upgrade of Splunk Enterprise instances running under systemd is not supported.

The splunk-rolling-upgrade app requires a Linux or Unix operating system. macOS and Windows are not supported.
Automated rolling upgrade applies only to upgrades from version 9.3.x and higher to subsequent versions of Splunk Enterprise. To determine your upgrade path and confirm the compatibility of the upgraded CM and cluster peers with existing Splunk Enterprise components and applications, see Splunk products version compatibility matrix.
Automated rolling upgrade supports the following installation package formats:
- .tgz - the default file format
- .deb and .rpm - these file formats require a custom script that can run with elevated privileges. See Create a custom installation hook.
To use the splunk-rolling-upgrade app, you must hold the splunk_system_upgrader role.
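
The version requirement above can be checked before you configure anything. The following is a minimal Python sketch, assuming the version string has the usual major.minor.patch form; the function name supports_automated_upgrade is illustrative and not part of the app.

```python
def supports_automated_upgrade(current_version):
    """Return True if the running version is 9.3.x or higher.

    Assumption: current_version looks like "9.3.0" (major.minor.patch).
    """
    major, minor = (int(part) for part in current_version.split(".")[:2])
    return (major, minor) >= (9, 3)
```

This checks only the starting version. It does not replace the compatibility matrix for the rest of your deployment.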

Additional requirements for clustered environments

For Splunk clustered deployments where there is at least one cluster manager that handles indexers, the following requirements apply:

An automated CM upgrade requires turning off the CM redundancy feature. Otherwise, you must manually upgrade CM nodes. You can still use the app to perform automated upgrades of cluster peers later. To learn about redundancy, see Implement cluster manager redundancy.
For multisite deployments, the automated rolling upgrade app upgrades site by site automatically. After upgrading all the indexers of a site, the app starts upgrading indexers in the next site.

How an automated rolling upgrade works

To ensure a successful automated rolling upgrade, you must upgrade your Splunk deployment in the following order. Changing the order may cause issues with your Splunk deployment due to version incompatibility.

Upgrade the license manager (LM).
Upgrade the cluster manager (CM).
Upgrade the search head tier.
Upgrade the indexer tier.

Upgrade the license manager (LM)

The LM role can be colocated on an instance that is performing other tasks. To learn about instances where you can colocate the LM, see Choose the instance to serve as the license manager. To upgrade the LM, identify the instance that serves as the LM. Depending on the instance, follow one of the upgrade workflows:

If the LM is colocated on an instance other than a search head or cluster manager, follow these steps:

Configure the app by taking the steps for non-clustered deployments. To view the steps, see Configure the rolling upgrade app for non-clustered deployments.
Run the upgrade using the steps for non-clustered deployments. To view the steps, see Run the automated rolling upgrade for non-clustered deployments.

If the LM is colocated on a non-clustered search head, upgrade the LM first by following the steps for non-clustered deployments. To view the steps, see Run the automated rolling upgrade for non-clustered deployments.

In other cases, upgrading the LM is not required. It is upgraded automatically when upgrading a search head cluster (SHC) or CM.

Upgrade the cluster manager (CM)

The splunk-rolling-upgrade app provides the functionality to upgrade a CM. To initiate the upgrade, send a single request to a REST endpoint or specify the corresponding CLI command. For REST endpoints and CLI commands, refer to the table in this section. Next, the app stops the CM, downloads a new Splunk Enterprise install package, installs it, and starts the CM.

You must upgrade each CM separately.

By default, the app supports only .tgz packages. The app unpacks their content to the $SPLUNK_HOME directory, which is typically located in /opt/splunk. To learn how to customize the installation step by using custom hooks, for example, shell scripts, see Custom hooks for deb and rpm package installation.

To upgrade a CM, the splunk-rolling-upgrade app provides the following REST endpoints and corresponding CLI commands:

If the CM redundancy feature is turned on, back up and upgrade the CM manually. Don't use the commands and endpoints from this table.

REST endpoint	CLI command	Description
`upgrade/cluster/manager`	`splunk rolling-upgrade cluster-manager`	Initiate the upgrade process.
`upgrade/cluster/status`	`splunk rolling-upgrade cluster-status`	Monitor the automated upgrade status. This endpoint displays the upgrade statuses of the CM and cluster peers. The status endpoint is not available while the CM is down for the upgrade.

Upgrade the search head tier

The automated rolling upgrade of indexers does not upgrade an SHC. If needed, upgrade the SHC manually. To learn about upgrading the SHC, see Perform an automated rolling upgrade of a search head cluster.

Upgrade the indexer tier

Upgrade the indexer cluster

To initiate the upgrade of the indexer cluster, you can send a request to the REST endpoint or specify the corresponding CLI command on the cluster manager. The action starts an orchestrator process that performs the upgrade of indexer cluster peers. The orchestrator process downloads and installs a new Splunk package on all indexer peers while maintaining data searchability on all buckets. To achieve this, the orchestrator process makes sure that the number of indexer peers undergoing the upgrade at any one time does not exceed min(search_factor - 1, (cluster_size - 1)/2), where cluster_size is the total number of peers in the cluster. For example, if search_factor = 3 and the indexer tier includes 10 indexers, the limit is min(2, 4.5) = 2, so the automated rolling upgrade app upgrades 2 indexers in parallel.

Based on this formula, if search_factor == 1 or the number of peers in the cluster is <= 2, the limit falls below 1 and you can't perform the automated rolling upgrade.
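
The limit can be worked out ahead of time for your own cluster. The following is a minimal Python sketch of the formula, assuming the fractional part of (cluster_size - 1)/2 is dropped. That rounding is an assumption consistent with the examples above, not a documented behavior, and the function name max_parallel_peers is illustrative.

```python
def max_parallel_peers(search_factor, cluster_size):
    """Upper bound on peers upgraded at the same time.

    Implements min(search_factor - 1, (cluster_size - 1)/2).
    Assumption: the division result is rounded down to a whole number.
    """
    limit = min(search_factor - 1, (cluster_size - 1) // 2)
    # A result of 0 means the automated rolling upgrade can't run.
    return max(0, limit)
```

For example, max_parallel_peers(3, 10) gives 2, which matches the example above, while a search factor of 1 or a cluster of 2 peers gives 0.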

By default, the app supports only .tgz packages. The app unpacks their content to the $SPLUNK_HOME directory on cluster peers, which is typically located in /opt/splunk. As with a CM, you can customize the installation step by using custom hooks, for example, shell scripts. To learn how, see Create a custom installation hook.

REST endpoint	CLI command	Description
`upgrade/cluster/all_peers`	`splunk rolling-upgrade cluster-all-peers`	Initiate the automated rolling upgrade process for cluster peers. The endpoint supports the `"force"` parameter that allows you to skip a health check before performing an upgrade. Example: https://localhost:8089/services/upgrade/cluster/all_peers?force=true To learn about troubleshooting and recovery, see Troubleshoot and recover from automated rolling upgrade failure.
`upgrade/cluster/status`	`splunk rolling-upgrade cluster-status`	Monitor the automated rolling upgrade status. Call this endpoint on the CM to display the upgrade status of the CM and cluster peers.
`upgrade/cluster/recovery`	`splunk rolling-upgrade cluster-recovery`	Return the cluster to a ready state after the automated rolling upgrade fails.

Upgrade non-clustered indexers

If the indexer tier consists of one or several non-clustered indexer instances, the splunk-rolling-upgrade app provides only partial automation functionality. As there is no CM, there is no central instance from which you can orchestrate the upgrade. To initiate the upgrade, you can send a request to the REST endpoint or specify the corresponding CLI command on every indexer instance separately. The upgrade/cluster/status endpoint returns only the upgrade status of the single instance on which it is called.

REST endpoint	CLI command	Description
`upgrade/standalone`	`splunk rolling-upgrade standalone`	Initiate the upgrade process for a single indexer in a non-clustered deployment.
`upgrade/cluster/status`	`splunk rolling-upgrade cluster-status`	Monitor the upgrade status of a single instance.

Perform an automated rolling upgrade

This section shows you how to configure and use the splunk-rolling-upgrade app to run an automated rolling upgrade.

Configure the rolling upgrade app for clustered deployments

Before you can run an automated rolling upgrade, create and configure the splunk-rolling-upgrade app for indexer upgrades and distribute it to indexer peers. To do so, take the following steps:

The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.

On the CM, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following:
```
[downloader]
package_path = <path to a package>
md5_checksum = <md5 checksum of a package>
```
Where:
- package_path is an absolute path to a location of the new installation package. Make sure this path is accessible from any Splunk Enterprise instance that you upgrade. The package_path setting supports the following URI paths:
  - Paths to local files, for example, file:///path/to/package.tgz
    Note that this example contains 3 slash characters where 2 slashes represent a protocol and 1 an absolute path.
  - Remote links that require no authentication, for example, http://<...>
- md5_checksum contains the MD5 checksum of that package in hexadecimal format.
If instead of a default .tgz package, you plan to use rpm or deb packages, follow these steps:
1. Create a custom installation hook.
  The installation hook is a script that contains installation instructions for the specific package type. To learn about creating the hook, see Create a custom installation hook.
2. Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
3. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
4. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
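5. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook, for example, install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>.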
On the CM, to create a configuration for cluster peers, copy $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config into $SPLUNK_HOME/etc/manager-apps directory.
If cluster peers need to use different paths, update $SPLUNK_HOME/etc/manager-apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf.

On peers, after a bundle push, the splunk-rolling-upgrade-config app appears in $SPLUNK_HOME/etc/peer-apps directory. Make sure that you can access package_path and install_script_path on peers by specifying, for example, this path:
```
[hook]
install_script_path = $SPLUNK_HOME/etc/peer-apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
```
Validate and push a bundle by specifying the following CLI commands:
```
splunk validate cluster-bundle
splunk apply cluster-bundle
```

For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.
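
The [downloader] stanza needs the package location as a URI and the MD5 checksum in hexadecimal. The following is a minimal Python sketch that derives both from a local package file. The function names md5_hex and downloader_stanza are illustrative and not part of the app, and the sketch covers only the file:// case.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large install packages don't need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def downloader_stanza(package):
    """Render a [downloader] stanza for a local package file."""
    package = Path(package).resolve()
    # as_uri() produces the file:///absolute/path form described above.
    return (
        "[downloader]\n"
        f"package_path = {package.as_uri()}\n"
        f"md5_checksum = {md5_hex(package)}\n"
    )
```

You can paste the returned text into rolling_upgrade.conf. Check that the same path is reachable on every instance you upgrade.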

Configure the rolling upgrade app for non-clustered deployments

To configure each standalone indexer or LM, follow these steps:

The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.

Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
In the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory, create a new rolling_upgrade.conf file containing the following:
```
[downloader]
package_path = <path to a package>
md5_checksum = <md5 checksum of a package>
```
Where:
- package_path is an absolute path to a location of the new installation package. Make sure this path is accessible from any Splunk Enterprise instance that you upgrade. The package_path setting supports the following URI paths:
  - Paths to local files, for example, file:///path/to/package.tgz
    Note that this example contains 3 slash characters where 2 slashes represent a protocol and 1 an absolute path.
  - Remote links that require no authentication, for example, http://<...>
- md5_checksum contains the MD5 checksum of that package in hexadecimal format.
If instead of a default .tgz package, you plan to use rpm or deb packages, follow these steps:
1. Create a custom installation hook.
  The installation hook is a script that contains installation instructions for the specific package type. To learn about creating the hook, see Create a custom installation hook.
2. Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
3. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
4. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
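5. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook, for example, install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>.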
Repeat these steps on each standalone indexer or LM.
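
If you use an rpm or deb hook, a quick check of the hook file can catch a missed chmod +x or a hook copied to the wrong place before you start the upgrade. The following is a minimal Python sketch; the function name hook_problems is illustrative, and the check reflects only the steps listed above, not any validation that the app itself performs.

```python
import os
import stat


def hook_problems(hook_path):
    """Return a list of problems found with a custom installation hook."""
    problems = []
    if not os.path.isfile(hook_path):
        return ["hook file not found"]
    mode = os.stat(hook_path).st_mode
    # chmod +x sets at least one of the execute bits.
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("hook is not executable; run chmod +x")
    parent = os.path.basename(os.path.dirname(os.path.abspath(hook_path)))
    if parent != "hooks":
        problems.append("hook is not in a hooks directory")
    return problems
```

An empty list means the file exists, is executable, and sits in a directory named hooks.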

Run the automated rolling upgrade for clustered deployments

After you configure the splunk-rolling-upgrade app, follow these steps to run the automated rolling upgrade of your indexer cluster. You can use the REST API or specify the corresponding CLI commands.

Identify the URI and management port of the CM.
To initiate the CM upgrade process, on any CM, send an HTTP POST request to the upgrade/cluster/manager endpoint. For example:
```
curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/manager?output_mode=json"
```
Monitor the upgrade status by sending an HTTP GET request to the upgrade/cluster/status endpoint. For example:
```
curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
```
Wait until the status response shows that the CM is upgraded successfully.
To initiate the upgrade of all the clustered indexers, on the CM, send an HTTP POST request to the upgrade/cluster/all_peers endpoint. For example:
```
curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/all_peers?output_mode=json"
```

Keep monitoring the upgrade using this request:
```
curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
```
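
The same sequence can be scripted instead of run by hand with curl. The following is a minimal Python sketch that uses only the standard library and the endpoints named in this section. The function names, the polling interval, and the assumption that the "status" field under "current_instance_upgrade" ends up as "completed" or "failed" are illustrative, and the sketch does not distinguish a fresh result from one left over from an earlier run.

```python
import base64
import json
import ssl
import time
import urllib.parse
import urllib.request


def endpoint_url(host, port, endpoint, **params):
    """Build a management-port URL such as .../services/upgrade/cluster/status."""
    query = urllib.parse.urlencode({"output_mode": "json", **params})
    return f"https://{host}:{port}/services/{endpoint}?{query}"


def call(url, user, password, method="GET"):
    """Send one request with basic auth; certificate checks are off, like curl -k."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(url, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode())


def wait_for_cm(host, port, user, password, interval=30):
    """Poll upgrade/cluster/status until the CM reports "completed" or "failed"."""
    url = endpoint_url(host, port, "upgrade/cluster/status")
    while True:
        try:
            body = call(url, user, password)
            status = body["message"]["current_instance_upgrade"]["status"]
        except OSError:
            status = None  # the status endpoint is unavailable while the CM is down
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)


def run_cluster_upgrade(host, port, user, password):
    """Upgrade the CM, wait for it, then start the upgrade of all peers."""
    call(endpoint_url(host, port, "upgrade/cluster/manager"), user, password, "POST")
    if wait_for_cm(host, port, user, password) != "completed":
        raise RuntimeError("CM upgrade failed; check the logs before continuing")
    call(endpoint_url(host, port, "upgrade/cluster/all_peers"), user, password, "POST")
```

Calling run_cluster_upgrade("localhost", 8089, "admin", "pass") mirrors the curl examples above. Keep monitoring the peers with the status endpoint after it returns.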

Run the automated rolling upgrade for non-clustered deployments

Identify the URI and management port of any standalone indexer or LM.
To initiate the upgrade process, send an HTTP POST request to the upgrade/standalone endpoint. For example:
```
curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/standalone?output_mode=json"
```
Monitor the upgrade status by sending an HTTP GET request to the upgrade/cluster/status endpoint. For example:
```
curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
```
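
Because there is no CM to orchestrate a non-clustered upgrade, you repeat the request on each instance. The following is a minimal Python sketch that prints one curl command per instance from a list of hosts. The host names and the function name standalone_upgrade_commands are hypothetical.

```python
import shlex


def standalone_upgrade_commands(hosts, port=8089, user="admin"):
    """Return one curl command per standalone instance (POST to upgrade/standalone)."""
    commands = []
    for host in hosts:
        url = f"https://{host}:{port}/services/upgrade/standalone?output_mode=json"
        # Passing only a user name to -u makes curl prompt for the password.
        commands.append(f"curl -X POST -u {shlex.quote(user)} -k {shlex.quote(url)}")
    return commands


# Hypothetical host names, for illustration only.
for command in standalone_upgrade_commands(["idx1.example.com", "idx2.example.com"]):
    print(command)
```

Run the commands one instance at a time and check the upgrade/cluster/status endpoint on each instance before moving to the next.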

Create a custom installation hook

To learn how to create an installation hook, see Create a custom installation hook.

Troubleshoot and recover from automated rolling upgrade failure

To track the automated rolling upgrade status, check the response from the upgrade/cluster/status endpoint. Typically, the JSON response from the CM includes the following:

```
{
   "message":{
      "current_instance_upgrade":{
         "upgrader_pid":6025,
         "from_version":"9.3.0",
         "to_version":"9.4.0",
         "last_modified":"Wed Jun 07 12:20:24 2023",
         "status":"completed"
      },
      "peers_upgrade":{
         "orchestrator_pid":6865,
         "from_version":"9.3.0",
         "to_version":"9.4.0",
         "overall_status":"in_progress",
         "peers":[
            {
               "name":"idx3",
               "last_modified":"Wed Jun 07 12:24:16 2023",
               "status":"completed",
               "upgrader_pid":1
            },
            ...
         ],
         "statistics":{
            "peers_to_upgrade":4,
            "overall_peers_upgraded":2,
            "overall_peers_upgraded_percentage":50
         }
      }
   }
}
```

The response consists of two sections:

"current_instance_upgrade" refers to the status of the CM upgrade. If the CM upgrade fails, you can see a "failed" value in the "status" field.
"peers_upgrade" tracks the upgrade status of indexer cluster peers. If the indexer cluster upgrade fails, typically, you can see both these values:
- "failed" value in the "overall_status" field for the "peers_upgrade" section
- "failed" value in the "status" field for one of the peers from the "peers" list
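
The two sections of the response can be reduced to a short summary that shows where a failure occurred. The following is a minimal Python sketch, assuming the response has the shape shown above. The function name summarize_status and the keys of the returned dictionary are illustrative.

```python
def summarize_status(response):
    """Extract the CM status, the overall peer status, and any failed peers."""
    message = response["message"]
    cm_status = message["current_instance_upgrade"]["status"]
    # peers_upgrade may be absent, for example on a non-clustered instance.
    peers = message.get("peers_upgrade", {})
    failed = [
        peer["name"]
        for peer in peers.get("peers", [])
        if peer.get("status") == "failed"
    ]
    return {
        "cm_status": cm_status,
        "peers_status": peers.get("overall_status"),
        "failed_peers": failed,
    }
```

Pass it the parsed JSON body from the upgrade/cluster/status endpoint. A non-empty failed_peers list tells you which indexers to inspect first.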

To learn why the upgrade failed, check the logs on the CM and on the indexers where the upgrade has failed. To find logs related to an indexer cluster upgrade in the splunk-rolling-upgrade app, check the following two log files under $SPLUNK_HOME/var/log/splunk:

splunk_idxc_upgrade_upgrader_script.log
splunk_idxc_upgrade_rest_endpoints.log
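
To narrow down a failure quickly, you can scan both log files for error lines. The following is a minimal Python sketch. It assumes that failures are logged on lines containing the text ERROR, which is an assumption about the log format rather than a documented guarantee, and the function name find_error_lines is illustrative.

```python
from pathlib import Path

LOG_NAMES = (
    "splunk_idxc_upgrade_upgrader_script.log",
    "splunk_idxc_upgrade_rest_endpoints.log",
)


def find_error_lines(log_dir, marker="ERROR"):
    """Return (file name, line number, text) for each line containing marker."""
    hits = []
    for name in LOG_NAMES:
        path = Path(log_dir) / name
        if not path.is_file():
            continue  # a log may not exist on an instance that never ran the upgrader
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((name, number, line.rstrip("\n")))
    return hits
```

Point log_dir at the var/log/splunk directory of the instance where the upgrade failed.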

Resolve issues resulting from the logs

After you detect an upgrade failure, log in to the environment of the instance where the upgrade has failed and check the logs to identify and resolve the issue.

If the cluster member where the issue occurred is down, manually install the package on that machine, and start Splunk Enterprise on that member.

Send an HTTP POST request to the upgrade/cluster/recovery endpoint on the CM. For example:
```
curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/recovery"
```
- If the previous upgrade process is still running, for example, it has become unresponsive, the endpoint indicates the identifier of the process (PID). Before you retry the recovery, stop the process.
- If the previous upgrade process is complete, the upgrade/cluster/recovery operation tries to return the cluster to the ready state, where you can run the automated rolling upgrade again after it failed. If the previous upgrade crashed and the status is stuck in the "in_progress" state, the operation sets the current upgrade status to "failed". If the status is "failed", proceed to the next step. Alternatively, you can initiate the recovery process by running the splunk rolling-upgrade cluster-recovery CLI command.
Resume the upgrade by sending an HTTP POST request to the upgrade/cluster/all_peers endpoint on the CM. For example:
```
curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/all_peers?output_mode=json"
```
In rare cases where an indexer peer fails during the upgrade, the cluster may no longer meet the replication factor. This causes the upgrade/cluster/all_peers endpoint to stop the upgrade due to a failed health check. You can skip the health check by specifying the REST argument force=true:
```
curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/all_peers?force=true&output_mode=json"
```
In this case, some data on the cluster may not be searchable during the upgrade.
