Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Perform an automated rolling upgrade of an indexer cluster

Splunk Enterprise version 9.3.0 and higher supports the following upgrades using the default splunk-rolling-upgrade app:

  • Automated rolling upgrade of an indexer cluster
  • Upgrade of a non-clustered indexer
  • Upgrade of a cluster manager (CM)

The splunk-rolling-upgrade app comes with the Splunk Enterprise product. A rolling upgrade performs a phased upgrade of cluster peers to a new version of Splunk Enterprise with minimal interruption of ongoing searches and data ingestion. The splunk-rolling-upgrade app automates the manual rolling upgrade steps described in Perform a rolling upgrade of an indexer cluster.

Requirements and considerations

General requirements

Review the following requirements and considerations before you configure and initiate an automated rolling upgrade:

  • The splunk-rolling-upgrade app requires a Linux or Unix operating system. macOS and Windows are not supported.
  • Automated rolling upgrade applies only to upgrades from version 9.3.x and higher to subsequent versions of Splunk Enterprise. To determine your upgrade path and confirm the compatibility of the upgraded CM and cluster peers with existing Splunk Enterprise components and applications, see Splunk products version compatibility matrix.
  • Automated rolling upgrade supports the following installation package formats:
    • .tgz - the default file format
    • .deb and .rpm - these file formats require a custom script that can run with elevated privileges. See Create a custom installation hook.
  • To use the splunk-rolling-upgrade app, you must hold the splunk_system_upgrader role.
  • To use the splunk-rolling-upgrade app with Splunk Enterprise instances that are managed by systemd, you need to be able to run a custom control script with elevated privileges. See Create a custom control hook.

Additional requirements for clustered environments

For clustered Splunk deployments with at least one cluster manager that manages indexers, the following requirements apply:

  • An automated CM upgrade requires turning off the CM redundancy feature. Otherwise, you must manually upgrade CM nodes. You can still use the app to perform automated upgrades of cluster peers later. To learn about redundancy, see Implement cluster manager redundancy.
  • For multisite deployments, the automated rolling upgrade app upgrades site by site automatically. After upgrading all the indexers of a site, the app starts upgrading indexers in the next site.

How an automated rolling upgrade works

To ensure a successful automated rolling upgrade, you must upgrade your Splunk deployment in the following order:

Changing the order may cause issues with your Splunk deployment due to version incompatibility.

  1. Upgrade the license manager (LM).
  2. Upgrade the cluster manager (CM).
  3. Upgrade the search head tier.
  4. Upgrade the indexer tier.

Upgrade the license manager (LM)

The LM role can be colocated on an instance that is performing other tasks. To learn about instances where you can colocate the LM, see Choose the instance to serve as the license manager. To upgrade the LM, identify the instance that serves as the LM. Depending on the instance, follow one of the upgrade workflows:

  • If the LM is colocated on an instance other than a search head or cluster manager, follow these steps:
  1. Configure the app by taking the steps for non-clustered deployments. To view the steps, see Configure the rolling upgrade app for non-clustered deployments.
  2. Run the upgrade using the steps for non-clustered deployments. To view the steps, see Run the automated rolling upgrade app for non-clustered deployments.
  • If the LM is colocated on a search head or CM, you don't need to upgrade it separately. It is upgraded automatically when you upgrade the search head cluster (SHC) or the CM.

Upgrade the cluster manager (CM)

The splunk-rolling-upgrade app provides the functionality to upgrade a CM. To initiate the upgrade, send a single request to a REST endpoint or run the corresponding CLI command. For the REST endpoints and CLI commands, refer to the table in this section. The app then stops the CM, downloads a new Splunk Enterprise installation package, installs it, and starts the CM.

You must upgrade each CM separately.

By default, the app supports only .tgz packages. The app unpacks their contents to the $SPLUNK_HOME directory, which is typically /opt/splunk. To learn how to customize the installation step with custom hooks (for example, shell scripts), see Create a custom installation hook.

To upgrade a CM, the splunk-rolling-upgrade app provides the following REST endpoints and corresponding CLI commands:

If the CM redundancy feature is turned on, upgrade and back up the CM manually. Don't use the commands and endpoints from this table.

  • REST endpoint: upgrade/cluster/manager
    CLI command: splunk rolling-upgrade cluster-manager
    Description: Initiate the upgrade process.
  • REST endpoint: upgrade/cluster/status
    CLI command: splunk rolling-upgrade cluster-status
    Description: Monitor the automated upgrade status. This endpoint displays the upgrade statuses of the CM and the cluster peers.

The status endpoint is not available while the CM is down for the upgrade.
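
For example, you can initiate and monitor the CM upgrade with the CLI commands from this table, run on the CM. This is a minimal sketch; authentication prompts and output depend on your deployment:

    # On the CM: start the automated CM upgrade
    splunk rolling-upgrade cluster-manager

    # After the CM restarts, check the upgrade status
    splunk rolling-upgrade cluster-status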

Upgrade the search head tier

The automated rolling upgrade of indexers does not upgrade an SHC. If needed, upgrade the SHC manually. To learn about upgrading the SHC, see Perform an automated rolling upgrade of a search head cluster.

Upgrade the indexer tier

Upgrade the indexer cluster

To initiate the upgrade of the indexer cluster, you can send a request to the REST endpoint or run the corresponding CLI command on the cluster manager. The action starts an orchestrator process that performs the upgrade of the indexer cluster peers. The orchestrator process downloads and installs a new Splunk package on all indexer peers while maintaining data searchability on all buckets. To achieve this, the orchestrator process makes sure that the number of indexer peers undergoing the upgrade at any one time does not exceed min((search_factor - 1), (cluster_size - 1)/2), where cluster_size is the total number of peers in the cluster. For example, if search_factor = 3 and the indexer tier includes 10 indexers, the automated rolling upgrade app upgrades 2 indexers in parallel.

Based on this formula, if search_factor == 1 or the number of peers in the cluster is <= 2, you can't perform an automated rolling upgrade.
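
The following shell arithmetic sketch illustrates the calculation using the example values above. The variable names are illustrative only and are not app settings:

    # Illustrative calculation of how many peers the orchestrator upgrades in parallel
    search_factor=3
    cluster_size=10
    a=$((search_factor - 1))          # 3 - 1 = 2
    b=$(( (cluster_size - 1) / 2 ))   # integer division: (10 - 1) / 2 = 4
    echo $(( a < b ? a : b ))         # min(2, 4) = 2 peers upgraded in parallel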

By default, the app supports only .tgz packages. The app unpacks their contents to the $SPLUNK_HOME directory on cluster peers, which is typically /opt/splunk. To learn how to customize the installation step with custom hooks (for example, shell scripts), in the same way as for a CM, see Create a custom installation hook.

  • REST endpoint: upgrade/cluster/all_peers
    CLI command: splunk rolling-upgrade cluster-all-peers
    Description: Initiate the automated rolling upgrade process for cluster peers. The endpoint supports the "force" parameter that allows you to skip a health check before performing an upgrade. Example:

    https://localhost:8089/services/upgrade/cluster/all_peers?force=true

    To learn about troubleshooting and recovery, see Troubleshoot and recover from automated rolling upgrade failure.
  • REST endpoint: upgrade/cluster/status
    CLI command: splunk rolling-upgrade cluster-status
    Description: Monitor the automated rolling upgrade status. Call this endpoint on the CM to display the upgrade status of the CM and cluster peers.
  • REST endpoint: upgrade/cluster/recovery
    CLI command: splunk rolling-upgrade cluster-recovery
    Description: Return the cluster to a ready state after the automated rolling upgrade fails.

Upgrade non-clustered indexers

If the indexer tier consists of one or several non-clustered indexer instances, the splunk-rolling-upgrade app provides only partial automation functionality. Because there is no CM, there is no central instance from which you can orchestrate the upgrade. To initiate the upgrade, send a request to the REST endpoint or run the corresponding CLI command on each indexer instance separately. The upgrade/cluster/status endpoint returns only the upgrade status of the single instance on which it is called.

  • REST endpoint: upgrade/standalone
    CLI command: splunk rolling-upgrade standalone
    Description: Initiate the upgrade process for a single indexer in a non-clustered deployment.
  • REST endpoint: upgrade/cluster/status
    CLI command: splunk rolling-upgrade cluster-status
    Description: Monitor the upgrade status of a single instance.

Perform an automated rolling upgrade

This section shows you how to configure and use the splunk-rolling-upgrade app to run an automated rolling upgrade.

Configure the rolling upgrade app for clustered deployments

Before you can run an automated rolling upgrade, create and configure the splunk-rolling-upgrade-config app for indexer upgrades and distribute it to the indexer peers. To do so, take the following steps:

The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.

  1. On the CM, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
  2. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following:
    [downloader]
    package_path = <path to a package>
    md5_checksum = <md5 checksum of a package>
    

    Where:

    • package_path is a URI to the location of a new installation package. For the specification file, refer to $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README. Make sure this path is accessible from any Splunk Enterprise instance that you upgrade.
    • md5_checksum contains the MD5 checksum of that package in hexadecimal format.
  3. (Optional) If you plan to use .rpm or .deb packages instead of the default .tgz package, follow these steps:
    1. Create a custom installation hook.

      The installation hook is a script that contains installation instructions for the specific package type. To learn about creating the hook, see Create a custom installation hook.

    2. Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
    3. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    5. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:
      [hook]
      install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      
  4. If you run Splunk Enterprise as a systemd service, perform an automated rolling upgrade by following these steps:

      Provide your own custom commands to stop, start, and offline a Splunk Enterprise instance that runs as a systemd service.

    1. Run the chmod +x command to set execution permissions for the associated hook (the script that you wrote).
    2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory if it doesn't already exist.
    3. Copy the hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the control_script_path value to the location of the hook. For example:
      [hook]
      control_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      
    To learn how to create a custom hook, see Create a custom control hook.
  5. On the CM, to create a configuration for the cluster peers, copy $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config into the $SPLUNK_HOME/etc/manager-apps directory.
    If cluster peers need to use different paths, update $SPLUNK_HOME/etc/manager-apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf.

    On peers, after a bundle push, the splunk-rolling-upgrade-config app appears in the $SPLUNK_HOME/etc/peer-apps directory. Make sure that you can access package_path and install_script_path on the peers by specifying, for example, this path:

    [hook]
    install_script_path = $SPLUNK_HOME/etc/peer-apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
    
  6. Validate and push the bundle by running the following CLI commands:
    splunk validate cluster-bundle
    splunk apply cluster-bundle
    

For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.
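
For reference, the following is a filled-in rolling_upgrade.conf example. The package URL and checksum are hypothetical placeholders; substitute the location of your own installation package and its MD5 checksum, which you can compute with a command such as md5sum <package>.tgz:

    [downloader]
    package_path = https://downloads.example.com/splunk/splunk-<version>-<build>-Linux-x86_64.tgz
    md5_checksum = 0123456789abcdef0123456789abcdef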

Configure the rolling upgrade app for non-clustered deployments

To configure each standalone indexer or LM, follow these steps:

The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.

  1. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
  2. In the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory, create a new rolling_upgrade.conf file containing the following:
    [downloader]
    package_path = <path to a package>
    md5_checksum = <md5 checksum of a package>
    

    Where:

    • package_path is a URI to the location of a new installation package. For the specification file, refer to $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README. Make sure this path is accessible from any Splunk Enterprise instance that you upgrade.
    • md5_checksum contains the MD5 checksum of that package in hexadecimal format.
  3. (Optional) If you plan to use .rpm or .deb packages instead of the default .tgz package, follow these steps:
    1. Create a custom installation hook.

      The installation hook is a script that contains installation instructions for the specific package type. To learn about creating the hook, see Create a custom installation hook.

    2. Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
    3. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    5. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:
      [hook]
      install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      
  4. If you run Splunk Enterprise as a systemd service, perform an automated rolling upgrade by following these steps:

      Provide your own custom commands to stop, start, and offline a Splunk Enterprise instance that runs as a systemd service.

    1. Run the chmod +x command to set execution permissions for the associated hook (the script that you wrote).
    2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory if it doesn't already exist.
    3. Copy the hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the control_script_path value to the location of the hook. For example:
      [hook]
      control_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      
    To learn how to create a custom hook, see Create a custom control hook.
  5. Repeat these steps on each standalone indexer or LM.

Run the automated rolling upgrade for clustered deployments

After you configure the splunk-rolling-upgrade app, follow these steps to run the automated rolling upgrade of your indexer cluster. You can use the REST API or run the corresponding CLI commands.

  1. Identify the URI and management port of the CM.
  2. To initiate the CM upgrade process, on any CM, send an HTTP POST request to the upgrade/cluster/manager endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/manager?output_mode=json"
    
  3. Monitor the upgrade status by sending an HTTP GET request to the upgrade/cluster/status endpoint. For example:
    curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
    
  4. Wait until the status response shows that the CM is upgraded successfully.
  5. To initiate the upgrade of all the clustered indexers, on the CM, send an HTTP POST request to the upgrade/cluster/all_peers endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/all_peers?output_mode=json"
    
  6. Keep monitoring the upgrade using this request:
    curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
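
    Alternatively, after the CM upgrade completes, you can run the corresponding CLI commands on the CM instead of the REST calls in steps 5 and 6. A minimal sketch:

    # On the CM: start the rolling upgrade of all cluster peers, then poll the status
    splunk rolling-upgrade cluster-all-peers
    splunk rolling-upgrade cluster-status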
    

Run the automated rolling upgrade for non-clustered deployments

  1. Identify the URI and management port of any standalone indexer or LM.
  2. To initiate the upgrade process, send an HTTP POST request to the upgrade/standalone endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/standalone?output_mode=json"
    
  3. Monitor the upgrade status by sending an HTTP GET request to the upgrade/cluster/status endpoint. For example:
    curl -X GET -u admin:pass -k "https://localhost:8089/services/upgrade/cluster/status?output_mode=json"
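
    Alternatively, you can run the corresponding CLI commands on the standalone indexer or LM. A minimal sketch:

    # On the standalone instance: start the upgrade, then check its status
    splunk rolling-upgrade standalone
    splunk rolling-upgrade cluster-status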
    

Create a custom installation hook

To learn how to create an installation hook, see Create a custom installation hook.

Create a custom control hook

A control hook is a custom binary or script that performs custom start, stop, and offline commands on a Splunk Enterprise instance, on each device where Splunk Enterprise is upgraded. The splunk-rolling-upgrade app uses the control hook to stop the Splunk Enterprise instance before the package upgrade and to start it afterward.

The splunk-rolling-upgrade app passes the following arguments to the control hook, in this order:

  1. Path to the splunk binary file, for example $SPLUNK_HOME/bin/splunk

    The splunk-rolling-upgrade app uses this path to call the commands.

  2. One of the commands: stop, start, or offline
  3. A token, if the app passes the offline command.

Make sure the control hook includes the following:

  • Instructions for how to stop, start, and offline a Splunk Enterprise instance
  • Executable permissions, which you can set using the chmod +x command.

    Example of a default control hook
    #!/bin/bash
    set -e
    SPLUNK_PATH="$1"
    COMMAND="$2"
    
    
    if [ "$COMMAND" = "start" ]; then
       "$SPLUNK_PATH" start --accept-license --answer-yes
    elif [ "$COMMAND" = "offline" ]; then
       TOKEN="$3"
       "$SPLUNK_PATH" offline -token "$TOKEN"
    elif [ "$COMMAND" = "stop" ]; then
       "$SPLUNK_PATH" stop
    else
       echo "Invalid command"
       exit 1
    fi
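
     Before the splunk-rolling-upgrade app uses a hook like this, you can sanity-check it by invoking it manually with the same argument order that the app passes. The file name and paths in this example are hypothetical:

     chmod +x ./splunk_control.sh
     ./splunk_control.sh /opt/splunk/bin/splunk stop
     ./splunk_control.sh /opt/splunk/bin/splunk start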
    


Use custom control hooks to upgrade systemd-managed Splunk Enterprise

On a Splunk Enterprise instance that is managed by systemd, you can perform the automated rolling upgrade in one of the following ways:

To continue, the control hook script requires elevated privileges. These privileges are needed to modify the systemd service files and to stop and start the Splunkd.service unit that runs under systemd. Typically, the Splunk Enterprise instance runs under the splunk user, which does not have these privileges.

  • By taking the following steps:
    1. In the /etc/systemd/system/Splunkd.service unit file, change the value of the KillMode setting to process.

      By default, the Splunkd.service process uses the KillMode=mixed setting to kill all child processes when stopping a Splunk Enterprise instance. However, this also kills the script that the splunk-rolling-upgrade app uses to stop and start the Splunk Enterprise instance and to perform the upgrade. Temporarily changing the KillMode value prevents that script from being killed.

    2. Reload the systemd daemon.
    3. Perform an automated rolling upgrade. See Perform an automated rolling upgrade.
    4. In the /etc/systemd/system/Splunkd.service unit file, set KillMode back to mixed.
    5. Reload the systemd daemon.
  • Automatically, by using a control hook script.
    Example of a control hook that updates the KillMode:
     #!/bin/bash
     set -e
     SPLUNK_PATH="$1"
     COMMAND="$2"
     SPLUNK_SYSTEMD_DIR="/etc/systemd/system/Splunkd.service.d"
     
     
     # Remove the temporary KillMode override and reload systemd, if the override exists.
     cleanup_if_exists() {
        if [ -d "$SPLUNK_SYSTEMD_DIR" ]; then
            sudo rm -rf "$SPLUNK_SYSTEMD_DIR" && sudo systemctl daemon-reload
        fi
     }
     
     
     # Clean up the override and exit with the failing command's status code.
     handle_error() {
        cleanup_if_exists
        echo "An error occurred. splunk_control.sh exiting with status: $1."
        exit "$1"
     }
     
     
     # Create a drop-in override that sets KillMode=process, then reload systemd.
     override_kill_mode() {
        sudo mkdir "$SPLUNK_SYSTEMD_DIR" || handle_error "$?"
        (sudo tee "$SPLUNK_SYSTEMD_DIR/override.conf" <<EOF
     [Service]
     KillMode=process
     EOF
        ) || handle_error "$?"
        sudo systemctl daemon-reload || handle_error "$?"
     }
     
     
     if [ "$COMMAND" = "start" ]; then
        cleanup_if_exists
        sudo "$SPLUNK_PATH" start --accept-license --answer-yes
     elif [ "$COMMAND" = "offline" ]; then
        override_kill_mode
        TOKEN="$3"
        "$SPLUNK_PATH" offline -token "$TOKEN"
        cleanup_if_exists
     elif [ "$COMMAND" = "stop" ]; then
        override_kill_mode
        sudo "$SPLUNK_PATH" stop
        cleanup_if_exists
     else
        echo "Invalid command"
        exit 1
     fi
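
     To confirm whether the KillMode override is in effect on a host, you can inspect the effective unit property. For example:

     # Expect KillMode=process while the override drop-in exists, and KillMode=mixed otherwise
     systemctl show -p KillMode Splunkd.service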
    

Troubleshoot and recover from automated rolling upgrade failure

To track the automated rolling upgrade status, check the response from the upgrade/cluster/status endpoint. Typically, the JSON response from the CM includes the following:

{
   "message":{
      "current_instance_upgrade":{
         "upgrader_pid":6025,
         "from_version":"9.3.0",
         "to_version":"9.4.0",
         "last_modified":"Wed Jun 07 12:20:24 2023",
         "status":"completed"
      },
      "peers_upgrade":{
         "orchestrator_pid":6865,
         "from_version":"9.3.0",
         "to_version":"9.4.0",
         "overall_status":"in_progress",
         "peers":[
            {
               "name":"idx3",
               "last_modified":"Wed Jun 07 12:24:16 2023",
               "status":"completed",
               "upgrader_pid": 1
            },
            ...
         ],
         "statistics":{
            "peers_to_upgrade":4,
            "overall_peers_upgraded":2,
            "overall_peers_upgraded_percentage":50
         }
      }
   }
}

The response consists of two sections:

  • "current_instance_upgrade" refers to the status of the CM upgrade. If the CM upgrade fails, the "status" field shows a "failed" value.
  • "peers_upgrade" tracks the upgrade status of the indexer cluster peers. If the indexer cluster upgrade fails, you typically see both of these values:
    • A "failed" value in the "overall_status" field of the "peers_upgrade" section
    • A "failed" value in the "status" field for one of the peers in the "peers" list
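
If jq is available, you can extract these fields directly from the status response. The following sketch is based on the structure shown above; adjust the address and credentials for your deployment:

    curl -s -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/status?output_mode=json" \
      | jq '{overall: .message.peers_upgrade.overall_status, failed_peers: [.message.peers_upgrade.peers[] | select(.status == "failed") | .name]}'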

To learn why the upgrade failed, check the logs on the CM and on the indexer(s) where the upgrade has failed. To find logs related to an indexer cluster upgrade in the splunk-rolling-upgrade app, check the following two log files under $SPLUNK_HOME/var/log/splunk:

  • splunk_idxc_upgrade_upgrader_script.log
  • splunk_idxc_upgrade_rest_endpoints.log

Resolve issues identified in the logs

After you detect an upgrade failure, log in to the environment of the instance where the upgrade has failed and check the logs to identify and resolve the issue.

If the cluster member where the issue occurred is down, manually install the package on that machine, and start Splunk Enterprise on that member.

  1. Send an HTTP POST request to the upgrade/cluster/recovery endpoint on the CM. For example:
    curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/recovery"
    
    • If the previous upgrade process is still running, for example because it has become unresponsive, the endpoint returns the identifier of that process (PID). Stop the process before you retry the recovery.
    • If the previous upgrade process is complete, the upgrade/cluster/recovery operation tries to return the cluster to a ready state so that you can run the automated rolling upgrade again. If the previous upgrade crashed and the status is stuck in the "in_progress" state, the operation sets the current upgrade status to "failed". If the status is "failed", proceed to the next step.
      Alternatively, you can initiate the recovery process by running the splunk rolling-upgrade cluster-recovery CLI command.
  2. Resume the upgrade by sending an HTTP POST request to the upgrade/cluster/all_peers endpoint on the CM. For example:
    curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/all_peers?output_mode=json"
    

    In rare cases where an indexer peer fails during the upgrade, the cluster might no longer meet the replication factor. This causes the upgrade/cluster/all_peers endpoint to stop the upgrade due to a failed health check. You can skip the health check by specifying the force=true REST argument:

    curl -X POST -u admin:pass -k "https://<cm_address>:8089/services/upgrade/cluster/all_peers?force=true&output_mode=json" 
    

    In this case, some data on the cluster may not be searchable during the upgrade.

This documentation applies to the following versions of Splunk® Enterprise: 9.4.0

