Splunk® Enterprise

Distributed Search

Perform an automated rolling upgrade of a search head cluster

Splunk Enterprise version 9.1.0 and higher supports automated rolling upgrade of search head clusters using the custom splunk-rolling-upgrade app, which comes with the Splunk Enterprise product. A rolling upgrade performs a phased upgrade of cluster members to a new version of Splunk Enterprise with minimal interruption of ongoing searches. The splunk-rolling-upgrade app automates the manual rolling upgrade steps described in Perform a rolling upgrade of a search head cluster.

Requirements and considerations

Review the following requirements and considerations before you configure and initiate an automated rolling upgrade.

  • The splunk-rolling-upgrade app requires a Linux operating system. macOS and Windows are not supported.
  • Automated rolling upgrade applies only to upgrades from version 9.1.x and higher to subsequent versions of Splunk Enterprise. To determine your upgrade path and confirm the compatibility of the upgraded search head cluster version with existing Splunk Enterprise components and applications, see the Splunk products version compatibility matrix.
  • Automated rolling upgrade supports the following installation package formats: .tgz by default, and rpm and deb through a custom installation hook.
  • To use the splunk-rolling-upgrade app, you must hold a splunk_system_upgrader role.

    The admin role contains all of the required capabilities by default. However, to limit access, it is a best practice to create a dedicated user that holds the splunk_system_upgrader role and use that user to run the rolling upgrade. For an example command, see the sketch after this list.

  • You must upgrade the cluster manager before upgrading the search head tier.
  • To use the splunk-rolling-upgrade app with Splunk Enterprise instances that are managed by systemd, you need to be able to run a custom control script with elevated privileges. See Create a custom control hook.
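
    For example, the following is one way to create such a dedicated user from the CLI. The username upgrade_admin and the placeholder passwords are illustrative; substitute values for your environment:

      # Create a user that holds only the splunk_system_upgrader role (example values).
      splunk add user upgrade_admin -password <user_password> -role splunk_system_upgrader -auth admin:<admin_password>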

How an automated rolling upgrade works

Use the splunk-rolling-upgrade app to perform an automated rolling upgrade of a search head cluster. You initiate the rolling upgrade with a single request to a REST endpoint or by specifying the corresponding CLI command. The app then downloads or gets a new Splunk Enterprise install package and installs it on each cluster member one by one. To learn about the package_path setting, see Configure the rolling upgrade app for clustered deployments. By default, the app handles only .tgz packages by unpacking the contents in the $SPLUNK_HOME directory, which is typically /opt/splunk.

For more flexibility with installation, the splunk-rolling-upgrade app does the following:

  • Implements the package installation process as a custom hook (shell script). You can write and plug in the installation logic, which is required for deb and rpm package types.
  • Provides additional separate endpoints for monitoring the upgrade process and remediating failures.

The splunk-rolling-upgrade app provides the following REST endpoints and corresponding CLI commands to perform an automated search head cluster rolling upgrade.

  • For a cluster upgrade, you can run these operations on any cluster member.
  • For a deployer upgrade, you must run these operations on the deployer.
  • For a non-clustered upgrade, which means upgrading search heads that are not part of a search head cluster, you must run these operations on each individual search head.
REST endpoint          CLI command                            Description
upgrade/shc/upgrade    splunk rolling-upgrade shc-upgrade     Initiate the automated rolling upgrade process.
upgrade/shc/status     splunk rolling-upgrade shc-status      Monitor the automated rolling upgrade status.
upgrade/shc/recovery   splunk rolling-upgrade shc-recovery    Return the cluster to a ready state after an automated rolling upgrade failure.
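
For reference, you run the corresponding CLI commands with the splunk binary in $SPLUNK_HOME/bin on the appropriate instance, for example:

    splunk rolling-upgrade shc-upgrade      # initiate the automated rolling upgrade
    splunk rolling-upgrade shc-status       # monitor the upgrade status
    splunk rolling-upgrade shc-recovery     # return the cluster to a ready state after a failure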

Perform an automated rolling upgrade

This section shows you how to configure and use the splunk-rolling-upgrade app to run an automated rolling upgrade.

Configure the rolling upgrade app for clustered deployments

Before you can run an automated rolling upgrade, create and configure the splunk-rolling-upgrade app for search head upgrades and distribute it to search head peers.

The default splunk-rolling-upgrade installation script supports .tgz packages only. If you plan to use rpm or deb packages, check the information in the following steps.

To configure the splunk-rolling-upgrade app, take the following steps:

  1. On the deployer, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
  2. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following:
    [downloader] 
    package_path = <path to a package>
    md5_checksum = <md5 checksum of a package>
    

    Where:

    • package_path is an absolute path to the location of the new installation package. Make sure this path is accessible from every Splunk Enterprise instance that you upgrade. The package_path setting supports the following URI paths:
      • Paths to local files, for example, file:///path/to/package.tgz
        Note that this example contains three slash characters: two belong to the protocol prefix and one begins the absolute path.
      • Remote links that require no authentication, for example, http://<...>
    • md5_checksum contains the MD5 checksum of the package in hexadecimal format. For a filled-in example of this file, see the end of this section.
  3. On the deployer, in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new file called inputs.conf, containing the following scripted input, where <splunk_user> is the name of the user the app uses to send requests to REST endpoints.

    To perform this step, you must hold a splunk_system_upgrader role.

    [script://$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/bin/complete.py] 
    passAuth=<splunk_user>
    

    Splunk Enterprise passes the authentication token for the specified user to the splunk-rolling-upgrade app and does not store the token.

  4. (Optional) If you plan to use rpm or deb packages instead of the default .tgz package, follow these steps:
    1. Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
    2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    3. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:
      [hook]
      install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      

      Note: The install_script_path setting supports only local paths and environment variable expansions.

  5. If you run Splunk Enterprise as a systemd service, provide your own custom commands to stop, start, and take offline the Splunk Enterprise instance by following these steps:

    1. Run the chmod +x command to set execution permissions for the associated hook, that is, the script that you wrote.
    2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory if it doesn't already exist.
    3. Copy the hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the control_script_path value to the location of the hook. For example:
      [hook]
      control_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      
    To learn how to create a custom hook, see Create a custom control hook.
  6. On the deployer, copy $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config to the configuration bundle under $SPLUNK_HOME/etc/shcluster/apps.
  7. On the deployer, distribute the configuration bundle to all search head cluster members using the following command:
    splunk apply shcluster-bundle -target <uri-to-shc-peer>:<management port> -auth admin:<password>
    
    For more information on how to apply the configuration bundle, see Use the deployer to distribute apps and configuration updates.

For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.
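
For illustration, the following is a filled-in example of rolling_upgrade.conf. The package path, version, and checksum are hypothetical; generate the checksum for your own package with a command such as md5sum:

    # Compute the checksum of the package you plan to deploy (example path).
    md5sum /opt/packages/splunk-9.2.1-Linux-x86_64.tgz

    # Example $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf
    [downloader]
    package_path = file:///opt/packages/splunk-9.2.1-Linux-x86_64.tgz
    md5_checksum = 0123456789abcdef0123456789abcdef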

Configure the rolling upgrade app for non-clustered deployments

To create and configure the app on each search head, take these steps:

  1. On the search head, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
  2. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following:
    [downloader] 
    package_path = <path to a package>
    md5_checksum = <md5 checksum of a package>
    

    Where:

    • package_path is an absolute path to the location of the new installation package. Make sure this path is accessible from every Splunk Enterprise instance that you upgrade. The package_path setting supports URI paths to local files, for example, file:///path/to/package.tgz, and remote links that require no authentication.
    • md5_checksum contains the MD5 checksum of the package in hexadecimal format.
  3. On the search head, in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new file called inputs.conf, containing the following scripted input, where <splunk_user> is the name of the user the app uses to send requests to REST endpoints.
    [script://$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/bin/complete.py] 
    passAuth=<splunk_user>
    

    Splunk Enterprise passes the authentication token for the specified user to the splunk-rolling-upgrade app and does not store the token.

  4. (Optional) If you plan to use rpm or deb packages instead of the default .tgz package, follow these steps:
    1. Run the chmod +x command to set execution permissions for the associated hook (script) that you wrote.
    2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    3. Copy your hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:
      [hook]
      install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      

      The install_script_path setting supports only local paths and environment variable expansions.

  5. If you run Splunk Enterprise as a systemd service, provide your own custom commands to stop, start, and take offline the Splunk Enterprise instance by following these steps:

    1. Run the chmod +x command to set execution permissions for the associated hook, that is, the script that you wrote.
    2. Create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory if it doesn't already exist.
    3. Copy the hook to the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks directory.
    4. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the control_script_path value to the location of the hook. For example:
      [hook]
      control_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
      
    To learn how to create a custom hook, see Create a custom control hook.
  6. Repeat these steps on each search head.

For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.

Upgrade the license manager

The license manager (LM) role can be colocated on an instance that is performing other tasks. To learn about instances where you can colocate the LM, see Choose the instance to serve as the license manager. To upgrade the LM, identify the instance that serves as the LM. Depending on the instance, follow one of the upgrade workflows:

  • If the LM is colocated on an instance other than a search head or cluster manager, follow these steps:
  1. Configure the app by taking the steps for non-clustered deployments. To view the steps, see Configure the rolling upgrade app for non-clustered deployments.
  2. Run the upgrade using the steps for non-clustered deployments. To view the steps, see Run the automated rolling upgrade for non-clustered deployments.
  • In other cases, you do not need to upgrade the LM separately. It is upgraded automatically when you upgrade the search head cluster (SHC) or the cluster manager (CM).

Run the automated rolling upgrade for clustered deployments

After you configure the splunk-rolling-upgrade app, follow these steps to run the automated rolling upgrade of your search head cluster, using the REST API or corresponding CLI commands:

  1. Identify the URI and management port of any search head cluster member.
  2. To initiate the rolling upgrade process, on any cluster member, send an HTTP POST request to the upgrade/shc/upgrade endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"
    

    First, the request triggers basic health checks to ensure the search head cluster is in a healthy state to perform the rolling upgrade. If all health checks pass, the endpoint initiates the rolling upgrade. For more information, see steps 1 and 2 in Perform a rolling upgrade.

    A successful request returns an "Upgrade initiated" message. For example:

    {
        "updated":"2022-11-24T17:25:54+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
          {
                "title": "all_peers",
                "id": "/services/upgrade/shc/all_peers",
                "updated": "2024-06-13T12:21:12+0000",
                "links": {
                    "alternate": {
                        "href": "/cluster/all_peers"
                    }
                },
                "content": {
                    "message": "Cluster upgrade initiated",
                    "status": "succeeded"
                }
            }
        ]
    }
    
    

    In some cases the request can fail and return an error, for example, if health checks fail or if a rolling upgrade is already running. To troubleshoot the cause of a failure, review the HTTP return codes and check log files for details. The upgrade/shc/upgrade endpoint returns the following HTTP status codes:

    Code Description
    200  Upgrade operation successfully initiated.
    400  Configuration error.
    403  One of the following:
         • An upgrade is already running.
         • An upgrade is not required.
         • The search head cluster is not ready. Wait for the cluster to fully initialize.
    500  Internal Server Error. Something went wrong with the upgrade, for example, the upgrade could not be triggered on a given member. Check log files for more information.
    501  Attempted to upgrade an unsupported deployment. (Rolling upgrade supports search head clusters, search heads, and deployers only.)
    503  KV store is not ready.

    For more troubleshooting information, including relevant log files, see Troubleshoot and recover from automated rolling upgrade failure.

    For endpoint details, see upgrade/shc/upgrade in the REST API Reference Manual.

    Alternatively, on any cluster member, run the splunk rolling-upgrade shc-upgrade command to initiate the automated rolling upgrade.

  3. Monitor the status of the rolling upgrade until all cluster members are successfully upgraded. To monitor the rolling upgrade status, send an HTTP GET request to the upgrade/shc/status endpoint. For a simple polling sketch, see the example after these steps. For example:
    curl -u admin:pass -k "https://localhost:8089/services/upgrade/shc/status?output_mode=json"
    

    The response shows the current status of the rolling upgrade, including the upgrade status of the entire cluster, the status of each individual cluster member, and the total number and percentage of members upgraded. For example:

    {
        "updated":"2022-11-24T17:33:28+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
            {
                "title":"status",
                "id":"/services/upgrade/shc/status",
                "updated":"2022-11-24T17:33:28+0000",
                "links":{
                    "alternate":{
                        "href":"shc/status"
                    }
                },
                "content":{
                    "message":{
                        "upgrade_status":"completed",
                        "statistics":{
                            "peers_to_upgrade":3,
                            "overall_peers_upgraded":3,
                            "overall_peers_upgraded_percentage":100
                        },
                        "peers":[
                            {
                                "name":"sh2",
                                "status":"upgraded",
                                "last_modified":"Thu Nov 24 17:29:41 2022"
                            },
                            {
                                "name":"sh1",
                                "status":"upgraded",
                                "last_modified":"Thu Nov 24 17:28:07 2022"
                            },
                            {
                                "name":"sh3",
                                "status":"upgraded",
                                "last_modified":"Thu Nov 24 17:31:15 2022"
                            }
                        ]
                    }
                }
            }
        ]
    }
    
    

    The upgrade/shc/status endpoint returns the following HTTP status codes:

    Code Description
    200  Successfully retrieved the latest SHC status.
    400  Configuration error.
    500  Internal error. Check log files for more information on the error.
    501  Attempted to get the status of an unsupported deployment.
    503  Unable to access the KV store. The KV store is probably still initializing.

    For endpoint details, see upgrade/shc/status in the REST API Reference Manual.

    Alternatively, run the splunk rolling-upgrade shc-status command to monitor the automated rolling upgrade.

    If you get a "Couldn't connect to server" response, such as the following, when monitoring the rolling upgrade status:

    % curl -u admin:pass -k https://10.225.218.144:8089/services/shc/status    
    
    curl: (7) Failed to connect to 10.225.218.144 port 8089 after 1212 ms: Couldn't connect to server
    

    it means that this cluster member is being restarted as a part of the upgrade process.

    You can get this response when you try to monitor the status of a member that is temporarily down while the rolling upgrade process stops splunkd, unpacks the package, and restarts splunkd. In this case, monitor the status from a different cluster member, or wait until that cluster member is up and running again.

  4. Upgrade the deployer. When the upgrade/shc/status endpoint response shows "upgrade_status":"completed" for the entire cluster, repeat step 2 on the deployer to upgrade it.
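
If you want to track progress without repeatedly typing the status request, the following is a minimal polling sketch. It is not part of the splunk-rolling-upgrade app, and it assumes that the jq utility is installed and that the credentials and host match your environment. Note that curl can fail temporarily while the member you query restarts:

    #!/bin/bash
    # Poll the upgrade status every 60 seconds until it is no longer "in_progress".
    while true; do
        status=$(curl -s -u admin:pass -k \
            "https://localhost:8089/services/upgrade/shc/status?output_mode=json" \
            | jq -r '.entry[0].content.message.upgrade_status')
        echo "upgrade_status: $status"
        if [ -n "$status" ] && [ "$status" != "in_progress" ]; then
            break
        fi
        sleep 60
    done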

Run the automated rolling upgrade for non-clustered deployments

After you configure the splunk-rolling-upgrade app, follow these steps to upgrade each standalone search head, using the REST API or corresponding CLI commands:

  1. Identify the URI and management port of any standalone search head.
  2. To initiate the rolling upgrade process, send an HTTP POST request to the upgrade/shc/upgrade endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"
    

    A successful request returns an "Upgrade initiated" message. For example:

    {
        "updated":"2022-11-24T17:25:54+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
          {
                "title": "shc",
                "id": "/services/upgrade/shc/upgrade",
                "updated": "2024-06-13T12:21:12+0000",
                "links": {
                    "alternate": {
                        "href": "/shc/upgrade"
                    }
                },
                "content": {
                    "message": "Upgrade initiated",
                    "status": "succeeded"
                }
            }
        ]
    }
    
    

    For more troubleshooting information, including relevant log files, see Troubleshoot and recover from automated rolling upgrade failure.

    For endpoint details, see upgrade/shc/upgrade in the REST API Reference Manual.

  3. Monitor the rolling upgrade status by sending an HTTP GET request to the upgrade/shc/status endpoint. For example:
    curl -u admin:pass -k "https://localhost:8089/services/upgrade/shc/status?output_mode=json"
    

    The response shows the current status of the rolling upgrade. For example:

    {
      "updated": "2024-06-24T14:48:53+0000",
      "author": "Splunk",
      "layout": "props",
      "entry": [
        {
          "title": "status",
          "id": "/services/upgrade/shc/status",
          "updated": "2024-06-24T14:48:53+0000",
          "links": {
            "alternate": {
              "href": "/shc/status"
            }
          },
          "content": {
            "message": {
              "upgrade_status": "completed",
              "statistics": {
                "peers_to_upgrade": 1,
                "overall_peers_upgraded": 1,
                "overall_peers_upgraded_percentage": 100
              },
              "peers": [
                {
                  "name": "sh1",
                  "status": "upgraded",
                  "last_modified": "Mon Jun 24 14:34:41 2024"
                }
              ]
            }
          }
        }
      ]
    }
    
    

    The upgrade/shc/status endpoint returns the following HTTP status codes:

    Code Description
    200  Successfully retrieved the latest SHC status.
    400  Configuration error.
    500  Internal error. Check log files for more information on the error.
    501  Attempted to get the status of an unsupported deployment.
    503  Unable to access the KV store. The KV store is probably still initializing.


    Alternatively, run the splunk rolling-upgrade shc-status command to monitor the automated rolling upgrade.

  4. Repeat these steps for each standalone search head.

Create a custom installation hook

An installation hook is a custom binary or script that installs the Splunk Enterprise package on each machine. The splunk-rolling-upgrade app downloads the package specified in package_path in rolling_upgrade.conf, then sends a request to the hook to install the package on the cluster member.

The app passes the package path to the hook as the first parameter and $SPLUNK_HOME as the second parameter. The hook must contain installation instructions for the package and must have executable permissions, which you can set using the chmod +x command. For example, the following shows the default installation hook for .tgz packages:

#!/bin/bash
set -e
# The app passes the package path as the first argument and $SPLUNK_HOME as the second.
splunk_tar="$1"
dest_dir="$2"
# Unpack the package contents directly into $SPLUNK_HOME, stripping the top-level directory.
tar -zvxf "$splunk_tar" --strip-components 1 -C "$dest_dir"

Custom hooks for deb and rpm package installation

Installation of deb and rpm packages requires sudo permissions, while the Splunk instance typically runs under the 'splunk' user without those privileges. To perform an automated rolling upgrade using deb or rpm packages, create a custom installation hook. Before you run installation commands, such as sudo rpm --upgrade for rpm packages, take these steps:

      
  1. Acquire elevated privileges for the installation hook for deb and rpm packages.
  2. Install the correct package manager on your machine:
    • dpkg for deb packages
    • rpm for rpm packages
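
The following is a minimal sketch of what a custom installation hook for rpm packages might look like. It is not shipped with the app, and it assumes that the splunk user can run rpm through passwordless sudo. A deb variant would call sudo dpkg -i instead:

    #!/bin/bash
    # Example custom installation hook. The app passes the downloaded package
    # path as the first argument and $SPLUNK_HOME as the second argument.
    set -e
    splunk_package="$1"
    dest_dir="$2"   # unused here; rpm installs to the path recorded in the package
    sudo rpm --upgrade "$splunk_package"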

Create a custom control hook

A control hook is a custom binary or script that runs custom start, stop, and offline commands on the Splunk Enterprise instance on each machine where Splunk Enterprise is upgraded. The splunk-rolling-upgrade app uses the control hook to stop the Splunk Enterprise instance before the package upgrade and to start it again afterward.

The splunk-rolling-upgrade app passes the following arguments to the hook, in this order:

  1. Path to the splunk binary file, for example $SPLUNK_HOME/bin/splunk

    The splunk-rolling-upgrade app uses this path to call the commands.

  2. One of the commands: stop, start, or offline
  3. A token, if the app passes the offline command.

Make sure the control hook includes the following:

  • Instructions for how to stop, start, and take offline a Splunk Enterprise instance
  • Executable permissions, which you can set using the chmod +x command.

    Example of a default control hook
    #!/bin/bash
    set -e
    SPLUNK_PATH="$1"
    COMMAND="$2"
    
    
    if [ "$COMMAND" = "start" ]; then
       "$SPLUNK_PATH" start --accept-license --answer-yes
    elif [ "$COMMAND" = "offline" ]; then
       TOKEN="$3"
       "$SPLUNK_PATH" offline -token "$TOKEN"
    elif [ "$COMMAND" = "stop" ]; then
       "$SPLUNK_PATH" stop
    else
       echo "Invalid command"
       exit 1
    fi
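
    For illustration, if the control hook is saved as splunk_control.sh (the file name used in the systemd example later in this topic), the app invokes it with one of the supported commands. The paths shown here are illustrative:

    # First argument: path to the splunk binary. Second: the command. Third (offline only): a token.
    $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/splunk_control.sh $SPLUNK_HOME/bin/splunk stop
    $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/splunk_control.sh $SPLUNK_HOME/bin/splunk start
    $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/splunk_control.sh $SPLUNK_HOME/bin/splunk offline <token>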
    



Use custom control hooks to upgrade systemd-managed Splunk Enterprise

On a Splunk Enterprise instance that is managed by systemd, you can perform the automated rolling upgrade in one of the following ways:

To continue, acquire elevated privileges for the control hook script. These privileges are required to modify the systemd service files and to stop and start the Splunkd.service unit that runs under systemd. Typically, the Splunk Enterprise instance runs under the splunk user, which does not have these privileges.

  • Manually, by taking the following steps:
    1. In the /etc/systemd/system/Splunkd.service unit file, change the value of the KillMode setting to process.

      By default, Splunkd.service uses the KillMode=mixed setting to kill all child processes when a Splunk Enterprise instance stops. However, this also kills the script that the splunk-rolling-upgrade app runs to stop and start the Splunk Enterprise instance and to perform the upgrade. Temporarily changing the KillMode value prevents systemd from killing that script.

    2. Reload the systemd daemon.
    3. Perform an automated rolling upgrade. See Perform an automated rolling upgrade.
    4. In the /etc/systemd/system/Splunkd.service unit file, set KillMode back to mixed.
    5. Reload the systemd daemon.
  • Automatically, by using a control hook script.
    Example of a control hook that updates the KillMode:
    #!/bin/bash
    set -e
    SPLUNK_PATH="$1"
    COMMAND="$2"
    SPLUNK_SYSTEMD_DIR="/etc/systemd/system/Splunkd.service.d"
    
    
    cleanup_if_exists() {
       if [ -d "$SPLUNK_SYSTEMD_DIR" ]; then
           sudo rm -rf "$SPLUNK_SYSTEMD_DIR" && sudo systemctl daemon-reload
       fi
    }
    
    
    handle_error() {
       cleanup_if_exists
       echo "An error occurred. splunk_control.sh exiting with status: $1."
       exit "$1"
    }
    
    
    override_kill_mode() {
       sudo mkdir "$SPLUNK_SYSTEMD_DIR" || handle_error "$?"
       (sudo tee "$SPLUNK_SYSTEMD_DIR/override.conf" <<EOF
    [Service]
    KillMode=process
    EOF
       ) || handle_error "$?"
       sudo systemctl daemon-reload || handle_error "$?"
    }
    
    
    if [ "$COMMAND" = "start" ]; then
       cleanup_if_exists
       sudo "$SPLUNK_PATH" start --accept-license --answer-yes
    elif [ "$COMMAND" = "offline" ]; then
       override_kill_mode
       TOKEN="$3"
       "$SPLUNK_PATH" offline -token "$TOKEN"
       cleanup_if_exists
    elif [ "$COMMAND" = "stop" ]; then
       override_kill_mode
       sudo "$SPLUNK_PATH" stop
       cleanup_if_exists
    else
       echo "Invalid command"
    fi
    
    

Troubleshoot and recover from automated rolling upgrade failure

Using the splunk-rolling-upgrade app, you can return a search head cluster to a ready state after a rolling upgrade fails, so that you can run the automated rolling upgrade again. Before you initiate the recovery process, make sure that the rolling upgrade has actually failed or crashed.

When a rolling upgrade fails, the "upgrade_status" field in the upgrade/shc/status endpoint response shows one of the following values:

  • "failed", in most cases
  • "in_progress", in some cases, for example, if the upgrade crashes while the Splunk instance is stopped.

To investigate the cause of the rolling upgrade failure, take these steps:

  1. Find the last instance that was being upgraded at the time of failure. To do so, check the upgrade/shc/status endpoint response for the member whose "status" field is set to a value other than "READY" or "UPGRADED".
  2. Check the logs for errors.

The splunk-rolling-upgrade app writes to three log files under $SPLUNK_HOME/var/log/splunk:

  • splunk_shc_upgrade_upgrader_script.log
  • splunk_shc_upgrade_rest_endpoints.log
  • splunk_shc_upgrade_completion_script.log

If the request response shows "no_upgrade", look for errors in the splunk_shc_upgrade_rest_endpoints.log file on the member where you ran the request. Address the issues that you find in the logs. Make sure the issues do not repeat on other cluster members during future rolling upgrade attempts.
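
For example, a quick way to surface errors across all three log files on a member (assuming $SPLUNK_HOME points to your installation directory):

    grep -i "error" $SPLUNK_HOME/var/log/splunk/splunk_shc_upgrade_*.log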

After you address the issues that caused the failure, prepare the cluster for another rolling upgrade attempt, as follows:

  1. If the cluster member where the issue occurred is down, manually install the package on that machine, remove $SPLUNK_HOME/var/run/splunk/trigger-rolling-upgrade (if it exists), and start Splunk Enterprise on that member.
  2. Send an HTTP POST request to the upgrade/shc/recovery endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/recovery"
    

    This operation returns the cluster to the ready state, where you can run the automated rolling upgrade again after failure. It also sets the current upgrade status to "failed". Note that it can take some time for the KV store to initialize after startup.

    The upgrade/shc/recovery endpoint returns the following HTTP status codes:

    Code Description
    200 Recovery was executed successfully.
    400 Configuration error.
    500 Internal error. Check log files for more information on the error.
    501 Attempted to run a recovery on an unsupported deployment.

    For endpoint details, see upgrade/shc/recovery in the REST API Reference Manual.

    Alternatively, run the splunk rolling-upgrade shc-recovery command to initiate the recovery process.

  3. If the upgrade/shc/recovery endpoint response contains a message such as the following:
    {
      "messages": [
        {
          "type": "succeeded",
          "text": "SHC partially recovered. Please turn off manual detention mode on the following peers: ['sh1']"
        }
      ]
    }
    

    then send an HTTP POST request to the /shcluster/member/control/control/set_manual_detention endpoint, turning off manual detention on the search head specified in the response. For example:

    curl -X POST -u admin:pass -k "https://localhost:8089/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention" -d manual_detention=off
    

    For endpoint details, see shcluster/member/control/control/set_manual_detention in the REST API Reference Manual.

  4. Resume the upgrade by sending an HTTP POST request to the upgrade/shc/upgrade endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"
    
  5. For details on how to run the automated rolling upgrade, see Run the automated rolling upgrade for clustered deployments.
