Splunk® Enterprise

Distributed Search

Perform an automated rolling upgrade of a search head cluster

Splunk Enterprise version 9.1.0 and higher supports automated rolling upgrade of search head clusters using the custom splunk-rolling-upgrade app, which comes with the Splunk Enterprise product. A rolling upgrade performs a phased upgrade of cluster members to a new version of Splunk Enterprise with minimal interruption of ongoing searches. The splunk-rolling-upgrade app automates the manual rolling upgrade steps described in Perform a rolling upgrade of a search head cluster.

Requirements and considerations

Review the following requirements and considerations before you configure and initiate an automated rolling upgrade.

  • The splunk-rolling-upgrade app requires Linux OS. Mac OS and Windows are not supported.
  • Automated rolling upgrade only applies to upgrade from version 9.1.x and higher to subsequent versions of Splunk Enterprise. To determine your upgrade path and confirm the compatibility of the upgraded search head cluster version with existing Splunk Enterprise components and applications, see the Splunk products version compatibility matrix.
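  • Automated search head cluster rolling upgrade supports the .tgz, .rpm, and .deb installation package formats. The default installation script handles .tgz packages only; .rpm and .deb packages require a custom installation hook.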
  • To use the splunk-rolling-upgrade app, you must hold a role that contains these capabilities:
    • upgrade_splunk_shc
    • list_search_head_clustering
    • list_settings
    • use_remote_proxy
  • The admin role contains all of the capabilities required by default. However, to limit access, it is a best practice to create a dedicated role/user with only the capabilities required to run the rolling upgrade.
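
The dedicated role recommended above could be defined in authorize.conf. The following is a minimal sketch, not a definitive procedure: the helper name write_upgrade_role and the role name rolling_upgrade_operator are hypothetical, and only the four capability names come from the list above.

```shell
#!/bin/bash
# Sketch: append an authorize.conf stanza for a dedicated rolling-upgrade role.
# write_upgrade_role and the default role name are hypothetical examples.
write_upgrade_role() {
    local conf_file="$1"
    local role_name="${2:-rolling_upgrade_operator}"
    mkdir -p "$(dirname "$conf_file")"
    # Only the capabilities listed in "Requirements and considerations".
    cat >> "$conf_file" <<EOF
[role_${role_name}]
upgrade_splunk_shc = enabled
list_search_head_clustering = enabled
list_settings = enabled
use_remote_proxy = enabled
EOF
}
```

For example, write_upgrade_role "$SPLUNK_HOME/etc/system/local/authorize.conf" appends the stanza to the local authorize.conf; you would then assign the role to the user that runs the upgrade.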

How an automated rolling upgrade works

The splunk-rolling-upgrade app provides the functionality required to perform an automated rolling upgrade of a search head cluster. You initiate the rolling upgrade with a single request to a REST endpoint or by specifying the corresponding CLI command. The app then downloads a new Splunk Enterprise install package and installs it on each cluster member one by one. By default the app handles only .tgz packages by unpacking the contents in the $SPLUNK_HOME directory, which is typically /opt/splunk.

For more flexibility with installation, the splunk-rolling-upgrade app implements the package installation process as a custom hook (shell script). This means you can write and plug in the installation logic, which is required for deb and rpm package types. The app provides additional separate endpoints for monitoring the upgrade process and remediating failures.

The splunk-rolling-upgrade app provides the following REST endpoints and corresponding CLI commands to perform an automated search head cluster rolling upgrade. For cluster upgrade, you can run these operations on any cluster member. For deployer upgrade, you must run these operations on the deployer.

REST endpoint         CLI command                          Description
upgrade/shc/upgrade   splunk rolling-upgrade shc-upgrade   Initiate the automated rolling upgrade process.
upgrade/shc/status    splunk rolling-upgrade shc-status    Monitor automated rolling upgrade status.
upgrade/shc/recovery  splunk rolling-upgrade shc-recovery  Return the cluster to a ready state after automated rolling upgrade failure.
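
The three endpoints share a common URL shape, which a small helper can make explicit. This is only a sketch: shc_endpoint_url is a hypothetical function, and the /services prefix and output_mode=json parameter are taken from the curl examples later in this topic.

```shell
#!/bin/bash
# Sketch: build the REST URL for one of the three rolling-upgrade operations.
# shc_endpoint_url is a hypothetical helper, not part of the app.
shc_endpoint_url() {
    local host="$1" port="$2" operation="$3"
    case "$operation" in
        upgrade|status|recovery)
            printf 'https://%s:%s/services/upgrade/shc/%s?output_mode=json\n' \
                "$host" "$port" "$operation"
            ;;
        *)
            # Anything else is not a rolling-upgrade endpoint.
            echo "unknown operation: $operation" >&2
            return 1
            ;;
    esac
}
```

For example, curl -u admin:pass -k "$(shc_endpoint_url localhost 8089 status)" queries the status endpoint on the local member.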

Perform an automated rolling upgrade

This section shows you how to configure and use the splunk-rolling-upgrade app to run an automated rolling upgrade of a search head cluster.

Configure the rolling upgrade app

Before you can run an automated rolling upgrade, you must configure the splunk-rolling-upgrade app for your deployment. To do so, create an add-on on the deployer, for example a folder named "splunk-rolling-upgrade-config" that contains the required configurations, place it in $SPLUNK_HOME/etc/shcluster/apps, and use the deployer to distribute the add-on to the search head cluster members.

The default splunk-rolling-upgrade installation script supports .tgz packages only. To perform an upgrade using .rpm or .deb package formats, you must create a custom script that contains installation instructions for the specific package type, and specify the path to that script in the install_script_path field under the [hook] stanza in rolling_upgrade.conf. For more information, see Create a custom installation hook.

To configure the splunk-rolling-upgrade app:

  1. On the deployer, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default directory.
  2. In $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new rolling_upgrade.conf file containing the following contents, where package_path points to the installation package for the new version to which you are upgrading:
    [downloader] 
    package_path = <path to a package>
    
    The package_path setting supports URI paths to local files, for example file://path/to/package.tgz, and remote links that require no authentication.
  3. On the deployer, in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default, create a new file called inputs.conf, containing the following scripted input, where <splunk_user> is the name of the user the app uses to send requests to REST endpoints.
    [script://$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/bin/complete.py] 
    passAuth=<splunk_user>
    
    Splunk Enterprise passes the authentication token for the specified user to the splunk-rolling-upgrade app and does not store the token. The specified user must hold a role that contains all of the capabilities required to run the splunk-rolling-upgrade app. For more information, see Requirements and considerations.
  4. (Optional) If you plan to use rpm or deb packages, run the chmod +x command to set execution permissions for the associated hook (script) that you wrote. Next, create the $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/default directory, and copy your hook there. Then, in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/default/rolling_upgrade.conf, under the [hook] stanza, set the install_script_path value to the location of the hook. For example:
    [hook]
    install_script_path = $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config/hooks/<hook_file_name>
    

    The install_script_path setting supports only local paths and environment variable expansions.

  5. On the deployer, copy $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade-config to the configuration bundle under $SPLUNK_HOME/etc/shcluster/apps.
  6. On the deployer, distribute the configuration bundle to all search head cluster members using the following command:
    splunk apply shcluster-bundle -target <uri-to-shc-peer>:<management port> -auth admin:<password>
    
    For more information on how to apply the configuration bundle, see Use the deployer to distribute apps and configuration updates.

For detailed information on rolling_upgrade.conf settings, see the rolling_upgrade.conf.spec file located in $SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/README/.
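
Steps 1 through 3 of the procedure above can be scripted. The following is a rough sketch under stated assumptions: create_upgrade_config is a hypothetical helper, and the package URI and user name you pass in are your own values. It writes only the two files and settings shown in the procedure.

```shell
#!/bin/bash
# Sketch: scaffold the splunk-rolling-upgrade-config add-on on the deployer.
# create_upgrade_config is a hypothetical helper, not part of the app.
create_upgrade_config() {
    local splunk_home="$1" package_uri="$2" splunk_user="$3"
    local app_dir="$splunk_home/etc/apps/splunk-rolling-upgrade-config/default"

    # Step 1: create the add-on's default directory.
    mkdir -p "$app_dir"

    # Step 2: point the downloader at the new installation package.
    cat > "$app_dir/rolling_upgrade.conf" <<EOF
[downloader]
package_path = $package_uri
EOF

    # Step 3: scripted input. The quoted heredoc keeps $SPLUNK_HOME literal,
    # matching the stanza shown in the procedure.
    cat > "$app_dir/inputs.conf" <<'EOF'
[script://$SPLUNK_HOME/etc/apps/splunk-rolling-upgrade/bin/complete.py]
EOF
    printf 'passAuth=%s\n' "$splunk_user" >> "$app_dir/inputs.conf"
}
```

After running the helper, you would still copy the add-on into the configuration bundle and apply the bundle, as described in steps 5 and 6.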

Run the automated rolling upgrade

After you configure the splunk-rolling-upgrade app, follow these steps to run the automated rolling upgrade of your search head cluster, using the REST API or corresponding CLI commands:

CLI commands for automated rolling upgrade do not return error messages.

  1. Identify the URI and management port of any search head cluster member.
  2. On any cluster member, send an HTTP POST request to the upgrade/shc/upgrade endpoint to initiate the rolling upgrade process. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"
    

    The request first triggers basic health checks to ensure the search head cluster is in a healthy state to perform the rolling upgrade. If all health checks pass, the endpoint initiates the rolling upgrade. For more information, see steps 1 and 2 in Perform a rolling upgrade.

    A successful request returns an "Upgrade initiated" message. For example:

    {
        "updated":"2022-11-24T17:25:54+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
            {
                "title":"upgrade",
                "id":"/services/upgrade/shc/upgrade",
                "updated":"2022-11-24T17:25:54+0000",
                "links":{
                    "alternate":{
                        "href":"shc/upgrade"
                    }
                },
                "content":{
                    "message":"Upgrade initiated",
                    "status":"succeeded"
                }
            }
        ]
    }
    

    In some cases the request can fail and return an error, for example, if health checks fail or if a rolling upgrade is already running. To troubleshoot the cause of a failure, review the HTTP return codes and check log files for details. The upgrade/shc/upgrade endpoint returns the following HTTP status codes:

    Code Description
    200 Upgrade operation successfully initiated.
    400 Configuration error.
    403
    • An upgrade is already running.
    • Upgrade is not required.
    • The search head cluster is not ready. Wait for the cluster to fully initialize.
    500 Internal Server Error. Something went wrong with the upgrade. Check log files for more information. Possible reasons:
    • The upgrade could not be triggered on a given member.
    501 Attempted to upgrade an unsupported deployment. (Rolling upgrade supports search head clusters, search heads and deployers only.)
    503 KV store is not ready.

    For more troubleshooting information, including relevant log files, see Troubleshoot and recover from automated rolling upgrade failure.

    For endpoint details, see upgrade/shc/upgrade in the REST API Reference Manual.

    Alternatively, on any cluster member, run the splunk rolling-upgrade shc-upgrade command to initiate the automated rolling upgrade.

  3. Monitor the status of the rolling upgrade until all cluster members are successfully upgraded. To monitor the rolling upgrade status, send an HTTP GET request to the upgrade/shc/status endpoint. For example:
    curl -u admin:pass -k "https://localhost:8089/services/upgrade/shc/status?output_mode=json"
    

    The response shows the current status of the rolling upgrade, including the upgrade status of the entire cluster, the status of each individual cluster member, and the total number and percentage of members upgraded. For example:

    {
        "updated":"2022-11-24T17:33:28+0000",
        "author":"Splunk",
        "layout":"props",
        "entry":[
            {
                "title":"status",
                "id":"/services/upgrade/shc/status",
                "updated":"2022-11-24T17:33:28+0000",
                "links":{
                    "alternate":{
                        "href":"shc/status"
                    }
                },
                "content":{
                    "message":{
                        "upgrade_status":"completed",
                        "statistics":{
                            "peers_to_upgrade":3,
                            "overall_peers_upgraded":3,
                            "overall_peers_upgraded_percentage":100
                        },
                        "peers":[
                            {
                                "name":"sh2",
                                "status":"upgraded",
                                "last_modified":"Thu Nov 24 17:29:41 2022"
                            },
                            {
                                "name":"sh1",
                                "status":"upgraded",
                                "last_modified":"Thu Nov 24 17:28:07 2022"
                            },
                            {
                                "name":"sh3",
                                "status":"upgraded",
                                "last_modified":"Thu Nov 24 17:31:15 2022"
                            }
                        ]
                    }
                }
            }
        ]
    }
    

    The upgrade/shc/status endpoint returns the following HTTP status codes:

    Code Description
    200 Status request completed successfully.
    400 Configuration error.
    500 Internal error. Check log files for more information on the error.
    501 Attempted to get the status of an unsupported deployment.
    503 Unable to access KV store. KV store probably still initializing.

    For endpoint details, see upgrade/shc/status in the REST API Reference Manual.

    Alternatively, run the splunk rolling-upgrade shc-status command to monitor the automated rolling upgrade.

    When monitoring the rolling upgrade status, if you get a "Couldn't connect to server" response, such as the following:

    % curl -u admin:pass -k https://10.225.218.144:8089/services/upgrade/shc/status
    
    curl: (7) Failed to connect to 10.225.218.144 port 8089 after 1212 ms: Couldn't connect to server
    

    it means that particular cluster member is in the process of being restarted as a part of the upgrade process. This can occur when you try to monitor the status of a machine that is temporarily down because the rolling upgrade process stops, unpacks the package, and restarts splunkd. In this case, you can monitor the status from a different cluster member, or wait until that cluster member is up and running again.

  4. Upgrade the deployer. When the upgrade/shc/status endpoint response shows "upgrade_status":"completed" for the entire cluster, repeat step 2 to upgrade the deployer.
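
Steps 2 and 3 above can be combined into a small polling script. This is a sketch rather than a supported tool: the function names, the 30-second interval, and the text-based extraction of "upgrade_status" are assumptions for illustration. The loop treats a failed connection as a member restart and retries, as described in step 3, and it has no timeout.

```shell
#!/bin/bash
# Sketch: start the rolling upgrade and poll until it completes or fails.
# Function names, the polling interval, and the text-based JSON parsing
# are assumptions, not part of the splunk-rolling-upgrade app.

# Print the value of "upgrade_status" from a status response on stdin.
extract_upgrade_status() {
    tr -d ' \n' | sed -n 's/.*"upgrade_status":"\([^"]*\)".*/\1/p'
}

run_rolling_upgrade() {
    local base="https://$1:$2/services/upgrade/shc" auth="$3"
    local response status

    # Step 2: initiate the upgrade; --fail makes curl exit non-zero on HTTP errors.
    curl -sk --fail -X POST -u "$auth" "$base/upgrade?output_mode=json" || return 1

    # Step 3: poll the status endpoint until the cluster reports a final state.
    while true; do
        if response="$(curl -sk -u "$auth" "$base/status?output_mode=json")"; then
            status="$(printf '%s' "$response" | extract_upgrade_status)"
            echo "upgrade_status: ${status:-unknown}"
            case "$status" in
                completed) return 0 ;;
                failed) return 1 ;;
            esac
        else
            # The member is probably restarting as part of the upgrade.
            echo "member unreachable, retrying" >&2
        fi
        sleep 30
    done
}
```

For example, run_rolling_upgrade localhost 8089 admin:pass starts the upgrade on the local member and returns once the cluster reports "completed" or "failed". You would still upgrade the deployer afterward, as described in step 4.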

Create a custom installation hook

An installation hook is a custom binary or script that installs the Splunk package on every machine. The splunk-rolling-upgrade app downloads the package specified in package_path in rolling_upgrade.conf, then sends a request to the hook to install the package on the cluster member.

The app passes the package path to the hook as the first parameter, and $SPLUNK_HOME as the second parameter. The hook must contain installation instructions for the package, and must have executable permissions, which you can set using the chmod +x command. For example, the following shows the default installation hook for .tgz packages:

#!/bin/bash
set -e
splunk_tar="$1"
dest_dir="$2"
tar -zvxf "$splunk_tar" --strip-components 1 -C "$dest_dir"

Custom hooks for deb and rpm package installation

To perform an automated rolling upgrade using deb or rpm packages, you must create a custom installation hook.

Installation of deb and rpm packages requires sudo permissions, while the Splunk instance typically runs under the 'splunk' user, which does not have those privileges. This means that the installation hook for deb and rpm packages must acquire elevated privileges before running installation commands, such as sudo rpm --upgrade, in the case of rpm packages.

Before you can install deb or rpm packages, you must install the correct package manager on your machine. dpkg is the package manager for deb packages, and rpm is the package manager for rpm packages.
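
A custom hook for deb and rpm packages might look like the following. Treat it as a sketch under stated assumptions: it assumes the user that runs splunkd can run the package manager through passwordless sudo (a sudoers setup this topic does not describe), and the install_package function and DRY_RUN variable are hypothetical names added for illustration. The second hook parameter ($SPLUNK_HOME) is not used here.

```shell
#!/bin/bash
# Sketch of a custom installation hook for rpm and deb packages.
# Assumes passwordless sudo for the package manager; install_package and
# DRY_RUN are hypothetical names, not part of the app.
set -e

# Install a package with the package manager that matches its extension.
# When DRY_RUN is set, print the command instead of running it.
install_package() {
    local package="$1"
    case "$package" in
        *.rpm) ${DRY_RUN:+echo} sudo rpm --upgrade "$package" ;;
        *.deb) ${DRY_RUN:+echo} sudo dpkg --install "$package" ;;
        *)
            echo "unsupported package type: $package" >&2
            return 1
            ;;
    esac
}

# The app passes the package path as the first parameter.
if [ "$#" -ge 1 ]; then
    install_package "$1"
fi
```

Setting DRY_RUN=1 before running the hook by hand with a package path prints the command that would run, which is a low-risk way to check the hook before you reference it in install_script_path.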

Troubleshoot and recover from automated rolling upgrade failure

The splunk-rolling-upgrade app provides recovery functionality that lets you return a search head cluster to a ready state, where you can run the automated rolling upgrade again, after a rolling upgrade failure. Before you initiate the recovery process, make sure that the rolling upgrade has actually failed or crashed.

In most cases, when a rolling upgrade failure occurs, the app sets the "upgrade_status" field to "failed" in the upgrade/shc/status endpoint response. In some cases, however, the "upgrade_status" field can show "in_progress" when the rolling upgrade has actually failed. This false "in_progress" response can happen for example if the upgrade crashes while the Splunk instance is stopped.

To investigate the cause of the rolling upgrade failure, find the last instance being upgraded at the time of failure, and check the logs for errors. To find the last instance, look in the upgrade/shc/status endpoint response for the member whose "status" field is set to neither "ready" nor "upgraded".

The splunk-rolling-upgrade app writes to three log files under $SPLUNK_HOME/var/log/splunk:

  • splunk_shc_upgrade_upgrader_script.log
  • splunk_shc_upgrade_rest_endpoints.log
  • splunk_shc_upgrade_completion_script.log

If the request response shows "no_upgrade", look for errors in the splunk_shc_upgrade_rest_endpoints.log file on the member where you ran the request. Address the issues you find in the logs and make sure the issues do not repeat on other cluster members during future rolling upgrade attempts.
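
To check all three log files in one pass, a helper along these lines may be useful. It is only a sketch: scan_upgrade_logs is a hypothetical function, and matching on the word "error" (case-insensitive) is an assumption about how the app formats its log lines.

```shell
#!/bin/bash
# Sketch: print error lines from the three rolling-upgrade log files.
# scan_upgrade_logs is a hypothetical helper; the "error" match is an
# assumption about the log format.
scan_upgrade_logs() {
    local log_dir="$1" name file
    for name in upgrader_script rest_endpoints completion_script; do
        file="$log_dir/splunk_shc_upgrade_${name}.log"
        # Skip logs that this member has not written yet.
        [ -f "$file" ] || continue
        # -H prefixes each match with the file name; no match is not a failure.
        grep -i -H 'error' "$file" || true
    done
}
```

For example, scan_upgrade_logs "$SPLUNK_HOME/var/log/splunk" prints each matching line prefixed with the name of the log file it came from.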

After you address the issues that caused the failure, prepare the cluster for another rolling upgrade attempt, as follows:

  1. If the cluster member where the issue occurred is down, manually install the package on that machine. Remove $SPLUNK_HOME/var/run/splunk/trigger-rolling-upgrade (if it exists), and start Splunk on that member.
  2. Send an HTTP GET request to the upgrade/shc/recovery endpoint. For example:
    curl -u admin:pass -k "https://localhost:8089/services/upgrade/shc/recovery"
    

    This operation returns the cluster to the ready state, where you can run the automated rolling upgrade again after failure. It also sets the current upgrade status to "failed". Note that it can take some time for the KV store to initialize after startup.

    The upgrade/shc/recovery endpoint returns the following HTTP status codes:

    Code Description
    200 Recovery was executed successfully.
    400 Configuration error.
    500 Internal error. Check log files for more information on the error.
    501 Attempted to run a recovery on an unsupported deployment.

    For endpoint details, see upgrade/shc/recovery in the REST API Reference Manual.

    Alternatively, run the splunk rolling-upgrade shc-recovery command to initiate the recovery process.

  3. If the upgrade/shc/recovery endpoint response contains a message such as the following:
    {
        "message":"SHC partially recovered. Please turn off manual detention mode on the following peers: ['sh1']",
        "status":"succeeded"
    }
    

    then send an HTTP POST request to the /shcluster/member/control/control/set_manual_detention endpoint, turning off manual detention on the search head specified in the response. For example:

    curl -u admin:pass -k "https://localhost:8089/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention" -d manual_detention=off
    

    For endpoint details, see shcluster/member/control/control/set_manual_detention in the REST API Reference Manual.

  4. Resume the upgrade by sending an HTTP POST request to the upgrade/shc/upgrade endpoint. For example:
    curl -X POST -u admin:pass -k "https://localhost:8089/services/upgrade/shc/upgrade?output_mode=json"
    
    For details on how to run the automated rolling upgrade, see Run the automated rolling upgrade.
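
The "partially recovered" case in step 3 can be handled with two small helpers. This is a sketch with assumptions: the function names are hypothetical, and the parsing relies on the message format shown in step 3. The response lists peer names such as sh1, so you must map each name to that member's management URI yourself.

```shell
#!/bin/bash
# Sketch: list peers named in a "partially recovered" message and turn off
# manual detention on a member. Helper names are hypothetical, and the
# parsing assumes the message format shown in step 3.

# Print one peer name per line from a recovery response on stdin.
peers_in_detention() {
    sed -n 's/.*following peers: \[\(.*\)].*/\1/p' | tr -d "' " | tr ',' '\n'
}

# Send the set_manual_detention request from step 3 to one member.
# member_uri is that member's management URI, for example https://localhost:8089.
detention_off() {
    local member_uri="$1" auth="$2"
    curl -sk -u "$auth" \
        "$member_uri/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention" \
        -d manual_detention=off
}
```

For example, piping the recovery response into peers_in_detention prints sh1 for the message shown in step 3; you would then call detention_off with the management URI of that member.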

Last modified on 11 January, 2024

This documentation applies to the following versions of Splunk® Enterprise: 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.2.0
