Implement cluster manager redundancy
Implementing an active/standby cluster manager topology requires a set of procedures that can be complex and must be customized to your goals and environment. Involve Splunk Professional Services to ensure that your deployment is successful.
To achieve cluster manager high availability, you can deploy two or more cluster managers in an active/standby configuration. You can configure the managers to support either automatic or manual failover.
The active and standby cluster managers continuously sync the state of the cluster among themselves. This activity ensures that all cluster managers, whether active or standby, have the same configuration bundle, search generation, and peer list, thus ensuring a smooth transition to a new cluster manager when the need arises.
Configuration changes made to the cluster managers themselves (for example, in the cluster managers' copies of server.conf) are not synced automatically between managers. You must make such configuration changes directly on each cluster manager, or use a third-party tool that can push the changes to the set of managers.
During automatic failover, the cluster nodes that connect to the manager - peer nodes, search heads, and forwarders (when configured for indexer discovery) - must switch to the new active cluster manager. To support automatic switchover to a new active cluster manager, you can deploy a third-party load balancer between the cluster managers and the peer nodes, search heads, and forwarders.
Similarly, during manual failover, the cluster nodes must switch to the new active manager. To support switchover for a manual failover deployment, you can use DNS mapping.
You can also use DNS mapping with automatic failover. See Use DNS mapping to support cluster manager redundancy.
Deploying a cluster for active/standby high availability thus requires configuration in three areas:
- Multiple cluster managers
- Peer nodes, search heads, and forwarders (if indexer discovery is enabled)
- Third-party load balancer or DNS mapping
You must deploy at least two cluster managers. Each manager must reside on its own machine or virtual machine.
Only the cluster managers need to be running a version of Splunk Enterprise that supports cluster manager redundancy. The other cluster nodes can run earlier versions, subject to the general indexer cluster version compatibility requirements, described here: Splunk Enterprise version compatibility.
The set of requirements for third-party load balancers is described separately.
Use a load balancer to support cluster manager redundancy
The third-party load balancer sits in front of the cluster managers and directs traffic from the nodes that talk to the cluster manager -- that is, the peer nodes, search heads, and forwarders, if enabled through indexer discovery. The load balancer ensures that traffic goes only to the currently active cluster manager. If the active manager goes offline or switches to standby, the load balancer redirects traffic to the newly active manager.
The load balancer solution is typically employed with managers configured to use the automatic switchover mode.
Load balancer requirements
A number of third-party load balancers can be used for this purpose, assuming they meet the following requirements:
- They must support the REST-based health check API, described in the REST API Reference Manual: cluster/manager/ha_active_status.
- They must forward traffic only to the cluster manager that responds 200 to the health check API.
- They must provide an "always up" service.
How the load balancer directs traffic
You configure the cluster nodes to connect through the load balancer IP address or hostname, rather than through the IP address or hostname of the cluster manager itself.
The load balancer handles only traffic between the active cluster manager and the cluster nodes. The active and standby cluster managers sync among themselves, without the involvement of the load balancer. The standby managers directly monitor the availability of the active cluster manager and perform an automatic switchover if the active manager fails.
To determine the active cluster manager, the load balancer periodically sends out a health check probe. The active manager responds with 200; the standby managers respond with 503. Based on the health check results, the load balancer determines which manager should receive traffic from the nodes.
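The selection logic described above can be sketched as follows. This is a minimal illustration, not a real load balancer configuration: the probe results are simulated as status codes so that the 200-versus-503 decision can be shown in isolation.

```python
# Sketch of the load balancer's health-check decision (illustrative only).
# In a real deployment, each status code would come from probing the
# ha_active_status health check endpoint on each manager.

def select_active_manager(status_by_manager):
    """Return the manager that answered 200 to the health check, if any.

    The active manager responds 200; standby managers respond 503.
    """
    active = [mgr for mgr, status in status_by_manager.items() if status == 200]
    return active[0] if active else None

# Example: cm2 is currently active, cm1 and cm3 are standby.
probe_results = {"cm1": 503, "cm2": 200, "cm3": 503}
print(select_active_manager(probe_results))  # -> cm2
```

Because only one manager is active at a time, at most one entry responds 200, so the load balancer always has a single, unambiguous target for node traffic.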
Load balancers in a multisite indexer cluster
Multisite indexer clusters use a single active cluster manager that services all sites. For an overview of multisite clusters, see Multisite indexer cluster architecture.
With cluster manager redundancy, the cluster managers are deployed across at least two sites. Only one cluster manager is active at any time; the other managers are in standby mode.
Each site has its own load balancer. The load balancers can be deployed in a variety of topologies to support your environment and goals. These are some examples:
- Each load balancer is on a separate subnet and has its own IP address. You can optionally use DNS to unify the manager_uri configuration across all peer nodes on all sites. The disadvantage of this method is that peer nodes on one site can lose access to the cluster manager if that site's load balancer goes down.
- The sites employ an extended L2 network. The load balancers have IP addresses in the same subnet by means of the extended L2 network. The load balancers use a next hop redundancy protocol to negotiate for the primary load balancer. All peer nodes across all sites send traffic through the primary load balancer. The disadvantage of this method is that performance can be affected by reliance on a single load balancer.
- The load balancers are deployed in a high availability configuration. Each site has its own load balancer, with its own subnet and IP address. The load balancers must sync status for the active cluster manager among themselves, ensuring that all load balancers send their traffic to the active manager. A DNS record on each site includes IP addresses for all of the load balancers, with preference for the local load balancer. This is the most robust solution.
Use DNS mapping to support cluster manager redundancy
With the DNS-based solution, a DNS record manages the connection between the active cluster manager and the nodes. Nodes connect to the cluster manager indirectly, through the DNS record.
The DNS-based solution is typically employed with managers configured to use the manual switchover mode. You can also use DNS with auto switchover mode, but you still must manually update the DNS record.
You need to have external monitoring in place to detect the loss of the active cluster manager:
- If the managers are configured for manual switchover, when a loss is detected, you must manually switch one of the standby managers to active status. You must also update the DNS record to point to the new active cluster manager.
- If the managers are configured for automatic switchover, when a loss is detected, the system automatically chooses a new active manager, but you must still update the DNS record to point to the new active cluster manager.
You can create a script that detects the loss of the active manager, switches to a new active manager (if using manual switchover mode), and updates the DNS record.
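Such a script might follow the outline below. This is a hedged sketch only: the reachability probe, the promotion step, and the DNS update are passed in as placeholders, since the real implementations would call the health check endpoint, run the splunk cluster-manager-redundancy CLI on the target manager, and use your DNS provider's API.

```python
# Outline of a failover monitor for the DNS-based approach.
# promote() is only needed in manual switchover mode; in auto mode the
# managers elect a new active manager themselves, but the DNS update is
# still required in both modes.

def choose_failover_target(priority_list, is_reachable):
    """Pick the highest-priority reachable standby manager."""
    for manager in priority_list:
        if is_reachable(manager):
            return manager
    return None

def failover(priority_list, active, is_reachable, promote, update_dns):
    """If the active manager is down, promote a standby and repoint DNS."""
    if is_reachable(active):
        return active  # active manager is healthy; nothing to do
    target = choose_failover_target(
        [m for m in priority_list if m != active], is_reachable)
    if target is None:
        raise RuntimeError("no reachable standby manager")
    promote(target)      # placeholder: switch the standby to active
    update_dns(target)   # placeholder: point the DNS record at the target
    return target
```

For example, with priority list cm1, cm2, cm3 and cm1 unreachable, the script would promote cm2 and update the DNS record to point to it.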
Configure cluster manager redundancy
You must configure the cluster managers, as well as the other cluster nodes (peer nodes, search heads, and forwarders, if indexer discovery is enabled).
Configure the cluster managers
Configure the following server.conf settings identically on all cluster managers:
manager_switchover_mode. This setting must be set to either "auto", for automatic failover, or "manual", for manual failover:
- If set to "auto", the managers automatically adjust modes when the need arises. For example, if the active manager goes down, one of the standby managers switches automatically to active mode.
- If set to "manual", you must manually switch one of the standby managers to active if the current active manager goes down.
- The default value is "disabled", which means no cluster manager redundancy.
manager_uri. In this context, manager_uri is a prioritized list of all active and standby cluster managers. When the switchover mode is set to auto, the configured priority determines which manager becomes active if the current active manager goes down. In both auto and manual modes, the priority also determines which manager becomes active when multiple cluster managers start at the same time.
[clustermanager:<cm-nameX>]. Multiple instances of this stanza identify each cluster manager's URI. Each cluster manager's server.conf file must include the set of stanzas for all cluster managers, including itself.
pass4SymmKey. This setting is required for any indexer cluster node. When implementing redundant cluster managers, ensure that the setting is identical across all cluster managers.
In this example, the cluster has three cluster managers. Each manager includes the following group of settings in its server.conf file. The settings must be identical on each manager.
[clustering]
mode = manager
manager_switchover_mode = auto
manager_uri = clustermanager:cm1,clustermanager:cm2,clustermanager:cm3
pass4SymmKey = changeme

[clustermanager:cm1]
manager_uri = https://10.16.88.3:8089

[clustermanager:cm2]
manager_uri = https://10.16.88.4:8089

[clustermanager:cm3]
manager_uri = https://10.16.88.5:8089
The order specified by manager_uri indicates the priority of the cluster manager. In this example, cm1, if available upon cluster startup, will be the active cluster manager. If cm1 fails, cm2 takes over as active, with cm3 switching to active only if both cm1 and cm2 are unavailable.
Note the following points regarding manager_uri prioritization when a cluster manager starts:
- If there is already an active cluster manager present (irrespective of its priority), the starting cluster manager will take the standby role.
- If other cluster managers are starting simultaneously and are of higher priority, the lower priority starting manager will follow the priority list and take the standby role.
- If other cluster managers are starting simultaneously and are of lower priority, the higher priority starting cluster manager will take the active role.
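The startup rules above can be modeled as follows. This is a simplified sketch that captures only the priority logic, not the actual negotiation protocol the managers use among themselves.

```python
def resolve_role(me, priority_list, current_active, starting_managers):
    """Decide whether a starting manager takes the active or standby role.

    - If an active manager is already present (irrespective of its
      priority), the starting manager takes the standby role.
    - Otherwise, among simultaneously starting managers, the one that
      appears earliest in the manager_uri priority list becomes active.
    """
    if current_active is not None:
        return "standby"
    ranked = sorted(starting_managers, key=priority_list.index)
    return "active" if ranked[0] == me else "standby"

priority = ["cm1", "cm2", "cm3"]
# cm2 and cm3 start together while no active manager is present:
print(resolve_role("cm2", priority, None, ["cm2", "cm3"]))  # -> active
print(resolve_role("cm3", priority, None, ["cm2", "cm3"]))  # -> standby
```

Note that in this model cm1 starting alone while cm3 is already active still takes the standby role, matching the first rule in the list above.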
Place the active cluster manager into maintenance mode before changing its settings.
You must restart the cluster manager for the settings to take effect.
Configuration changes to the cluster managers are not synced automatically between managers. You must make such configuration changes directly on each cluster manager, or by employing some third-party tool that can push the changes to the set of managers.
Configure peer nodes, search heads, and forwarders
Peer nodes and search heads both use the manager_uri setting in the [clustering] stanza of their server.conf files to identify the cluster manager. In the case of redundant cluster managers, the nodes must reference either the load balancer or the DNS record, rather than the cluster manager itself. The load balancer or DNS record then redirects the node traffic to the active manager.
For example, on each peer node or search head, update the manager_uri setting in server.conf, like this:
[clustering]
manager_uri = https://<LB-IP-OR-DNS-HOSTNAME>:8089
To avoid a restart of the peer nodes, implement the change on each peer via REST, rather than through the configuration bundle method. For example:
curl -k -v -u admin:changeme https://<IDX IP>:8089/services/cluster/config/clusterconfig -d 'manager_uri=https://<LB-IP-OR-DNS-HOSTNAME>:8089'
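Instead of running curl by hand against each peer, you can script the same REST call. The sketch below only builds the request URL and body for the endpoint shown above; the peer address and load balancer URI are placeholders for your environment, and sending the request (with basic auth, via urllib.request or similar) is left out.

```python
# Build the REST request for updating manager_uri on one peer.
# Equivalent in content to the curl command shown above.
import urllib.parse

def build_request(peer_host, lb_uri, port=8089):
    """Return the endpoint URL and form-encoded body for one peer."""
    url = f"https://{peer_host}:{port}/services/cluster/config/clusterconfig"
    body = urllib.parse.urlencode({"manager_uri": lb_uri})
    return url, body

# Placeholder peer address and load balancer URI:
url, body = build_request("10.16.88.10", "https://lb.example.com:8089")
print(url)   # the clusterconfig endpoint on that peer
print(body)  # manager_uri=... with the URI percent-encoded
```

Looping build_request over your list of peers gives one request per peer, which avoids restarting the peers just as the curl approach does.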
If a search head is searching across multiple indexer clusters, make the appropriate changes within all of the [clustermanager] stanzas that reference a cluster employing cluster manager redundancy. See Search across multiple indexer clusters for general information on configuring search heads to search across multiple clusters.
Forwarders enabled for indexer discovery use the manager_uri setting in the [indexer_discovery:<name>] stanza of their outputs.conf files to identify the cluster manager. In the case of redundant cluster managers, the forwarders must reference either the load balancer or the DNS record:
[indexer_discovery:<name>]
manager_uri = https://LB-OR-DNS:8089
These changes all require a restart of the peer nodes, search heads, and forwarders.
When deploying or updating your cluster managers, here are some guidelines:
- Update the nodes in your deployment in this order:
- Cluster managers: current active manager, followed by new active manager, followed by any remaining standby managers
- Load balancer or DNS record
- Peer nodes
- Search heads and forwarders
- For an existing deployment, put the current active manager in maintenance mode and switch it to standby before performing other updates to its configuration.
Manage cluster manager redundancy
Use the CLI
Run the following command from any cluster manager to view the cluster manager redundancy status for all managers in the cluster:
splunk cluster-manager-redundancy -show-status -auth <user:passwd>
To change the active cluster manager, run the following command on the manager that you want to switch to active mode:
splunk cluster-manager-redundancy -switch-mode active
The formerly active manager will automatically restart in standby mode. The newly active manager does not require a restart.
Use the REST API
To view cluster manager redundancy status for all managers in the cluster, initiate a GET for this endpoint from any of the managers:
services/cluster/manager/redundancy/
To change the active cluster manager, initiate a POST for this endpoint on the manager that you want to switch to active mode:
services/cluster/manager/redundancy/ -d "_action=switch_mode" -d "ha_mode=Active"
The formerly active manager will automatically restart in standby mode. The newly active manager does not require a restart.
Use the manager node dashboard
You can use the manager node dashboard to manage cluster manager redundancy. The dashboard is available on active and standby managers through Settings > Indexer Clustering, as described in Configure the manager node with the dashboard.
If the cluster manager is in active mode, you will see the usual set of tabs for peers, indexes, and search heads, along with a new tab, "Cluster Manager". Click on it, and you'll see a table with a row for each cluster manager, indicating its HA mode (active/standby), as well as some other basic information.
The "Edit" button is available with the usual set of capabilities, along with a "Switch HA Mode" button to switch the manager to standby. When you switch a manager from active to standby, it restarts automatically. If the managers' switchover mode is set to "auto", the standby manager with the highest priority will then automatically switch to active.
If the cluster manager is in standby mode, you will see a statement at the top of the dashboard, "This cluster is in standby mode". Instead of the full set of tabs that you see for an active manager, the standby manager's dashboard contains only a table with rows for each of the cluster managers. The "Edit" button is disabled, but the "Switch HA Mode" button is available to switch the manager to active. When a standby manager becomes the active manager, the formerly active manager will then restart in standby mode.
Update the peer nodes' configuration bundle
The process of updating peer nodes works the same as with non-redundant clusters, aside from a few minor issues. For a general discussion of updating peer nodes, see Manage common configurations across all peers.
To add or update the configuration bundle for the peer nodes, place the changes in the active cluster manager's manager-apps directory. To distribute the bundle to the peer nodes, apply the bundle as usual from the active manager.
You must put the configuration bundle updates in the manager-apps directory on the active cluster manager. The updates will not be synchronized to the standby cluster managers until they have been applied and distributed to the peer nodes via the active manager, and the new bundle becomes the active bundle for the cluster.
Do not switch the active cluster manager until any pending configuration bundle changes have been successfully applied to the peer nodes.
This documentation applies to the following versions of Splunk® Enterprise: 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.2.0