Failover to a standby Splunk UBA system

Perform the following tasks to fail over Splunk UBA to the standby system. Make sure the standby system is properly configured for warm standby. See Set up the backup Splunk UBA deployment for warm standby for instructions.

Before failing over

Perform the following tasks before performing the failover:

If the primary Splunk UBA system is still up and running, make a note of a few key metrics such as the total number of anomalies or total number of data sources. After you perform the failover, you can compare these metrics on the new primary Splunk UBA system to verify that the failover was successful.
By default, Splunk UBA can go back four hours to ingest data from data sources that are stopped and restarted. If more than four hours elapse between the time the primary system goes down and the time the failover to the backup system occurs, adjust the connector.splunk.max.backtrace.time.in.hour property in the /etc/caspida/local/conf/uba-site.properties file. Perform the following tasks:
1. Log in to the management node on the standby Splunk UBA system as the caspida user.
2. Edit the /etc/caspida/local/conf/uba-site.properties file.
3. Add or edit the connector.splunk.max.backtrace.time.in.hour property. For example, if the primary system went down at 11PM on Friday and the failover was performed at 8AM on Monday, set the property to 57 hours or more to ingest data from the time that the primary system went down. See Time-based search for more information about configuring this property.
4. Synchronize the cluster in distributed deployments:
```
/opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
```
5. Use the health monitor to check the lag in your data sources by monitoring the DS_LAGGING_WARN property.
6. When the data source lag returns to normal and you are no longer getting warning messages, return the connector.splunk.max.backtrace.time.in.hour property to its default value.
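The backtrace value in step 3 is the elapsed time between the outage and the failover, rounded up to whole hours. The following is a minimal sketch and not part of the Splunk UBA tooling: hours_between and set_property are hypothetical helper names, GNU date and GNU sed are assumed, and the properties file is assumed to use plain key=value lines that end with a newline.

```shell
# Hypothetical helpers; not shipped with Splunk UBA.

# Print the whole hours (rounded up) between two timestamps.
# Assumes GNU date; both timestamps are interpreted as UTC.
hours_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start + 3599) / 3600 ))
}

# Add or update a key=value line in a properties file.
# Assumes GNU sed (-i) and a value that contains no "|" character.
set_property() {
  local file="$1" key="$2" value="$3" pattern
  # Escape dots so the key is matched literally.
  pattern="^$(printf '%s' "$key" | sed 's/\./\\./g')="
  if grep -q "$pattern" "$file"; then
    sed -i "s|${pattern}.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For the example in step 3, taking March 1, 2024 as the Friday, hours_between '2024-03-01 23:00' '2024-03-04 08:00' prints 57. You could then run set_property /etc/caspida/local/conf/uba-site.properties connector.splunk.max.backtrace.time.in.hour 57 and synchronize the cluster as described in step 4.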

Run the failover command

Perform the following tasks to run the failover command and fail over to the standby Splunk UBA system:

Log in to the management node on the standby Splunk UBA system as the caspida user.
Run the failover command:
```
/opt/caspida/bin/replication/failover
```
This command promotes the standby system to be the primary Splunk UBA system.
Check and verify that the uiServer.host property in the /etc/caspida/local/conf/uba-site.properties file in the standby system matches the setting in the primary system. Depending on whether there is a proxy or DNS server between Splunk UBA and Splunk Enterprise Security (ES), this property may be changed during the failover operation. See Specify the host name of your Splunk UBA server in Install and Configure Splunk User Behavior Analytics for instructions.
If needed, edit the data sources to point to a Splunk search head with a different host name than before:
1. In Splunk UBA, select Manage > Data Sources.
2. Edit the data source for which you need to change the host name.
3. Change the URL to have the name or IP address of the new host.
4. Navigate through the wizard and change any other information as desired.
5. Click OK. A new job for this data source will be started.
If needed, edit the Splunk ES output connector to update the URL:
1. In Splunk UBA, select Manage > Output Connectors.
2. Click the Splunk ES output connector and update the URL.
3. Click OK. This will automatically trigger a one-time sync with Splunk ES.
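To compare the uiServer.host setting on the promoted system with the value used on the original primary, you can read it directly from the properties file. This is a sketch only: get_property is a hypothetical helper name, and the file is assumed to use plain key=value lines.

```shell
# Hypothetical helper; not shipped with Splunk UBA.
# Print the value of a key from a key=value properties file.
# If the key appears more than once, the last occurrence is printed.
get_property() {
  local file="$1" key="$2" pattern
  # Escape dots so the key is matched literally.
  pattern="^$(printf '%s' "$key" | sed 's/\./\\./g')="
  grep "$pattern" "$file" | tail -n 1 | cut -d= -f2-
}
```

For example, get_property /etc/caspida/local/conf/uba-site.properties uiServer.host prints the configured host name, or nothing if the property is not set in that file.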

Verify the failover operation

Perform the following tasks to verify that the failover operation was successful:

Log in to the Splunk UBA web interface on the standby system that you failed over to.
Compare key metrics, such as the total number of anomalies or the total number of data sources, against the values you noted on the original primary system, and make sure they match.
Log in to the CLI of the Splunk UBA system that you failed over to.
Run a tail against /var/log/caspida/replication/failover.log and look for the following message:
```
Failover successfully. This system is promoted to primary system now.
```
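Instead of reading the tail output by eye, you can search the log for the success message. The following sketch assumes the message appears in the log exactly as shown above; failover_succeeded is a hypothetical helper name.

```shell
# Hypothetical helper; not shipped with Splunk UBA.
# Return 0 if the failover log contains the success message, non-zero otherwise.
# The log path defaults to the location named in this topic.
failover_succeeded() {
  local log="${1:-/var/log/caspida/replication/failover.log}"
  grep -qF "Failover successfully. This system is promoted to primary system now." "$log"
}
```

For example, failover_succeeded && echo "promoted" prints promoted only when the message is present in the default log file.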

Recreate the standby property file if the failover cannot be completed

If you encounter any issues during the failover process that cause the failover to stop or fail, you must recreate the /opt/caspida/conf/replication/properties/standby file before you try the failover again. Perform the following tasks when a failover operation can't be completed:

Identify and fix the issue that caused the failover to stop or fail.
Run the following command to recreate the standby property file:
```
touch /opt/caspida/conf/replication/properties/standby
```
Run the failover command again.
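The last two steps can be combined so that the failover is never retried without the standby property file in place. This is a sketch under the assumption that both commands are run as the caspida user on the management node; retry_failover is a hypothetical wrapper name, and the two paths default to the ones given in this topic.

```shell
# Hypothetical wrapper; not shipped with Splunk UBA.
# Recreate the standby property file, then run the failover command again.
# Run this only after the underlying issue has been fixed.
retry_failover() {
  local standby_file="${1:-/opt/caspida/conf/replication/properties/standby}"
  local failover_cmd="${2:-/opt/caspida/bin/replication/failover}"
  # Stop here if the file cannot be created, for example due to permissions.
  touch "$standby_file" || return 1
  "$failover_cmd"
}
```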

Options for restoring the Splunk UBA system that became unavailable

After the failover, the standby Splunk UBA system (System B) runs as an independent system without HA/DR configured. After you restore the original Splunk UBA deployment that went down (System A), you have the following options:

Set up System A as the standby system for System B. See Set up the standby Splunk UBA system.
Return System A to be the primary system, with System B running as the standby system. You must perform all of the following tasks in order:
1. Set up System A as the standby system for System B. See Set up the standby Splunk UBA system.
2. Fail over from System B back to System A. See Failover to a standby Splunk UBA system.
3. Set up System B as the standby system for System A. See Set up the standby Splunk UBA system.
