Failover to a standby Splunk UBA system
Perform the following tasks to fail over Splunk UBA to the standby system. Make sure the standby system has been properly configured for warm standby. See Set up the backup Splunk UBA deployment for warm standby for instructions.
Before failing over
Perform the following tasks before performing the failover:
- If the primary Splunk UBA system is still up and running, make a note of a few key metrics such as the total number of anomalies or total number of data sources. After you perform the failover, you can compare these metrics on the new primary Splunk UBA system to verify that the failover was successful.
- By default, Splunk UBA can go back four hours to ingest data from data sources that are stopped and restarted. If more than four hours pass between the time the primary system goes down and the failover to the backup system, adjust the connector.splunk.max.backtrace.time.in.hour property in the /etc/caspida/local/conf/uba-site.properties file. Perform the following tasks:
  - Log in to the management node on the standby Splunk UBA system as the caspida user.
  - Edit the /etc/caspida/local/conf/uba-site.properties file.
  - Add or edit the connector.splunk.max.backtrace.time.in.hour property. For example, if the primary system went down at 11PM on Friday and the failover was performed at 8AM on Monday, set the property to 57 hours or more to ingest data from the time that the primary system went down. See Time-based search for more information about configuring this property.
  - Synchronize the cluster in distributed deployments:
    /opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
  - Use the health monitor to check the lag in your data sources by monitoring the DS_LAGGING_WARN property.
  - When the data source lag returns to normal and you are no longer getting warning messages, return the connector.splunk.max.backtrace.time.in.hour property to its default value.
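The 57-hour figure in the example above can be checked with a little date arithmetic. The following is a minimal sketch, not part of Splunk UBA: backtrace_hours is a hypothetical helper name, the two dates are made-up stand-ins for "Friday 11PM" and "Monday 8AM", and the script assumes GNU date (for the -d option) and the usual key=value form of a properties file.

```shell
#!/usr/bin/env bash
# Sketch only: backtrace_hours is a hypothetical helper, not a Splunk UBA command.
# Assumes GNU date, which accepts free-form timestamps through the -d option.

# Print the number of whole hours (rounded up) between two timestamps.
#   $1 = time the primary system went down
#   $2 = time the failover is performed
backtrace_hours() {
  local down_epoch failover_epoch seconds
  # -u interprets both timestamps as UTC so daylight saving shifts do not skew the result.
  down_epoch=$(date -u -d "$1" +%s) || return 1
  failover_epoch=$(date -u -d "$2" +%s) || return 1
  seconds=$(( failover_epoch - down_epoch ))
  if [ "$seconds" -lt 0 ]; then
    echo "failover time is earlier than outage time" >&2
    return 1
  fi
  # Round up so that a partial hour is still covered.
  echo $(( (seconds + 3599) / 3600 ))
}

# Illustrative dates: a Friday at 11PM and the following Monday at 8AM.
hours=$(backtrace_hours "2024-03-01 23:00" "2024-03-04 08:00")
echo "connector.splunk.max.backtrace.time.in.hour=${hours}"
# prints connector.splunk.max.backtrace.time.in.hour=57
```

The printed value is the minimum that covers the outage. Setting the property somewhat higher leaves a margin for the time the failover itself takes.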
Run the failover command
Perform the following tasks to run the failover command and fail over to the standby Splunk UBA system:
- Log in to the management node on the standby Splunk UBA system as the caspida user.
- Run the failover command:
  /opt/caspida/bin/replication/failover
  This command promotes the standby system to be the primary Splunk UBA system.
- Check and verify that the uiServer.host property in the /etc/caspida/local/conf/uba-site.properties file in the standby system matches the setting in the primary system. Depending on whether there is a proxy or DNS server between Splunk UBA and Splunk Enterprise Security (ES), this property may be changed during the failover operation. See Specify the host name of your Splunk UBA server in Install and Configure Splunk User Behavior Analytics for instructions.
- If needed, edit the data sources to point to a Splunk search head with a different host name than before:
  - In Splunk UBA, select Manage > Data Sources.
  - Edit the data source for which you need to change the host name.
  - Change the URL to have the name or IP address of the new host.
  - Navigate through the wizard and change any other information as desired.
  - Click OK. A new job for this data source will be started.
- If needed, edit the Splunk ES output connector to update the URL:
  - In Splunk UBA, select Manage > Output Connectors.
  - Click the Splunk ES output connector and update the URL.
  - Click OK. This will automatically trigger a one-time sync with Splunk ES.
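The uiServer.host comparison described above can be scripted. The following is a minimal sketch under stated assumptions: get_property and check_property are hypothetical helper names, not Splunk UBA commands, and the property file is assumed to contain plain key=value lines with no spaces around the equals sign.

```shell
#!/usr/bin/env bash
# Sketch only: get_property and check_property are hypothetical helpers,
# not Splunk UBA commands.
# Assumes plain key=value lines with no spaces around the equals sign.

# Print the value of the last line that sets key $2 in properties file $1.
# Returns non-zero if the key is not set.
get_property() {
  awk -F= -v key="$2" '
    $1 == key { value = substr($0, length(key) + 2); found = 1 }
    END { if (!found) exit 1; print value }
  ' "$1"
}

# Compare a property against the value noted on the primary system.
#   $1 = properties file, $2 = key, $3 = expected value
check_property() {
  local actual
  if ! actual=$(get_property "$1" "$2"); then
    echo "$2 is not set in $1"
    return 1
  fi
  if [ "$actual" = "$3" ]; then
    echo "$2 matches: $actual"
  else
    echo "$2 differs: found '$actual', expected '$3'"
    return 1
  fi
}

# Usage on the standby system (uba.example.com is a placeholder host name):
#   check_property /etc/caspida/local/conf/uba-site.properties uiServer.host uba.example.com
```

The expected value has to come from the primary system, for example from a copy of its uba-site.properties file taken before the outage.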
Verify the failover operation
Perform the following tasks to verify that the failover operation was successful:
- Log in to the Splunk UBA web interface on the standby system that you failed over to.
- Compare key metrics, such as the total number of anomalies or total number of data sources, against the values you noted on the original primary system and make sure they match.
- Log in to the CLI of the Splunk UBA system that you failed over to.
- Run a tail against /var/log/caspida/replication/failover.log and look for the following message:
  Failover successfully. This system is promoted to primary system now.
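The log check in the last step can be expressed as a small function. This is a minimal sketch: failover_succeeded is a hypothetical helper name, and the choice of the last 50 lines is an arbitrary assumption, not a documented value.

```shell
#!/usr/bin/env bash
# Sketch only: failover_succeeded is a hypothetical helper, not a Splunk UBA command.

# Return success if the last lines of the failover log contain the
# success message quoted in this topic.
#   $1 = log file (defaults to the documented failover log)
#   $2 = number of trailing lines to inspect (50 is an arbitrary default)
failover_succeeded() {
  local log_file=${1:-/var/log/caspida/replication/failover.log}
  local lines=${2:-50}
  local message='Failover successfully. This system is promoted to primary system now.'
  # -F matches the message as a fixed string; -q suppresses output.
  tail -n "$lines" "$log_file" | grep -F -q -- "$message"
}

# Usage on the system that you failed over to:
#   if failover_succeeded; then echo "failover confirmed"; else echo "message not found"; fi
```

Limiting the search to the end of the file mirrors the tail step and avoids matching a message left behind by an earlier failover.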
Set up single sign-on (SSO) after the failover operation
Perform these steps if you have SSO set up for signing in to UBA.
After verifying the failover operation, update the configuration of the existing SSO app:
- Update the hostname for both the single sign on URL and single sign out URL for the system you want to access after the failover.
- Change the signature certificates in the SSO app configuration for the system you want to access after the failover.
- After performing the failover, if the IdP Certificate is not present, you must get the IdP Certificate from your SSO provider and add it to the /var/vcap/store/caspida/certs/idpcerts directory in Splunk UBA. For further details, see Configure authentication using single sign-on.
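Whether a certificate file is present can be checked before signing in. The following is a minimal sketch: idp_cert_present is a hypothetical helper name, and it only tests that the directory holds at least one regular file. It does not validate the certificate, and it makes no assumption about the file name, which this topic does not specify.

```shell
#!/usr/bin/env bash
# Sketch only: idp_cert_present is a hypothetical helper, not a Splunk UBA command.
# It checks for the presence of a file, not for a valid certificate.

# Return success if directory $1 contains at least one regular file.
#   $1 = certificate directory (defaults to the documented idpcerts directory)
idp_cert_present() {
  local cert_dir=${1:-/var/vcap/store/caspida/certs/idpcerts}
  [ -d "$cert_dir" ] || return 1
  # -print -quit stops at the first regular file found (GNU and BSD find).
  [ -n "$(find "$cert_dir" -maxdepth 1 -type f -print -quit)" ]
}

# Usage on the system that you failed over to:
#   idp_cert_present || echo "No IdP certificate found; get it from your SSO provider."
```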
Recreate the standby property file if the failover cannot be completed
If you encounter any issues during the failover process that cause the failover to stop or fail, you must recreate the /opt/caspida/conf/replication/properties/standby file before you try the failover again. Perform the following tasks when a failover operation can't be completed:
- Identify and fix the issue that caused the failover to fail.
- Run the following command to recreate the standby property file:
  touch /opt/caspida/conf/replication/properties/standby
- Run the failover command again.
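Once the underlying issue is fixed, the last two steps can be combined. The following is a minimal sketch: retry_failover is a hypothetical wrapper name, and the STANDBY_FILE and FAILOVER_CMD variables are introduced here only so the paths can be overridden. This topic does not state what exit status the failover command returns, so treat the wrapper's return value as a hint and confirm the result in failover.log.

```shell
#!/usr/bin/env bash
# Sketch only: retry_failover is a hypothetical wrapper, not a Splunk UBA command.
# The defaults are the paths given in this topic; both can be overridden.
STANDBY_FILE=${STANDBY_FILE:-/opt/caspida/conf/replication/properties/standby}
FAILOVER_CMD=${FAILOVER_CMD:-/opt/caspida/bin/replication/failover}

retry_failover() {
  # Recreate the standby property file; stop if that is not possible.
  touch "$STANDBY_FILE" || return 1
  # Run the failover command again and pass its exit status back to the caller.
  "$FAILOVER_CMD"
}

# Usage on the management node of the standby system, as the caspida user:
#   retry_failover
```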
After the failover, the standby Splunk UBA system (System B) runs as an independent system without HA/DR configured. After you restore the original Splunk UBA deployment that went down (System A), you have the following options:
- Set up System A as the standby system for System B. See Set up the standby Splunk UBA system.
- Return System A to be the primary system, with System B running as the standby system. You must perform all of the following tasks in order:
  - Set up System A as the standby system for System B. See Set up the standby Splunk UBA system.
  - Fail over from System B back to System A. See Failover to a standby Splunk UBA system.
  - Set up System B as the standby system for System A. See Set up the standby Splunk UBA system.
This documentation applies to the following versions of Splunk® User Behavior Analytics: 5.1.0, 5.1.0.1, 5.2.0, 5.2.1, 5.3.0, 5.4.0, 5.4.1