Set up the standby Splunk UBA system
The standby Splunk UBA system should run in read-only mode. Do not start any Splunk UBA services in the standby system. If you do, the PostgreSQL logs can fill up and negatively affect performance. See Clean up the standby system if you accidentally started Splunk UBA services.
After meeting the requirements described in Requirements to set up warm standby for Splunk UBA, perform the following tasks to deploy and set up a secondary Splunk UBA system as the read-only warm standby system:
- (Optional) If the standby system has existing data, run the following command to clean up the system:
/opt/caspida/bin/CaspidaCleanup
- Run the following command on the management node of both the primary and standby systems:
/opt/caspida/bin/Caspida stop
- Add the following deployment properties to /opt/caspida/conf/deployment/caspida-deployment.conf on both the primary and standby systems:
  - On the primary system, uncomment caspida.cluster.replication.nodes and add the standby system nodes. For example, if the standby is a 3-node deployment with hosts s1, s2, and s3, add:
caspida.cluster.replication.nodes=s1,s2,s3
In AWS environments, add the private IP address of each node.
  - On the standby system, uncomment caspida.cluster.replication.nodes and add the primary system nodes. For example, if the primary is a 3-node deployment with hosts p1, p2, and p3, add:
caspida.cluster.replication.nodes=p1,p2,p3
In AWS environments, add the private IP address of each node.
The host names or IP addresses of the nodes on the primary and standby systems do not need to be the same, as long as they are all defined in caspida-deployment.conf as shown in this example.
- Run sync-cluster on the management node of both the primary and standby systems:
/opt/caspida/bin/Caspida sync-cluster
- Allow traffic across the primary and standby systems:
- Set up inter-cluster passwordless SSH communication across all nodes of the primary and standby systems. See Setup passwordless communication between the UBA nodes in Install and Upgrade Splunk User Behavior Analytics.
- Set up firewalls by running the following commands on the management node on both the primary and standby systems:
/opt/caspida/bin/Caspida disablefirewall-cluster
/opt/caspida/bin/Caspida setupfirewall-cluster
/opt/caspida/bin/Caspida enablefirewall-cluster
- Register and enable replication.
- On both the primary and standby systems, add the following properties to /etc/caspida/local/conf/uba-site.properties on the management node. If the replication.enabled property already exists, make sure it is set to true:
replication.enabled=true
replication.primary.host=<management node of primary cluster>
replication.standby.host=<management node of standby cluster>
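For example, with hypothetical management node host names p1 (primary) and s1 (standby), the filled-in properties would look like this:
replication.enabled=true
replication.primary.host=p1
replication.standby.host=s1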
- In the primary cluster, enable the replication system job by adding the ReplicationCoordinator property to the /etc/caspida/local/conf/caspida-jobs.json file on the management node. The ReplicationCoordinator job must have enabled set to true. Below is a sample of the file before adding the property:
/**
 * Copyright 2014 - Splunk Inc., All rights reserved.
 * This is Caspida proprietary and confidential material and its use
 * is subject to license terms.
 */
{
  "systemJobs": [
    {
      // "name"         : "ThreatComputation",
      // "cronExpr"     : "0 0 0/1 * * ?",
      // "jobArguments" : { "env:CASPIDA_JVM_OPTS" : "-Xmx4096M" }
    }
  ]
}
After adding the property, the file should look like this:
/**
 * Copyright 2014 - Splunk Inc., All rights reserved.
 * This is Caspida proprietary and confidential material and its use
 * is subject to license terms.
 */
{
  "systemJobs": [
    {
      // "name"         : "ThreatComputation",
      // "cronExpr"     : "0 0 0/1 * * ?",
      // "jobArguments" : { "env:CASPIDA_JVM_OPTS" : "-Xmx4096M" }
    },
    {
      "name"    : "ReplicationCoordinator",
      "enabled" : true
    }
  ]
}
- On both the primary and standby systems, run sync-cluster on the management node to synchronize the configuration changes:
/opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf/
- On the management node of the primary system, run the following command:
/opt/caspida/bin/replication/setup -d standby -m primary
If the same node has been registered before, a reset is necessary. Run the command again with the reset option:
/opt/caspida/bin/replication/setup -d standby -m primary -r
- On the management node of the standby system, run the following command:
/opt/caspida/bin/replication/setup -d standby -m standby
- If the standby system is running the RHEL or Oracle Linux operating system, run the following command to create a directory on each node in the cluster:
sudo mkdir -m a=rwx /var/vcap/sys/run/caspida
- Start Splunk UBA on the primary system by running the following command on the management node:
/opt/caspida/bin/Caspida start
- Start Splunk UBA without Caspida services on the standby system by running the following command on the management node:
/opt/caspida/bin/Caspida start-all --no-caspida
After you perform these steps, the primary and standby systems remain unsynchronized until the first sync cycle runs, up to 4 hours later. You can synchronize both systems immediately using a curl command. See Synchronize the primary and standby systems on-demand for instructions.
Verify that the standby system is set up correctly
You can verify your setup by viewing the table in the Postgres database that tracks the status of the sync between the primary and standby systems.
In deployments with fewer than 20 nodes, run the following command on the management node. In 20-node clusters, run the command on node 2:
psql -d caspidadb -c 'select * from replication'
Below is an example output for a 3-node cluster:
caspida@ubanode001$ psql -d caspidadb -c "select * from replication" id | host | type | status | message | modtime | cycleid | cyclestatus ----+------------+---------+----------+-------------------------------------+----------------------------+---------+------------- 25 | ubanode01 | Standby | Inactive | | 2021-09-01 23:29:01.572365 | 0 | 26 | ubanode001 | Primary | Active | nodes:ubanode001,ubanode002,ubanode003 | 2021-09-01 23:29:01.636719 | 0000000 | (2 rows) caspida@ubanode001$
In this example, the primary system ubanode001 has a status of Active, and the standby system ubanode01 has a status of Inactive. The Inactive status means that a sync between the primary and standby systems has not occurred yet.
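If you only need the standby row's status, for example in a script, a minimal variant of the same query works; the -t and -A flags make psql print just the bare value:
psql -d caspidadb -t -A -c "select status from replication where type='Standby'"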
Synchronize the primary and standby systems on-demand
To trigger a full sync right away, use the following command on the management node of the primary system:
curl -X POST -k -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" https://localhost:9002/jobs/trigger?name=ReplicationCoordinator
View the /var/log/caspida/replication/replication.log file on the management node of the primary system for additional information about the progress and status of the sync.
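The same trigger can also be run in two steps, which is easier to read and troubleshoot. This sketch assumes the same token file, endpoint, and log path shown above:
# Read the job manager REST token from the default properties file.
TOKEN=$(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)
# Trigger the ReplicationCoordinator job on-demand.
curl -X POST -k -H "Authorization: Bearer ${TOKEN}" "https://localhost:9002/jobs/trigger?name=ReplicationCoordinator"
# Follow the replication log to watch the progress of the sync.
tail -f /var/log/caspida/replication/replication.log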
Verify that the primary and standby systems are synchronized
You can verify your setup and that the initial sync has started by viewing the table in the Postgres database that tracks the status of the sync between the primary and standby systems.
In deployments with fewer than 20 nodes, run the following command on the management node. In 20-node clusters, run the command on node 2:
psql -d caspidadb -c 'select * from replication'
Below is an example output for a 3-node cluster:
caspida@ubanode001$ psql -d caspidadb -c 'select * from replication'
 id |    host    |  type   | status |                message                 |          modtime           | cycleid | cyclestatus
----+------------+---------+--------+----------------------------------------+----------------------------+---------+-------------
 25 | ubanode01  | Standby | Active |                                        | 2021-04-04 04:10:57.191609 | 0000002 |
 26 | ubanode001 | Primary | Active | nodes:ubanode001,ubanode002,ubanode003 | 2021-04-04 04:10:57.212676 | 0000002 |
(2 rows)
After the setup is completed, the status of the standby system is Inactive. After the first sync cycle is completed, the status is Active. If the initial sync fails, Splunk UBA retries the sync every four hours. After four failures, the status of the standby system is Dead and replication is not attempted again until the issue is resolved.
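Because failed syncs are retried only every four hours, it can be handy to poll the replication table while waiting. A minimal sketch, assuming the standard watch utility is available on the node where you run psql:
watch -n 300 "psql -d caspidadb -c 'select host, type, status, modtime from replication'"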
Manual verification
To manually verify that the datastores, such as Postgres, HDFS, IR Cache, InfluxDB, and Redis, are synchronized, run the following health check script on the management node of both the primary and standby clusters:
/opt/caspida/bin/utils/uba_health_check.sh
Then, verify the following datastore statistics:
Datastore | Location in health check output | Action |
---|---|---|
Postgres | health check > backend stats > POSTGRESQL | Check for similar stats between standby and primary |
HDFS | health check > semiaggr_s partitions | Check for similar stats between standby and primary |
IR cache | n/a | Covered by verifying HDFS parity |
InfluxDB | health check > backend stats > INFLUXDB | Check for similar stats between standby and primary |
Redis | health check > backend stats > REDIS | Check for similar stats between standby and primary |
Minor discrepancies between primary and standby statistics are expected due to active ingestion on the primary system and the four-hour sync period.
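To compare the two sides without reading the output by hand, you can run the health check over the inter-cluster passwordless SSH configured earlier and diff the results. A minimal sketch, assuming hypothetical management node host names p1 (primary) and s1 (standby) and that the caspida user can run the script over SSH:
# Capture the health check output from each management node.
ssh caspida@p1 /opt/caspida/bin/utils/uba_health_check.sh > /tmp/health_primary.txt
ssh caspida@s1 /opt/caspida/bin/utils/uba_health_check.sh > /tmp/health_standby.txt
# Compare the datastore statistics; minor differences are expected.
diff -u /tmp/health_primary.txt /tmp/health_standby.txt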