Backup and restore Splunk UBA using automated incremental backups

Automatic incremental backup and restore is a beta feature and must be implemented with the assistance of Splunk Support.

Attach an additional disk to the Splunk UBA management node in your deployment and configure automated incremental backups.

Periodic incremental backups are performed without stopping Splunk UBA. You can configure the frequency of these backups by configuring the cron job in /opt/caspida/conf/jobconf/caspida-jobs.json.
A weekly full backup is performed without stopping Splunk UBA. You can configure the frequency of these backups using the backup.filesystem.full.interval property.

You can use incremental backup and restore as an HA/DR solution that is less resource-intensive than the warm standby solution described in Configure warm standby in Splunk UBA.

Configure incremental backups in Splunk UBA

Perform the following steps to configure incremental backups of your Splunk UBA deployment:

On the Splunk UBA management node, attach an additional disk dedicated to filesystem backups. For example, mount a device on a local directory on the Splunk UBA management node.
In 20-node clusters, Postgres services run on node 2 instead of node 1. You need two additional disks, one on node 1 and a second on node 2 for the Postgres services, or a shared storage device that both nodes can access. Make sure that the backup folder is the same on both nodes. By default, the backups are written to the /backup folder.
Stop Splunk UBA.
```
/opt/caspida/bin/Caspida stop
```
Create a dedicated directory on the management node and change directory permissions so that backup files can be written into the directory. If warm standby is also configured, perform these tasks on the management node in the primary cluster.
```
sudo mkdir /backup
sudo chmod 777 /backup
```

Mount the dedicated device on the backup directory. For example, a new 5TB hard drive mounted on the backup directory:

caspida@node1:~$ df -h /dev/sdc
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc        5.0T  1G    4.9T   1% /backup
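
Before continuing, it can help to confirm that the backup directory is usable. The following is a minimal sketch, not part of Splunk UBA: the function name check_backup_dir is hypothetical, and it assumes that a dedicated device shows up in df with the backup directory itself as its mount point.

```shell
#!/bin/sh
# check_backup_dir DIR
# Hypothetical helper: verifies that DIR exists and is writable, warns if DIR
# is not itself a mount point, then prints its free space.
check_backup_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "$dir: not a directory" >&2; return 1; }
    [ -w "$dir" ] || { echo "$dir: not writable by $(id -un)" >&2; return 1; }
    # df -P prints one POSIX-format line per filesystem; field 6 is the mount point.
    mount_point=$(df -P "$dir" | awk 'NR == 2 { print $6 }')
    if [ "$mount_point" != "$dir" ]; then
        echo "warning: $dir is on the filesystem mounted at $mount_point, not a dedicated mount" >&2
    fi
    df -h "$dir"
}
```

For example, run `check_backup_dir /backup` as the caspida user. The mount point check only warns, because a shared storage device can be mounted in other ways.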

Add the following properties into /etc/caspida/local/conf/uba-site.properties:
```
backup.filesystem.enabled=true
backup.filesystem.directory.restore=/backup
```
Synchronize the configuration across the cluster:
```
/opt/caspida/bin/Caspida sync-cluster
```
Register filesystem backup:
```
/opt/caspida/bin/replication/setup filesystem
```
If the same host has been registered before, run the command again with the reset flag:
```
/opt/caspida/bin/replication/setup filesystem -r
```
Enable Postgres archiving.
1. Create the directory where archives will be stored. For example, /backup/wal_archive:
```
mkdir /backup/wal_archive
chown postgres:postgres /backup/wal_archive
```
2. Create a file called archiving.conf on the PostgreSQL node (node 2 for 20-node deployments, node 1 for all other deployments). On RHEL, Oracle Linux, and CentOS systems:
```
cd /var/vcap/store/pgsql/10/data/conf.d/
sudo mv archiving.conf.sample archiving.conf
```
  On Ubuntu systems:
```
cd /etc/postgresql/10/main/conf.d/
sudo mv archiving.conf.sample archiving.conf
```
  If your archive directory is not /backup/wal_archive, edit archiving.conf to change the archive directory.
3. Restart PostgreSQL services on the master node:
```
/opt/caspida/bin/Caspida stop-postgres
/opt/caspida/bin/Caspida start-postgres
```

On the management node:

In the primary cluster, enable the replication system job by adding the ReplicationCoordinator property into the /etc/caspida/local/conf/caspida-jobs.json file. Below is a sample of the file before adding the property:

/**
 * Copyright 2014 - Splunk Inc., All rights reserved.
 * This is Caspida proprietary and confidential material and its use
 * is subject to license terms.
 */
{
  "systemJobs": [
    {
      // "name" : "ThreatComputation",
      // "cronExpr"   : "0 0 0/1 * * ?",
      // "jobArguments" : { "env:CASPIDA_JVM_OPTS" :  "-Xmx4096M" }
    }
  ]
}

After adding the property, the file should look like this:

/**
 * Copyright 2014 - Splunk Inc., All rights reserved.
 * This is Caspida proprietary and confidential material and its use
 * is subject to license terms.
 */
{
  "systemJobs": [
    {
      // "name" : "ThreatComputation",
      // "cronExpr"   : "0 0 0/1 * * ?",
      // "jobArguments" : { "env:CASPIDA_JVM_OPTS" :  "-Xmx4096M" }
    },
    {
      "name"         : "ReplicationCoordinator",
      "enabled"      : true
    }
  ]
}

Run the following command to synchronize the cluster:
```
/opt/caspida/bin/Caspida sync-cluster
```

Start Splunk UBA:
```
/opt/caspida/bin/Caspida start
```

An initial full backup is triggered automatically when the next scheduled job starts, as defined by the ReplicationCoordinator property in the /opt/caspida/conf/jobconf/caspida-jobs.json file. After the initial full backup, a series of incremental backups is performed until the next scheduled full backup. By default, Splunk UBA performs a full backup every 7 days. To change this interval, perform the following tasks:

Log in to the Splunk UBA master node as the caspida user.
Edit the backup.filesystem.full.interval property in /etc/caspida/local/conf/uba-site.properties.

Synchronize the cluster.

/opt/caspida/bin/Caspida sync-cluster  /etc/caspida/local/conf
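
The edit in the second step can be scripted. The following is a minimal sketch, assuming uba-site.properties holds one key=value pair per line. The function name set_property is hypothetical, and this topic does not state the value format that backup.filesystem.full.interval expects, so confirm the format before setting it.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Hypothetical helper: replaces any existing definition of KEY in FILE with
# KEY=VALUE, or appends the line if KEY is not present yet.
set_property() {
    file=$1; key=$2; value=$3
    touch "$file" || return 1
    scratch=$(mktemp) || return 1
    # Copy every line except existing definitions of the key.
    awk -v key="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, key "=") != 1 { print }
    ' "$file" > "$scratch" || { rm -f "$scratch"; return 1; }
    printf '%s=%s\n' "$key" "$value" >> "$scratch"
    # Write back through the original file so its ownership and mode are kept.
    cat "$scratch" > "$file" && rm -f "$scratch"
}
```

For example, `set_property /etc/caspida/local/conf/uba-site.properties backup.filesystem.full.interval "<interval>"`, followed by the sync-cluster command shown above.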

Perform periodic cleanup of the backup files

When a new full backup is completed, it is located in the caspida directory. All existing directories in the caspida directory are moved to the delete directory. You can safely remove all content in the delete directory to minimize the number of files retained on the system while preserving the ability to recover to the latest checkpoint.

In the following example, it is safe to remove every backup directory in /backup/delete/ (1000020 through 0000038), while keeping 1000039 to 0000045 in /backup/caspida/. Directories whose names start with 1, such as 1000039, contain a full backup, while directories starting with 0 contain incremental backups.

caspida@node1:~$ ls -t /backup/caspida/ /backup/delete/
/backup/caspida/:
0000045  0000044  0000043  0000042  0000041  0000040  1000039
 
/backup/delete/:
0000038  0000036  1000034  0000032  0000030  0000028  0000026  0000024  0000022  1000020
0000037  0000035  0000033  0000031  0000029  0000027  0000025  0000023  0000021
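
The cleanup can be scripted, for example from a scheduled job. The following is a minimal sketch, assuming backup directory names are always seven digits starting with 0 or 1, as in the listing above. The function name purge_delete_dir is hypothetical.

```shell
#!/bin/sh
# purge_delete_dir DIR
# Hypothetical helper: removes superseded backup directories from DIR
# (for example /backup/delete) and leaves anything else in DIR untouched.
purge_delete_dir() {
    delete_dir=$1
    [ -d "$delete_dir" ] || { echo "$delete_dir: not a directory" >&2; return 1; }
    # Only touch entries that look like backup directories: seven digits, leading 0 or 1.
    for entry in "$delete_dir"/[01][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$entry" ] || continue
        echo "removing $entry"
        rm -rf -- "$entry"
    done
    return 0
}
```

For example, `purge_delete_dir /backup/delete`. Matching on the name pattern rather than deleting everything is a deliberate safety margin in case the path is mistyped.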

Restore Splunk UBA from incremental backups

To restore Splunk UBA from online incremental backup files, at least one base backup directory containing a full backup must exist.

A base directory has a sequence number starting with 1.
An incremental directory has a sequence number starting with 0.

For example, if a filesystem backup is set up with BACKUP_HOME as /backup/caspida, the following is a valid base directory with three incremental directories:

caspida@node1:~$ du -sh /backup/caspida/*
1.5G    /backup/caspida/0000124
1.5G    /backup/caspida/0000125
1.4G    /backup/caspida/0000126
35G     /backup/caspida/1000123

The base directory 1000123 contains 35GB of backup files, while the incremental directories 0000124, 0000125, and 0000126 each contain only about 1.5GB.

A restore can be performed in the following scenarios:

From a base directory with all incremental directories. Using our example, this includes all of 1000123, 0000124, 0000125, and 0000126 so Splunk UBA is restored to the latest checkpoint.
From a base directory with some incremental directories in a contiguous sequence. Using our example, we can use 1000123, 0000124, and 0000125. The 1000123 and 0000125 directories cannot be used without 0000124 because that skips a sequence number.
From a base directory only, such as 1000123 in our example.
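
The contiguity rule in these scenarios can be checked with a short script before starting a restore. The following is a minimal sketch, assuming that directory names are seven digits, that the first digit marks a base (1) or incremental (0) backup, and that the remaining six digits are the sequence number. The function name restore_chain is hypothetical.

```shell
#!/bin/sh
# restore_chain BACKUP_HOME
# Hypothetical helper: prints the latest base directory in BACKUP_HOME,
# followed by the incremental directories that continue its sequence
# without a gap. Stops at the first missing sequence number.
restore_chain() {
    backup_home=$1
    # Latest base: seven-digit names starting with 1, highest number last.
    base=$(ls "$backup_home" | grep -E '^1[0-9]{6}$' | sort -n | tail -n 1)
    [ -n "$base" ] || { echo "no base backup in $backup_home" >&2; return 1; }
    echo "$base"
    # Drop the leading flag digit; expr reads the rest as decimal, not octal.
    seq_no=$(expr "${base#?}" + 0)
    while :; do
        seq_no=$((seq_no + 1))
        next=$(printf '0%06d' "$seq_no")
        [ -d "$backup_home/$next" ] || break
        echo "$next"
    done
}
```

With the example directories, `restore_chain /backup/caspida` prints 1000123, 0000124, 0000125, and 0000126, one per line. If 0000125 were missing, the output would stop after 0000124.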

Steps to restore

This example shows how to restore from a base directory 1000123 with all of the incremental directories 0000124, 0000125, and 0000126.

Prepare the server for the restore. This can be either the original server or a separate one. If there is any existing data, run:
```
/opt/caspida/bin/CaspidaCleanup
```
Stop all services:
```
/opt/caspida/bin/Caspida stop-all
```
Restore Postgres.
1. On the Postgres node (node 2 in 20-node deployments, node 1 in all other deployments), clean up any existing data. Run the following command on RHEL, OEL, or CentOS systems:
```
sudo rm -rf /var/lib/pgsql/10/main/*
```
  On Ubuntu systems, run the following command:
```
sudo rm -rf /var/lib/postgresql/10/main/*
```
2. Copy all content under <base directory>/postgres/base to the Postgres node. For example, if you are copying from a different server on a RHEL, OEL, or CentOS system, use the following command:
```
scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/pgsql/10/main
```
  On Ubuntu systems, use the following command:
```
scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/postgresql/10/main
```
3. (Skip this step if you are restoring from a base backup directory only, without any incremental directories.)
  Remove unnecessary WAL files. On RHEL, OEL, or CentOS systems, run the following command:
```
rm -rf /var/lib/pgsql/10/main/pg_wal/*
```
  On Ubuntu systems, run the following command:
```
rm -rf /var/lib/postgresql/10/main/pg_wal/*
```
  Make sure the system has access to the Postgres WAL archive directory. Edit the /var/lib/pgsql/10/main/recovery.conf file (on RHEL, OEL, and CentOS systems) or the /var/lib/postgresql/10/main/recovery.conf file (on Ubuntu systems), remove all contents of the file, then add the following properties:
```
restore_command = 'cp <WAL directory>/%f "%p"'
recovery_target_time = '<recovery timestamp>'
recovery_target_action = 'promote'
```
  Where <WAL directory> is the directory with all Postgres WAL files, and <recovery timestamp> is the timestamp in backup file <BACKUP_HOME>/0000126/postgres/recovery_target_time.
  For example, the recovery.conf file looks like this:
```
restore_command = 'cp /backup/wal_archive/%f "%p"'
recovery_target_time = '2019-09-16 12:36:03'
recovery_target_action = 'promote'
```
4. Change ownership of the backup files. On RHEL, OEL, or CentOS systems, run the following command:
```
chown -R postgres:postgres /var/lib/pgsql/10/main
```
  On Ubuntu systems, run the following command:
```
chown -R postgres:postgres /var/lib/postgresql/10/main
```
5. Start the Postgres service by running the following command on the master node:
```
/opt/caspida/bin/Caspida start-postgres
```
  Monitor the Postgres logs under /var/log/postgres, which show the recovery process. Once the recovery completes, query Postgres to confirm that the data is recovered. For example, run the following command:
```
psql -d caspidadb -c 'SELECT * FROM dbinfo'
```
6. After the Postgres service is updated, promote Postgres on the node where it is running. On RHEL, OEL, or CentOS systems, run the following command:
```
sudo -u postgres /usr/lib/pgsql/10/bin/pg_ctl promote -D /var/lib/pgsql/10/main
```
  On Ubuntu systems, run the following command:
```
sudo -u postgres /usr/lib/postgresql/10/bin/pg_ctl promote -D /var/lib/postgresql/10/main
```
  View your /etc/caspida/local/conf/caspida-deployment.conf file to see which node Postgres runs on in your deployment.
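
The recovery.conf file from step 3 of the Postgres restore can be generated rather than typed by hand, which avoids copying the timestamp incorrectly. The following is a minimal sketch, assuming the recovery_target_time file in the backup holds the timestamp on its first line. The function name write_recovery_conf is hypothetical.

```shell
#!/bin/sh
# write_recovery_conf WAL_DIR TIME_FILE OUT_FILE
# Hypothetical helper: writes a recovery.conf with the three properties from
# step 3, reading the target time from the first line of TIME_FILE.
write_recovery_conf() {
    wal_dir=$1; time_file=$2; out=$3
    target_time=$(head -n 1 "$time_file") || return 1
    [ -n "$target_time" ] || { echo "empty recovery target time" >&2; return 1; }
    # %f and %p are left literal; Postgres substitutes them at recovery time.
    cat > "$out" <<EOF
restore_command = 'cp $wal_dir/%f "%p"'
recovery_target_time = '$target_time'
recovery_target_action = 'promote'
EOF
}
```

For example, on an Ubuntu system: `write_recovery_conf /backup/wal_archive <BACKUP_HOME>/0000126/postgres/recovery_target_time /var/lib/postgresql/10/main/recovery.conf`. Run it before the ownership change in step 4 so that the new file is owned by the postgres user.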
Restore Redis. Redis backups are full backups, even for incremental Splunk UBA backups. You can restore Redis from any backup directory, such as the most recent incremental backup directory. In our example, we can restore Redis from the 0000126 incremental backup directory. The name of each Redis backup file ends with a node number. Be sure to restore each backup file on the corresponding node. For example, in a 5-node cluster, the Redis files must be restored on nodes 4 and 5. Assuming the backup files are on node 1, run the following command on node 4 to restore Redis:
```
scp caspida@node1:<BACKUP_HOME>/0000126/redis/redis-server.rdb.4 /var/vcap/store/redis/redis-server.rdb
```
Similarly, run the following command on node 5:
```
scp caspida@node1:<BACKUP_HOME>/0000126/redis/redis-server.rdb.5 /var/vcap/store/redis/redis-server.rdb
```
View your /etc/caspida/local/conf/caspida-deployment.conf file to see which nodes Redis runs on in your deployment.
Restore InfluxDB. Similar to Redis, InfluxDB backups are full backups. You can restore InfluxDB from the most recent backup directory. In this example, InfluxDB is restored from the 0000126 incremental backup directory. On the management node, which hosts InfluxDB, start InfluxDB, clean it up, and restore from backup files:
```
sudo service influxdb start
influx -execute "DROP DATABASE caspida"
influx -execute "DROP DATABASE ubaMonitor"
influxd restore -portable <BACKUP_HOME>/0000126/influx
```
Restore HDFS. To restore HDFS, first restore the base backup, then restore the incremental backups in continuous sequence. In our example, we first restore from 1000123, then from 0000124, 0000125, and 0000126.
1. Start the necessary services. On the management node, run the following command:
```
/opt/caspida/bin/Caspida start-all --no-caspida
```
2. Restore HDFS from the base backup directory:
```
nohup hadoop fs -copyFromLocal <BACKUP_HOME>/1000123/hdfs/caspida /user &
```
  Restoring HDFS can take a long time. Check the process ID to see whether the restore has completed. For example, if the PID is 111222, check by using the following command:
```
ps 111222
```
3. Repeat the previous step for the incremental backup directories 0000124, 0000125, and 0000126.
```
nohup hadoop fs -copyFromLocal <BACKUP_HOME>/0000124/hdfs/caspida /user &
nohup hadoop fs -copyFromLocal <BACKUP_HOME>/0000125/hdfs/caspida /user &
nohup hadoop fs -copyFromLocal <BACKUP_HOME>/0000126/hdfs/caspida /user &
```
  If you have a large number of incremental backup directories, you can write a script containing all the commands, then run the script.
4. Change owner in HDFS:
```
sudo -u hdfs hdfs dfs -chown -R impala:caspida /user/caspida/analytics
sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /user/history
sudo -u hdfs hdfs dfs -chown -R impala:impala /user/hive
sudo -u hdfs hdfs dfs -chown -R yarn:yarn /user/yarn
```
5. If the server you are restoring to is different from the one where the backup was taken, run the following commands to update the metadata:
```
hive --service metatool -updateLocation hdfs://<RESTORE_HOST>:8020 hdfs://<BACKUP_HOST>:8020
impala-shell -q "INVALIDATE METADATA"
```
  Note that the host is node1 in the deployment file.
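
For step 3 of the HDFS restore, when there are many incremental directories, the script can be generated rather than written by hand. The following is a minimal sketch that reuses the copyFromLocal command from the steps above unchanged. The function name hdfs_restore_script is hypothetical, and running the copies one after another with set -e is an assumption made here to preserve the sequence order, not something this procedure prescribes.

```shell
#!/bin/sh
# hdfs_restore_script BACKUP_HOME DIR...
# Hypothetical helper: prints a shell script that copies each listed backup
# directory into HDFS, in the order given.
hdfs_restore_script() {
    backup_home=$1; shift
    echo "#!/bin/sh"
    # Stop at the first failed copy so that later increments are not applied.
    echo "set -e"
    for dir in "$@"; do
        echo "hadoop fs -copyFromLocal '$backup_home/$dir/hdfs/caspida' /user"
    done
}
```

For example, `hdfs_restore_script /backup/caspida 0000124 0000125 0000126 > restore_hdfs.sh`, then review the file and start it with `nohup sh restore_hdfs.sh &`.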

Restore your rules and customized configurations from the latest backup directory:

Restore the configurations:

cp -pr <BACKUP_HOME>/0000126/conf/* /etc/caspida/local/conf/

Restore the rules:

rm -Rf /opt/caspida/conf/rules/*
cp -prf <BACKUP_HOME>/0000126/rule/* /opt/caspida/conf/rules/

Start the server:

/opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
/opt/caspida/bin/CaspidaCleanup container-grouping
/opt/caspida/bin/Caspida start

Check the Splunk UBA web UI to make sure the server is operational.

If the backup and restore servers are different, perform the following tasks:

Update the data source metadata:

curl -X PUT -Ssk -v -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" https://localhost:9002/datasources/moveDS?name=<DS_NAME>

Replace <DS_NAME> with the data source name displayed in Splunk UBA.

Trigger a one-time sync with Splunk ES: If your Splunk ES host did not change, run the following command:

curl -X POST 'https://localhost:9002/jobs/trigger?name=EntityScoreUpdateExecutor' -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" -H 'Content-Type: application/json' -d '{"schedule": false}' -k

If you are pointing to a different Splunk ES host, edit the host in Splunk UBA to automatically trigger a one-time sync.
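
Both curl commands above read the job manager token inline with grep and cut. The following is a minimal sketch of a reusable alternative, assuming the properties file holds one key=value pair per line. The function name get_property is hypothetical. Unlike cut with a single field, it keeps the whole value when the value itself contains an equals sign.

```shell
#!/bin/sh
# get_property FILE KEY
# Hypothetical helper: prints the value of KEY from a key=value properties
# file. The last definition wins; returns non-zero if KEY is not defined.
get_property() {
    file=$1; key=$2
    awk -v key="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        # Literal prefix match on "key=", so dots in the key are not wildcards.
        index(line, key "=") == 1 { value = substr(line, length(key) + 2); found = 1 }
        END { if (found) print value; else exit 1 }
    ' "$file"
}
```

For example, `TOKEN=$(get_property /opt/caspida/conf/uba-default.properties jobmanager.restServer.auth.user.token)`, then pass `-H "Authorization: Bearer $TOKEN"` to curl.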
