Restore Splunk UBA from a full backup
On December 31, 2021, Red Hat's CentOS Linux reached End Of Life (EOL). Per Red Hat, Inc, CentOS Linux users must migrate to a new operating system to continue receiving updates, patches, and new features. Red Hat also encourages customers to migrate to RHEL. Additionally, Red Hat made the new "CentOS Stream" operating system a non-production, pre-build version of RHEL, with no long-term support model. Splunk UBA does not include CentOS Stream as a supported operating system. Customers must migrate to and adopt a supported production Linux distro of RHEL, Ubuntu, or OEL as a base OS for UBA version 5.1.0.
This example shows how to restore from a full backup, using the base directory 1000123
without any accompanying incremental directories.
Do not use this procedure to restore Splunk UBA using a backup file that was created by running the uba-backup.sh script. Only use the full backup file generated by the automated incremental backup.
- Prepare the server for the restore operation. If there is any existing data, run:
/opt/caspida/bin/CaspidaCleanup
- Stop all services:
/opt/caspida/bin/Caspida stop-all
- Restore Postgres.
- As a root user on the Postgres node (node 2 in 20-node deployments, node 1 in all other deployments), clean any existing data. On RHEL or OEL systems, run the following command:
sudo rm -rf /var/lib/pgsql/15/data/*
On Ubuntu systems, run the following command:
sudo rm -rf /var/lib/postgresql/15/main/*
- Copy all content under <base directory>/postgres/base to the Postgres node. For example, if you are copying from a different server, use the following command on RHEL or OEL systems:
sudo scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/pgsql/15/data
On Ubuntu systems, run the following command:
sudo scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/postgresql/15/main
- As a root user, edit the /var/lib/pgsql/15/data/postgresql.conf file (on RHEL or OEL systems) or the /etc/postgresql/15/main/postgresql.conf file (on Ubuntu systems), and add the following property:
restore_command = ' '
- Change ownership of the backup files. On RHEL or OEL systems, run the following command:
sudo chown -R postgres:postgres /var/lib/pgsql/15/data
On Ubuntu systems, run the following command:
sudo chown -R postgres:postgres /var/lib/postgresql/15/main
- As the caspida user, restart the Postgres service by running the following commands on the management node:
/opt/caspida/bin/Caspida stop-postgres
/opt/caspida/bin/Caspida start-postgres
Monitor the Postgres logs in /var/log/postgresql, which show the recovery process.
- As the caspida user, once the recovery completes, query Postgres to verify that the data is recovered. For example, run the following command from the Postgres CLI:
psql -d caspidadb -c 'SELECT * FROM dbinfo'
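The Postgres steps above use different directories on RHEL or OEL systems than on Ubuntu systems. The following is a minimal sketch of a helper that selects the directory; it is not part of Splunk UBA. The function name pg_data_dir is hypothetical, and the sketch assumes the ID values rhel, ol, and ubuntu as found in /etc/os-release.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Splunk UBA): print the Postgres 15
# data directory used in this procedure for a given OS ID.
# Assumption: the OS ID is one of rhel, ol (Oracle Linux), or ubuntu.
pg_data_dir() {
  case "$1" in
    rhel|ol) echo "/var/lib/pgsql/15/data" ;;      # RHEL or OEL
    ubuntu)  echo "/var/lib/postgresql/15/main" ;; # Ubuntu
    *)       echo "unsupported OS ID: $1" >&2; return 1 ;;
  esac
}
```

For example, after sourcing /etc/os-release on the Postgres node, pg_data_dir "$ID" prints the directory to clean, copy the backup into, and change ownership of.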
- Restore Redis. Redis backups are full backups, even for incremental Splunk UBA backups. You can restore Redis from any backup directory, such as the most recent incremental backup directory. In this example, Redis is restored from the 0000123 incremental backup directory. The Redis backup file name ends with the node number. Be sure to restore each backup file on the corresponding node. For example, in a 5-node cluster, the Redis files must be restored on nodes 4 and 5. Assuming the backup files are on node 1, run the following command on node 4 to restore Redis:
sudo scp caspida@node1:<BACKUP_HOME>/0000123/redis/redis-server.rdb.4 /var/vcap/store/redis/redis-server.rdb
Similarly, run the following command on node 5:
sudo scp caspida@node1:<BACKUP_HOME>/0000123/redis/redis-server.rdb.5 /var/vcap/store/redis/redis-server.rdb
View your /opt/caspida/conf/deployment/caspida-deployment.conf file to see where Redis is running in your deployment.
- Restore InfluxDB. Similar to Redis, InfluxDB backups are full backups. You can restore InfluxDB from the most recent backup directory. In this example, InfluxDB is restored from the 0000123 full backup directory. On the management node, which hosts InfluxDB, start InfluxDB, clean it up, and restore from the backup files:
sudo service influxdb start
influx -execute "DROP DATABASE caspida"
influx -execute "DROP DATABASE ubaMonitor"
influxd restore -portable <BACKUP_HOME>/0000123/influx
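The per-node file naming described in the Redis step (the backup file name ends with the node number) can be sketched as follows. This is an illustration only: the function names redis_backup_file and print_redis_restore_cmds are hypothetical, and the sketch assumes that the backup files are on node1, as in the example. It prints the copy commands instead of running them, so that you can review them first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the path of the Redis backup file for one node.
# Arguments: backup home, backup directory name, node number.
redis_backup_file() {
  local backup_home=$1 backup_dir=$2 node=$3
  echo "${backup_home}/${backup_dir}/redis/redis-server.rdb.${node}"
}

# Print (rather than run) the copy command for each Redis node.
# Assumption: the backup files are on node1, as in the example above.
# Arguments: backup home, backup directory name, then one or more node numbers.
print_redis_restore_cmds() {
  local backup_home=$1 backup_dir=$2 node src
  shift 2
  for node in "$@"; do
    src=$(redis_backup_file "$backup_home" "$backup_dir" "$node")
    echo "node ${node}: sudo scp caspida@node1:${src} /var/vcap/store/redis/redis-server.rdb"
  done
}
```

For the 5-node example, print_redis_restore_cmds /backup 0000123 4 5 prints one command for node 4 and one for node 5. The /backup value stands in for your own backup home.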
- Restore HDFS. To restore HDFS, you must first restore the base backup, and then the incremental data in sequential order. In this example, you first restore from 1000123, then 0000124, 0000125, and 0000126.
- Start the necessary services. On the management node, run the following command:
/opt/caspida/bin/Caspida start-all --no-caspida
- Restore HDFS from the base backup directory, and then from each incremental backup directory in order:
nohup bash -c 'export BACKUPHOME=/backup; hadoop fs -copyFromLocal -f $(ls ${BACKUPHOME}/caspida/1*/hdfs/caspida -d) /user && for dir in $(ls ${BACKUPHOME}/caspida/0*/hdfs/caspida -d); do hadoop fs -copyFromLocal -f ${dir} /user || exit 1; done; echo Done' &
Restoring HDFS can take a long time. Check the process ID to see whether the restore has completed. For example, if the PID is 111222, check by using the following command:
ps 111222
- Change owner in HDFS:
sudo -u hdfs hdfs dfs -chown -R impala:caspida /user/caspida/analytics
sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /user/history
sudo -u hdfs hdfs dfs -chown -R impala:impala /user/hive
sudo -u hdfs hdfs dfs -chown -R yarn:yarn /user/yarn
- If you are restoring the backup to the same server, run the following command to update the metadata based on the full backup. The following example uses 1000123:
nohup bash -c 'export BACKUPHOME=/backup; for dir in `ls ${BACKUPHOME}/caspida/1000123/hdfs/caspida/analytics/caspida.db`; do impala-shell -d caspida -q "ALTER TABLE $dir RECOVER PARTITIONS"; done; echo Done' &
If the server you are restoring to is different from the one where the backup was taken, run the following commands to update the metadata:
hive --service metatool -updateLocation hdfs://<RESTORE_HOST>:8020 hdfs://<BACKUP_HOST>:8020
impala-shell -q "INVALIDATE METADATA"
Note that the host is node1 in the deployment file.
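The ordering rule for the HDFS restore (base directory first, then the incremental directories in sequence) can be sketched as follows. This is a minimal illustration, not the restore command itself. The function name restore_order is hypothetical. The sketch assumes that base directory names start with 1 and incremental directory names start with 0, as in the restore command above, and that all names have the same length so that lexical order matches sequence order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list backup directories in the order they should be
# restored to HDFS: base directories (1*) first, then incrementals (0*).
# Assumption: directory names have equal length, so the shell's sorted
# glob expansion matches the backup sequence.
# Argument: the directory that contains the backup directories.
restore_order() {
  local backup_root=$1 dir
  for dir in "$backup_root"/1*/ "$backup_root"/0*/; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$dir" ] && basename "$dir"
  done
  return 0
}
```

With the directories from this example, restore_order prints 1000123, 0000124, 0000125, and 0000126, one per line. Reviewing this list before starting the long-running restore can help confirm that no incremental directory is missing.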
- Restore your rules and customized configurations from the latest backup directory:
- Restore the configurations:
sudo cp -pr <BACKUP_HOME>/0000123/conf/* /etc/caspida/local/conf/
- Restore the rules:
sudo rm -Rf /opt/caspida/conf/rules/*
sudo cp -prf <BACKUP_HOME>/0000123/rule/* /opt/caspida/conf/rules/
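Because the rules step deletes the existing rules before copying the backed-up ones, it can be worth confirming that the backup directory contains the expected subdirectories first. The following is a minimal sketch of such a check. It is not part of Splunk UBA, the function name check_backup_layout is hypothetical, and it assumes only the conf and rule subdirectory names used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: confirm that a backup directory contains the
# conf and rule subdirectories before removing anything on the server.
# Argument: path of the backup directory, for example <BACKUP_HOME>/0000123.
# Returns 0 if both subdirectories exist, 1 otherwise.
check_backup_layout() {
  local backup_dir=$1 sub missing=0
  for sub in conf rule; do
    if [ ! -d "${backup_dir}/${sub}" ]; then
      echo "missing: ${backup_dir}/${sub}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is to run the check first and continue with the copy and remove commands only if it returns 0.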
- Start the server:
/opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
/opt/caspida/bin/CaspidaCleanup container-grouping
/opt/caspida/bin/Caspida start
Check the Splunk UBA web UI to make sure the server is operational.
- If the servers used for backup and restore are different, perform the following tasks:
- Update the data source metadata:
curl -X PUT -Ssk -v -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" https://localhost:9002/datasources/moveDS?name=<DS_NAME>
Replace <DS_NAME> with the data source name displayed in Splunk UBA.
- Trigger a one-time sync with Splunk ES:
If your Splunk ES host did not change, run the following command:
curl -X POST 'https://localhost:9002/jobs/trigger?name=EntityScoreUpdateExecutor' -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" -H 'Content-Type: application/json' -d '{"schedule": false}' -k
If you are pointing to a different Splunk ES host, edit the host in Splunk UBA to automatically trigger a one-time sync.
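Both curl commands above read the job manager token from the properties file inline. The following minimal sketch factors that lookup into a function so that the token can be read once and reused. The function name read_jobmanager_token is hypothetical. Unlike the inline commands, the sketch uses a POSIX character class in place of \s and cut -f2- in place of -f2, on the assumption that a token might contain an equals sign.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the job manager REST token from a properties file.
# Argument: path to the properties file (defaults to the path used above).
# Assumption: cut -f2- is used so that a token containing "=" is kept whole;
# the inline commands in this procedure use -f2.
read_jobmanager_token() {
  local props=${1:-/opt/caspida/conf/uba-default.properties}
  grep -E '^[[:space:]]*jobmanager\.restServer\.auth\.user\.token=' "$props" \
    | head -n 1 \
    | cut -d'=' -f2-
}
```

For example, TOKEN=$(read_jobmanager_token) stores the token, which can then be passed to curl as -H "Authorization: Bearer ${TOKEN}".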
This documentation applies to the following versions of Splunk® User Behavior Analytics: 5.3.0