Splunk® User Behavior Analytics

Administer Splunk User Behavior Analytics


Backup and restore Splunk UBA using automated incremental backups

Attach an additional disk to the Splunk UBA management node in your deployment and configure automated incremental backups.

  • Periodic incremental backups are performed without stopping Splunk UBA. You can configure the frequency of these backups by configuring the cron job in /opt/caspida/conf/jobconf/caspida-jobs.json.
  • A weekly full backup is performed without stopping Splunk UBA. You can configure the frequency of these backups using the backup.filesystem.full.interval property.
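
For example, the full-backup cadence is controlled by a single property (a sketch; 7d is the documented default):

```properties
# In /etc/caspida/local/conf/uba-site.properties:
backup.filesystem.full.interval = 7d
```

The incremental backup schedule itself comes from the ReplicationCoordinator job entry in /opt/caspida/conf/jobconf/caspida-jobs.json, as described in the configuration steps below.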

You can use incremental backup and restore as an HA/DR solution that is less resource-intensive than the warm standby solution described in Configure warm standby in Splunk UBA. You can use the backups to restore Splunk UBA on the existing server, or to a new and separate server.

Configure incremental backups in Splunk UBA

Perform the following steps to configure incremental backups of your Splunk UBA deployment:

  1. On the Splunk UBA master node, attach an additional disk dedicated for filesystem backup. For example, mount a device on a local directory on the Splunk UBA management node.
    In 20-node clusters, Postgres services run on node 2 instead of node 1. You will need two additional disks - one on node 1 and a second on node 2 for the Postgres services, or you may have a shared storage device that can be accessed by both nodes. Make sure that the backup folder on both nodes is the same. By default, the backups are written to the /backup folder.
  2. Stop Splunk UBA.
    /opt/caspida/bin/Caspida stop
  3. Create a dedicated directory on the management node and change directory permissions so that backup files can be written into the directory. If warm standby is also configured, perform these tasks on the management node in the primary cluster.
    sudo mkdir /backup
    sudo chmod 777 /backup
    
  4. Mount the dedicated device on the backup directory. For example, a new 5TB hard drive mounted on the backup directory:
    caspida@node1:~$ df -h /dev/sdc
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdc        5.0T  1G    4.9T   1% /backup
    
    If the backup device is on the local disk, mount the disk using its UUID, which can be found in /etc/fstab. See Prepare the server for installation in Install and Upgrade Splunk User Behavior Analytics.
  5. Add the following properties into /etc/caspida/local/conf/uba-site.properties:
    backup.filesystem.enabled=true
    backup.filesystem.directory.restore=/backup
    
  6. Synchronize the configuration across the cluster:
    /opt/caspida/bin/Caspida sync-cluster
  7. Register filesystem backup:
    /opt/caspida/bin/replication/setup filesystem

    If the same host has been registered before, run the command again with the reset flag:

    /opt/caspida/bin/replication/setup filesystem -r
  8. Enable Postgres archiving.
    1. Create the directory where archives will be stored. For example, /backup/wal_archive:
      sudo mkdir /backup/wal_archive
      sudo chown postgres:postgres /backup/wal_archive
      
    2. Create a file called archiving.conf on the PostgreSQL node (node 2 for 20-node deployments, node 1 for all other deployments). On RHEL, Oracle Linux, and CentOS systems:
      cd /var/vcap/store/pgsql/10/data/conf.d/
      sudo mv archiving.conf.sample archiving.conf
      

      On Ubuntu systems:

      cd /etc/postgresql/10/main/conf.d/
      sudo mv archiving.conf.sample archiving.conf
      
      If your archive directory is not /backup/wal_archive, edit archiving.conf to change the archive directory.
    3. Restart PostgreSQL services on the master node:
      /opt/caspida/bin/Caspida stop-postgres
      /opt/caspida/bin/Caspida start-postgres
      
  9. On the master node:
    1. In the primary cluster, enable the replication system job by adding the ReplicationCoordinator property into the /etc/caspida/local/conf/caspida-jobs.json file. Below is a sample of the file before adding the property:
      /**
       * Copyright 2014 - Splunk Inc., All rights reserved.
       * This is Caspida proprietary and confidential material and its use
       * is subject to license terms.
       */
      {
        "systemJobs": [
          {
            // "name" : "ThreatComputation",
            // "cronExpr"   : "0 0 0/1 * * ?",
            // "jobArguments" : { "env:CASPIDA_JVM_OPTS" :  "-Xmx4096M" }
          }
        ]
      } 
      

      After adding the property, the file should look like this:

      /**
       * Copyright 2014 - Splunk Inc., All rights reserved.
       * This is Caspida proprietary and confidential material and its use
       * is subject to license terms.
       */
      {
        "systemJobs": [
          {
            // "name" : "ThreatComputation",
            // "cronExpr"   : "0 0 0/1 * * ?",
            // "jobArguments" : { "env:CASPIDA_JVM_OPTS" :  "-Xmx4096M" }
          },
          {
            "name"         : "ReplicationCoordinator",
            "enabled"      : true
          }
        ]
      } 
      
    2. Run the following command to synchronize the cluster:
      /opt/caspida/bin/Caspida sync-cluster
  10. Start Splunk UBA:
    /opt/caspida/bin/Caspida start
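
As a quick sanity check after synchronizing the cluster, you can confirm that the two properties from step 5 are present on each node. The snippet below runs the same check against a mock copy of the file; the real path is /etc/caspida/local/conf/uba-site.properties:

```shell
# Mock copy of uba-site.properties with the two properties from step 5
# (illustrative; the real file lives in /etc/caspida/local/conf/):
cat > /tmp/uba-site.properties <<'EOF'
backup.filesystem.enabled=true
backup.filesystem.directory.restore=/backup
EOF
# Count the backup.filesystem.* properties; expect 2:
count=$(grep -c '^backup\.filesystem\.' /tmp/uba-site.properties)
echo "$count"    # 2
```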

How Splunk UBA generates and stores the automated full and incremental backup files

An initial full backup is triggered automatically when the next scheduled job starts, as defined by the ReplicationCoordinator property in the /opt/caspida/conf/jobconf/caspida-jobs.json file. After the initial full backup, a series of incremental backups is performed until the next scheduled full backup. By default, Splunk UBA performs a full backup every 7 days. To change this interval, perform the following tasks:

  1. Log in to the Splunk UBA master node as the caspida user.
  2. Edit the backup.filesystem.full.interval property in /etc/caspida/local/conf/uba-site.properties.
  3. Synchronize the cluster.
    /opt/caspida/bin/Caspida sync-cluster  /etc/caspida/local/conf
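
For example, to switch to a 14-day full-backup cycle (the 14d value is illustrative):

```properties
# /etc/caspida/local/conf/uba-site.properties
backup.filesystem.full.interval = 14d
```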

You can identify the base directories containing the full backups and the incremental backup directories by the first digit of the directory name.

  • A base directory has a sequence number starting with 1.
  • An incremental directory has a sequence number starting with 0.
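
This naming convention can be checked mechanically. A minimal shell sketch, using directory names from the example that follows:

```shell
# Classify a backup directory by the first digit of its name:
classify() {
  case "$1" in
    1*) echo "base" ;;
    0*) echo "incremental" ;;
    *)  echo "unknown" ;;
  esac
}
classify 1000123    # base
classify 0000124    # incremental
```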

In the following example, the base directory 1000123 contains a full backup taking up 35GB of space, while the incremental directories 0000124, 0000125, and 0000126 each hold roughly 1.5GB of backup files.

caspida@node1:~$ du -sh /backup/caspida/*
1.5G    /backup/caspida/0000124
1.5G    /backup/caspida/0000125
1.4G    /backup/caspida/0000126
35G /backup/caspida/1000123

The following sections show the supported restore scenarios, using this example: restoring from the full backup only, or restoring from the full backup together with the incremental backups.

Generate a full backup on-demand without waiting for the next scheduled job

Perform the following tasks to generate a full backup without waiting for the next scheduled job to do it for you.

  1. Make sure you have set up your Splunk UBA deployment for automated incremental backups.
  2. On the master node, edit the /etc/caspida/local/conf/uba-site.properties file and set the backup.filesystem.full.interval property to 0 days. For example:
    backup.filesystem.full.interval = 0d
  3. Synchronize the configuration change across the cluster:
    /opt/caspida/bin/Caspida sync-cluster  /etc/caspida/local/conf
  4. Use the following curl command to trigger a new cycle:
    curl -X POST -k -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)"  https://localhost:9002/jobs/trigger?name=ReplicationCoordinator
  5. Check the /var/log/caspida/replication/replication.log file to make sure the full backup is starting:
    2020-06-15 14:01:56,120 INFO MainProcess.MainThread coordinator.prepCycle.209: Target cycle is: 0000154
    2020-06-15 14:02:03,422 INFO MainProcess.MainThread coordinator.isFullBackup.308: Need to perform full backup. 
    Last cycle: 2020-06-11 16:20:10; Interval: 0:00:00
    
  6. (Recommended) Restore the backup.filesystem.full.interval property back to its default value of 7 days. You can set the property as follows and synchronize the cluster, or delete the property altogether from the /etc/caspida/local/conf/uba-site.properties file and synchronize the cluster:
    backup.filesystem.full.interval = 7d
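
The Authorization header in the curl command from step 4 is built by extracting the job manager token from uba-default.properties. The extraction pipeline can be seen in isolation against a mock file (abc123 is a made-up token; the real file is /opt/caspida/conf/uba-default.properties):

```shell
# Mock stand-in for /opt/caspida/conf/uba-default.properties:
printf 'jobmanager.restServer.auth.user.token=abc123\n' > /tmp/uba-default-mock.properties
# Same grep/cut pipeline the curl command uses to build the bearer token:
TOKEN=$(grep '^\s*jobmanager.restServer.auth.user.token=' /tmp/uba-default-mock.properties | cut -d'=' -f2)
echo "$TOKEN"    # abc123
```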

Restore Splunk UBA from a full backup

This example shows how to restore from a full backup, using the base directory 1000123 without any accompanying incremental directories.

  1. Prepare the server for the restore operation. If there is any existing data, run:
    /opt/caspida/bin/CaspidaCleanup
  2. Stop all services:
    /opt/caspida/bin/Caspida stop-all
  3. Restore Postgres.
    1. On the Postgres node (node 2 in 20-node deployments, node 1 in all other deployments), clean any existing data. On RHEL, OEL, or CentOS systems, run the following command:
      sudo rm -rf /var/lib/pgsql/10/main/*

      On Ubuntu systems, run the following command:

      sudo rm -rf /var/lib/postgresql/10/main/*
    2. Copy all content under <base directory>/postgres/base to the Postgres node. For example, if you are copying from a different server, use the following command on RHEL, OEL, or CentOS systems:
      sudo scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/pgsql/10/main

      On Ubuntu systems, run the following command:

      sudo scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/postgresql/10/main
    3. Edit the /var/lib/pgsql/10/main/recovery.conf (on RHEL, OEL, or CentOS systems) or /var/lib/postgresql/10/main/recovery.conf (on Ubuntu systems) file, clear all content, and add the following property:
      restore_command = ''
    4. Change ownership of the backup files. On RHEL, OEL, or CentOS systems, run the following command:
      sudo chown -R postgres:postgres /var/lib/pgsql/10/main

      On Ubuntu systems, run the following command:

      sudo chown -R postgres:postgres /var/lib/postgresql/10/main
    5. Start the Postgres service by running the following command on the master node:
      /opt/caspida/bin/Caspida start-postgres
      Monitor the Postgres logs in /var/log/postgresql, which show the recovery process.
    6. Verify that Postgres is restored. Check in the /var/lib/pgsql/10/main (on RHEL, OEL, or CentOS systems) or /var/lib/postgresql/10/main (on Ubuntu systems) directory and verify that the recovery.conf file is renamed to recovery.done.
    7. Once the recovery completes, query Postgres to see if the data is recovered. For example, run the following command from the Postgres CLI:
      psql -d caspidadb -c 'SELECT * FROM dbinfo'
  4. Restore Redis. Redis backups are always full backups, even when Splunk UBA performs an incremental backup. You can therefore restore Redis from any backup directory, such as the most recent incremental backup directory. In this example, Redis is restored from the 0000126 incremental backup directory. The Redis backup file name ends with the node number; be sure to restore each file on the corresponding node. For example, in a 5-node cluster, the Redis files must be restored on nodes 4 and 5. Assuming the backup files are on node 1, run the following command on node 4 to restore Redis:
    sudo scp caspida@node1:<BACKUP_HOME>/0000126/redis/redis-server.rdb.4 /var/vcap/store/redis/redis-server.rdb
    

    Similarly, run the following command on node 5:

    sudo scp caspida@node1:<BACKUP_HOME>/0000126/redis/redis-server.rdb.5 /var/vcap/store/redis/redis-server.rdb
    
    View your /etc/caspida/local/conf/caspida-deployment.conf file to see where Redis is running in your deployment.
  5. Restore InfluxDB. Similar to Redis, InfluxDB backups are full backups. You can restore InfluxDB from the most recent backup directory. In this example, InfluxDB is restored from the 0000126 incremental backup directory. On the management node, which hosts InfluxDB, start InfluxDB, clean it up, and restore from backup files:
    sudo service influxdb start
    influx -execute "DROP DATABASE caspida"
    influx -execute "DROP DATABASE ubaMonitor"
    influxd restore -portable <BACKUP_HOME>/0000126/influx
    
  6. Restore HDFS. In this example, restore HDFS from the base backup directory 1000123 only.
    1. Start the necessary services. On the management node, run the following command:
      /opt/caspida/bin/Caspida start-all --no-caspida
    2. Restore HDFS from the base backup directory:
      nohup hadoop fs -copyFromLocal <BACKUP_HOME>/1000123/hdfs/caspida /user &

      Restoring HDFS can take a long time. Check the process ID to see if the restore is complete. For example, if the PID is 111222, check it by using the following command:

      ps 111222
    3. Change owner in HDFS:
      sudo -u hdfs hdfs dfs -chown -R impala:caspida /user/caspida/analytics
      sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /user/history
      sudo -u hdfs hdfs dfs -chown -R impala:impala /user/hive
      sudo -u hdfs hdfs dfs -chown -R yarn:yarn /user/yarn
      
    4. If the server you are restoring to is different from the one where the backup was taken, run the following commands to update the metadata:
      sudo hive --service metatool -updateLocation hdfs://<RESTORE_HOST>:8020 hdfs://<BACKUP_HOST>:8020
      impala-shell -q "INVALIDATE METADATA"
      
      Note that the host is node1, as defined in the deployment file.
  7. Restore your rules and customized configurations from the latest backup directory:
    1. Restore the configurations:
      sudo cp -pr <BACKUP_HOME>/0000126/conf/* /etc/caspida/local/conf/
    2. Restore the rules:
      sudo rm -Rf /opt/caspida/conf/rules/*
      sudo cp -prf <BACKUP_HOME>/0000126/rule/* /opt/caspida/conf/rules/
      
  8. Start the server:
    /opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
    /opt/caspida/bin/CaspidaCleanup container-grouping
    /opt/caspida/bin/Caspida start
    
    Check the Splunk UBA web UI to make sure the server is operational.
  9. If the backup and restore servers are different, perform the following tasks:
    1. Update the data source metadata:
      curl -X PUT -Ssk -v -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" https://localhost:9002/datasources/moveDS?name=<DS_NAME>
      
      Replace <DS_NAME> with the data source name displayed in Splunk UBA.
    2. Trigger a one-time sync with Splunk ES. If your Splunk ES host did not change, run the following command:
      curl -X POST 'https://localhost:9002/jobs/trigger?name=EntityScoreUpdateExecutor' -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" -H 'Content-Type: application/json' -d '{"schedule": false}' -k
      
      If you are pointing to a different Splunk ES host, edit the host in Splunk UBA to automatically trigger a one-time sync.
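
Several steps above background a long-running copy with nohup and suggest checking its PID. A generic way to block until such a process exits, shown as a sketch in which `sleep 2` stands in for the real hadoop command:

```shell
# Start a stand-in long-running job in the background:
sleep 2 &
RESTORE_PID=$!
# Poll until the process is gone; kill -0 only tests for existence,
# it does not send a signal:
while kill -0 "$RESTORE_PID" 2>/dev/null; do
  sleep 1
done
echo "restore finished"
```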

Restore Splunk UBA from incremental backups

To restore Splunk UBA from online incremental backup files, at least one base backup directory containing a full backup must exist.

This example shows how to restore from a base directory 1000123 with all of the incremental directories 0000124, 0000125, and 0000126.
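
The ordering requirement in this example (the base backup first, then the incrementals in ascending sequence) follows directly from the directory-name convention, and can be sketched as:

```shell
# Unordered backup directory names from this example:
dirs="0000125 1000123 0000126 0000124"
# Base directories start with 1, incrementals with 0; sort each group:
base=$(printf '%s\n' $dirs | grep '^1' | sort)
incrementals=$(printf '%s\n' $dirs | grep '^0' | sort)
echo $base $incrementals    # 1000123 0000124 0000125 0000126
```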

  1. Prepare the server for the restore operation. If there is any existing data, run:
    /opt/caspida/bin/CaspidaCleanup
  2. Stop all services:
    /opt/caspida/bin/Caspida stop-all
  3. Restore Postgres.
    1. On the Postgres node (node 2 in 20-node deployments, node 1 in all other deployments), clean any existing data. On RHEL, OEL, or CentOS systems, use the following command:
      sudo rm -rf /var/lib/pgsql/10/main/*

      On Ubuntu systems, use the following command:

      sudo rm -rf /var/lib/postgresql/10/main/*
    2. Copy all content under <base directory>/postgres/base to the Postgres node. For example, if you are copying from a different server, use the following command on RHEL, OEL, or CentOS systems:
      sudo scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/pgsql/10/main

      On Ubuntu systems, use the following command:

      sudo scp -r caspida@ubap1:<BACKUP_HOME>/1000123/postgres/base/* /var/lib/postgresql/10/main
    3. Remove unnecessary WAL files. On RHEL, OEL, or CentOS systems, use the following command:
      sudo rm -rf /var/lib/pgsql/10/main/pg_wal/*

      On Ubuntu systems, use the following command:

      sudo rm -rf /var/lib/postgresql/10/main/pg_wal/*

      Make sure the system has access to the Postgres WAL archive directory. Modify the /var/lib/pgsql/10/main/recovery.conf (on RHEL, OEL, or CentOS systems) or /var/lib/postgresql/10/main/recovery.conf (on Ubuntu systems) file. Remove all contents in the file, and add the following properties:

      restore_command = 'cp <WAL directory>/%f "%p"'
      recovery_target_time = '<recovery timestamp>'
      recovery_target_action = 'promote'
      

      Where <WAL directory> is the directory containing all Postgres WAL files, and <recovery timestamp> is the timestamp in the backup file <BACKUP_HOME>/0000126/postgres/recovery_target_time.
      For example, the recovery.conf file looks like this:

      restore_command = 'cp /backup/wal_archive/%f "%p"'
      recovery_target_time = '2019-09-16 12:36:03'
      recovery_target_action = 'promote'
      
    4. Change ownership of the backup files. On RHEL, OEL, or CentOS systems, use the following command:
      sudo chown -R postgres:postgres /var/lib/pgsql/10/main

      On Ubuntu systems, use the following command:

      sudo chown -R postgres:postgres /var/lib/postgresql/10/main
    5. Start the Postgres services. Run the following command on the master node:
      /opt/caspida/bin/Caspida start-postgres
      Monitor the Postgres logs in /var/log/postgresql, which show the recovery process.
    6. Verify that Postgres is restored. Check in the /var/lib/pgsql/10/main (on RHEL, OEL, CentOS systems) or /var/lib/postgresql/10/main (on Ubuntu systems) directory and verify that the recovery.conf file is renamed to recovery.done.
    7. Once the recovery completes, query Postgres to see if data is recovered. For example, run the following command from the Postgres CLI:
      psql -d caspidadb -c 'SELECT * FROM dbinfo'
  4. Restore Redis. Redis backups are always full backups, even when Splunk UBA performs an incremental backup. You can therefore restore Redis from any backup directory, such as the most recent incremental backup directory. In this example, Redis is restored from the 0000126 incremental backup directory. The Redis backup file name ends with the node number; be sure to restore each file on the corresponding node. For example, in a 5-node cluster, the Redis files must be restored on nodes 4 and 5. Assuming the backup files are on node 1, run the following command on node 4 to restore Redis:
    sudo scp caspida@node1:<BACKUP_HOME>/0000126/redis/redis-server.rdb.4 /var/vcap/store/redis/redis-server.rdb
    

    Similarly, run the following command on node 5:

    sudo scp caspida@node1:<BACKUP_HOME>/0000126/redis/redis-server.rdb.5 /var/vcap/store/redis/redis-server.rdb
    
    View your /etc/caspida/local/conf/caspida-deployment.conf file to see where Redis is running in your deployment.
  5. Restore InfluxDB. Similar to Redis, InfluxDB backups are full backups. You can restore InfluxDB from the most recent backup directory. In this example, InfluxDB is restored from the 0000126 incremental backup directory. On the management node, which hosts InfluxDB, start InfluxDB, clean it up, and restore from backup files:
    sudo service influxdb start
    influx -execute "DROP DATABASE caspida"
    influx -execute "DROP DATABASE ubaMonitor"
    influxd restore -portable <BACKUP_HOME>/0000126/influx
    
  6. Restore HDFS. To restore HDFS, restore the base backup first, then the incremental backups in continuous sequence. In this example, restore from 1000123 first, then 0000124, 0000125, and 0000126.
    1. Start the necessary services. On the management node, run the following command:
      /opt/caspida/bin/Caspida start-all --no-caspida
    2. Restore HDFS from the base backup directory and also restore the incremental backup directories:
      nohup bash -c 'export BACKUPHOME=/backup; hadoop fs -copyFromLocal `ls ${BACKUPHOME}/caspida/1*/hdfs/caspida -d` && for dir in `ls ${BACKUPHOME}/caspida/0*/hdfs/caspida -d`; do hadoop fs -copyFromLocal -f $dir || exit 1;  done; echo Done' & 
      

      If you configured a different directory for your backups, replace /backup in the BACKUPHOME variable accordingly. Restoring HDFS can take a long time. Check the process ID to see if the restore is complete. For example, if the PID is 111222, check it by using the following command:

      ps 111222
      You can also check the nohup.out file and look for "Done" at the end of the file.
    3. Change owner in HDFS:
      sudo -u hdfs hdfs dfs -chown -R impala:caspida /user/caspida/analytics
      sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /user/history
      sudo -u hdfs hdfs dfs -chown -R impala:impala /user/hive
      sudo -u hdfs hdfs dfs -chown -R yarn:yarn /user/yarn
      
    4. If the server you are restoring to is different from the one where the backup was taken, run the following commands to update the metadata:
      hive --service metatool -updateLocation hdfs://<RESTORE_HOST>:8020 hdfs://<BACKUP_HOST>:8020
      impala-shell -q "INVALIDATE METADATA"
      
      Note that the host is node1, as defined in the deployment file.
  7. Restore your rules and customized configurations from the latest backup directory:
    1. Restore the configurations:
      cp -pr <BACKUP_HOME>/0000126/conf/* /etc/caspida/local/conf/
    2. Restore the rules:
      rm -Rf /opt/caspida/conf/rules/*
      cp -prf <BACKUP_HOME>/0000126/rule/* /opt/caspida/conf/rules/
      
  8. Start the server:
    /opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
    /opt/caspida/bin/CaspidaCleanup container-grouping
    /opt/caspida/bin/Caspida start
    
    Check the Splunk UBA web UI to make sure the server is operational.
  9. If the backup and restore servers are different, perform the following tasks:
    1. Update the data source metadata:
      curl -X PUT -Ssk -v -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" https://localhost:9002/datasources/moveDS?name=<DS_NAME>
      
      Replace <DS_NAME> with the data source name displayed in Splunk UBA.
    2. Trigger a one-time sync with Splunk ES. If your Splunk ES host did not change, run the following command:
      curl -X POST 'https://localhost:9002/jobs/trigger?name=EntityScoreUpdateExecutor' -H "Authorization: Bearer $(grep '^\s*jobmanager.restServer.auth.user.token=' /opt/caspida/conf/uba-default.properties | cut -d'=' -f2)" -H 'Content-Type: application/json' -d '{"schedule": false}' -k
      
      If you are pointing to a different Splunk ES host, edit the host in Splunk UBA to automatically trigger a one-time sync.
Last modified on 08 January, 2022

This documentation applies to the following versions of Splunk® User Behavior Analytics: 5.0.5, 5.0.5.1

