Migrate Splunk UBA using the backup and restore scripts
Use the backup and restore scripts located in /opt/caspida/bin/utils
to migrate your Splunk UBA deployment to the next larger size on the same operating system. For example, you can migrate from 5 nodes to 7 nodes, or 10 nodes to 20 nodes. If you want to migrate from 7 nodes to 20 nodes, migrate from 7 nodes to 10 nodes first, then from 10 nodes to 20 nodes.
Below is a summary of the migration process using the backup and restore scripts. For example, to migrate from a 3-node cluster to a 5-node cluster:
- Run the uba-backup.sh script on the 3-node cluster. The script stops Splunk UBA, performs the backup, then restarts Splunk UBA on the 3-node cluster.
- Set up the 5-node cluster so that all nodes meet the system requirements, and install Splunk UBA. The version number of the Splunk UBA software must match the version number of the backup. See the Splunk UBA installation checklist in Install and Upgrade Splunk User Behavior Analytics to begin a Splunk UBA installation.
- Verify that Splunk UBA is up and running in the 5-node cluster. See Verify successful installation in Install and Upgrade Splunk User Behavior Analytics.
- Run the uba-restore.sh script on the 5-node cluster. The script stops Splunk UBA, restores the system from the earlier backup, then starts Splunk UBA. A command-level sketch of this process is shown below.
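At the command level, the flow looks roughly like the following sketch. The rsync transfer step, the target-master hostname, and the <backup-folder> placeholder are illustrative assumptions; use whatever transfer mechanism and backup location fit your environment.

# On the master node of the 3-node (source) cluster, create a full backup.
/opt/caspida/bin/utils/uba-backup.sh

# Copy the backup folder to the master node of the 5-node (target) cluster.
# The rsync transfer, hostname, and <backup-folder> placeholder are assumptions.
rsync -av /var/vcap/ubabackup/<backup-folder>/ caspida@target-master:/var/vcap/ubabackup/<backup-folder>/

# On the master node of the 5-node cluster, restore from that backup folder.
/opt/caspida/bin/utils/uba-restore.sh --folder /var/vcap/ubabackup/<backup-folder>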
You can also use the backup and restore scripts to capture backups of your Splunk UBA system, in addition to or in place of the automated incremental backups. See Backup and restore Splunk UBA using automated incremental backups.
Requirements for using the backup and restore scripts
Make sure the following requirements are met before using the backup and restore scripts:
- The target system you are migrating to must be set up with Splunk UBA already up and running.
- The backup system and the target system you are migrating to must have the same version of Splunk UBA running on the same operating system.
- The target system you are migrating to must be the same size or one deployment size larger than the backup system. See Plan and scale your Splunk UBA deployment for information about the supported Splunk UBA deployment sizes.
Back up Splunk UBA using the backup script
Perform a full backup of Splunk UBA using the /opt/caspida/bin/utils/uba-backup.sh
script. View the command line options by using the --help
option. The following table describes the options you can use with the script.
Option | Description |
---|---|
--archive | Create a single archive containing all of the backup data. The archive is created after the backup is completed and Splunk UBA is restarted. |
--archive-type %FORMAT% | Specify the type of archive you want to create. Install a package called pigz on the master node to use multi-threaded compression when creating the archive: yum -y install pigz |
--dateformat %FORMAT% | Override the default date/time format for the backup folder name. If this option is not used, the folder name is based on the ISO 8601 format YYYY-MM-DD. To specify a backup folder name in the typical format used in the United States, specify MM-DD-YYYY. Using this option also overrides the date/time format of the logging messages. |
--folder %FOLDER% | Override the target folder location where the backup is stored. Use this option if you configured a secondary volume for storing backups, such as another 1TB disk on the management node. Don't use NFS because of its performance ramifications. |
--log-time | Add additional logging for how long each section takes, including all function calls and tasks. Use this option to help troubleshoot issues if your backup is taking more than two hours. |
--no-data | Don't back up any data, only the Splunk UBA configuration. |
--no-prestart | Don't start Splunk UBA before the backup begins, because Splunk UBA is already running. Make sure Splunk UBA is up and running before using this option. |
--no-start | Don't start Splunk UBA after the backup is completed. Use this option to perform additional post-backup actions that require Splunk UBA to be offline. |
--restart-on-fail | Restart Splunk UBA if the backup fails. If Splunk UBA encounters an error during the backup, the script attempts to restart Splunk UBA so the system does not remain offline. |
--script %FILENAME% | Run the specified script after the backup is completed. Use this with the --no-start option if your script requires Splunk UBA to be offline. |
--skip-hdfs-fsck | Skip the HDFS file system consistency check. This is useful in large environments if you want to skip this check due to time constraints. |
--use-distcp | Perform a parallel backup of Hadoop. If the HDFS export is taking several hours, use this option to perform a parallel backup, which may be faster. Use the --log-time option to examine how long the HDFS export is taking. |
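For example, a minimal sketch of a backup that skips the pre-start (because Splunk UBA is already running), writes to a dedicated backup volume, and records per-section timing might look like the following. The /backup/uba path is an assumption for illustration; substitute your own backup location.

# Assumes Splunk UBA is already running and /backup/uba is a dedicated backup volume (illustrative path).
/opt/caspida/bin/utils/uba-backup.sh --no-prestart --folder /backup/uba --log-time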
Below is an example backup.
- Log in to the master node of your Splunk UBA deployment as the caspida user using SSH.
- Navigate to the /opt/caspida/bin/utils folder: cd /opt/caspida/bin/utils
- Run the backup script. Below is the command and its output:
[caspida@ubanode1]$ /opt/caspida/bin/utils/uba-backup.sh --no-prestart
UBA Backup Script - Version 1.9.2
Backup started at: Wed Jan 8 12:11:10 PST 2020
Backup running on: ubanode1.example.domain
Logfile: /var/log/caspida/uba-backup-2020-01-08_12-11-10.log
Script Name: uba-backup.sh
Script SHA: 06170431f2791e579bcba055df79d472d9c68614cf6c4c2497eb62ed48422e6a
Parsing any CLI args
 - Disabling UBA pre-start before backup
Node Count: 1
Testing SSH connectivity to UBA node 1 (ubanode1)
Attempting to resolve the IP of UBA node ubanode1
UBA node ubenode1 resolves to 192.168.19.88
Not starting UBA (pre-backup), disabled via CLI
Backup folder: /var/vcap/ubabackup/2020-01-08_12-11-11
Creating backup folder
Changing ownership of the backup folder
WARNING: No datasources were found as active in UBA
Determining current counts/stats from PostgreSQL
Stopping UBA (full)
Starting UBA (partial)
Checking that HDFS isnt in safe-mode
 - Safe mode is disabled
Performing fsck of HDFS (this may take a while)
Creating backup of deployment configuration
Creating backup of local configurations
Creating backup of UBA rules
Creating backup of version information
Creating backup of PostgreSQL caspidadb database on UBA node 1 (spuba50)
Creating backup of PostgreSQL metastore database on UBA node 1 (spuba50)
Logging PostgreSQL sparkserverregistry table
Creating backup of Hadoop HDFS (this may take a while)
 - Checking status of PID 30850 (2020-01-08_12-16-22)
 - Backup job has finished (total size: 683M)
Logging Redis information
Stopping UBA (full)
Creating backup of timeseries data
Creating backup of Redis database (parallel mode)
 - Performing backup of UBA node 1 (spuba50)
 - Waiting for pid 12772 to finish
 - Process finished successfully
Creating summary of backup
Starting UBA (full)
Backup completed successfully
Time taken: 0 hour(s), 10 minute(s), 58 second(s)
You can review the log file in /var/log/caspida/uba-backup-<timestamp>.log.
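If you want a quick pass over the log for problems, a simple grep such as the following works; the exact message wording varies by version, so treat this as a convenience check rather than a definitive verification.

grep -iE 'warn|error|fail' /var/log/caspida/uba-backup-<timestamp>.log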
Restore Splunk UBA using the restore script
After you have created a backup, restore Splunk UBA using the /opt/caspida/bin/utils/uba-restore.sh
script. View the command line options by using the --help
option. The following table describes the options you can use with the script.
Option | Description |
---|---|
--dateformat %FORMAT% | Override the default date/time format for the backup folder name. If this option is not used, the folder name is based on ISO 8601 format YYYY-MM-DD. To specify a backup folder name in the typical format used in the United States, specify MM-DD-YYYY. Using this option also overrides the date/time format of the logging messages. |
--folder %FOLDER% | Override the source folder to perform the restore. By default, the script looks for the backup in the /var/vcap/ubabackup directory. If you used the --folder option when creating the backup to store the backup in a different directory, specify that same directory using the --folder option when restoring Splunk UBA. |
--log-time | Add additional logging for how long each section takes, including all function calls and tasks. Use this option to help troubleshoot issues if your restore is taking a long time. |
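A minimal invocation that combines these options might look like the following sketch; the /backup/uba/<backup-folder> path is an illustrative assumption. A full example with output follows.

# Restore from a backup stored outside the default location and log per-section timing.
# The /backup/uba/<backup-folder> path is an illustrative assumption.
/opt/caspida/bin/utils/uba-restore.sh --folder /backup/uba/<backup-folder> --log-time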
Below is an example restore:
- Log in to the master node of your Splunk UBA deployment as the caspida user using SSH.
- Navigate to the /opt/caspida/bin/utils folder: cd /opt/caspida/bin/utils
- Run the restore script. In this example, we are restoring from a backup in the /home/caspida directory from a single-node system to a 3-node deployment. Below is the command and its output:
[caspida@ubanode1]$ /opt/caspida/bin/utils/uba-restore.sh --folder /home/caspida/2020-01-08_12-11-11/
UBA Restore Script - Version 1.9.2
Backup started at: Wed Jan 8 12:26:57 PST 2020
Backup running on: ubanode1.example.domain
Logfile: /var/log/caspida/uba-restore-2020-01-08_12-26-57.log
Script Name: uba-restore.sh
Script SHA: 4819a5d2ed713a5a040dfeb4dd30fed0a42406f2238d002ade7d293c3460285f
Parsing any CLI args
 - Set source folder to /home/caspida/2020-01-08_12-11-11
Node Count: 3
Backup Node Count: 1
Detected migration from 1-node to 3-node
WARNING: The hostnames from the backup/restore hosts differ, this will be a migration
Execution Mode: Migration
Testing SSH connectivity to UBA node 1 (ubanode1)
Testing SSH connectivity to UBA node 2 (ubanode2)
Testing SSH connectivity to UBA node 3 (ubanode3)
Attempting to resolve the IP of UBA node ubanode1
UBA node ubanode1 resolves to 192.168.19.88
Attempting to resolve the IP of UBA node ubanode2
UBA node ubanode2 resolves to 192.168.19.89
Attempting to resolve the IP of UBA node ubanode3
UBA node ubanode3 resolves to 192.168.19.90
Attempting to retrieve the IP of each node (old)
Stopping UBA (full)
Starting PostgreSQL
Logging PostgreSQL sparkserverregistry table (pre-restore)
Restoring PostgreSQL caspidadb database on UBA node 1 (ubanode1)
Restoring PostgreSQL metastore database on UBA node 1 (ubanode1)
Performing reset of connector stats/JMS schemas
Stopping PostgreSQL
Restoring timeseries data
Backing up existing uba-system-env.sh/uba-tuning.properties
Restoring local configurations
Restoring UBA rules
Restoring uba-system-env.sh/uba-tuning.properties
Syncing UBA cluster
Starting UBA (partial)
Checking that HDFS isnt in safe-mode
 - Safe mode is disabled
Checking if the /user folder exists in HDFS
 - Folder exists, will attempt to remove
Removing existing Hadoop HDFS content (attempt 1)
Waiting for 10 seconds before proceeding
Checking if the /user folder exists in HDFS
 - Folder has been removed, will continue
Restoring Hadoop HDFS (this may take a while)
 - Checking status of PID 5754 (2020-01-08_12-32-47) - Restore is still running, please wait - Folder size: 242.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-33-09) - Restore is still running, please wait - Folder size: 254.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-33-31) - Restore is still running, please wait - Folder size: 543.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-33-53) - Restore is still running, please wait - Folder size: 543.6 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-34-15) - Restore is still running, please wait - Folder size: 546.4 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-34-37) - Restore is still running, please wait - Folder size: 547.1 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-34-59) - Restore is still running, please wait - Folder size: 548.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-35-21) - Restore is still running, please wait - Folder size: 549.6 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-35-43) - Restore is still running, please wait - Folder size: 550.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-36-06) - Restore is still running, please wait - Folder size: 550.8 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-36-28) - Restore is still running, please wait - Folder size: 551.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-36-50) - Restore is still running, please wait - Folder size: 553.0 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-37-12) - Restore is still running, please wait - Folder size: 554.1 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-37-34) - Restore is still running, please wait - Folder size: 554.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-37-56) - Restore is still running, please wait - Folder size: 555.9 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-38-18) - Restore is still running, please wait - Folder size: 556.6 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-38-40) - Restore is still running, please wait - Folder size: 557.6 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-39-02) - Restore is still running, please wait - Folder size: 558.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-39-24) - Restore is still running, please wait - Folder size: 561.9 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-39-46) - Restore is still running, please wait - Folder size: 562.8 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-40-08) - Restore is still running, please wait - Folder size: 563.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-40-30) - Restore is still running, please wait - Folder size: 564.3 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-40-52) - Restore is still running, please wait - Folder size: 564.9 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-41-14) - Restore is still running, please wait - Folder size: 566.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-41-36) - Restore is still running, please wait - Folder size: 566.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-41-58) - Restore is still running, please wait - Folder size: 567.3 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-42-20) - Restore is still running, please wait - Folder size: 568.1 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-42-42) - Restore is still running, please wait - Folder size: 568.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-43-04) - Restore is still running, please wait - Folder size: 570.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-43-26) - Restore is still running, please wait - Folder size: 570.7 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-43-48) - Restore is still running, please wait - Folder size: 571.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-44-10) - Restore is still running, please wait - Folder size: 571.9 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-44-32) - Restore is still running, please wait - Folder size: 572.6 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-44-54) - Restore is still running, please wait - Folder size: 573.2 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-45-16) - Restore is still running, please wait - Folder size: 574.1 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-45-38) - Restore is still running, please wait - Folder size: 589.0 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-46-00) - Restore is still running, please wait - Folder size: 601.5 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-46-23) - Restore is still running, please wait - Folder size: 610.3 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-46-45) - Restore is still running, please wait - Folder size: 646.0 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-47-07) - Restore is still running, please wait - Folder size: 647.0 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-47-29) - Restore is still running, please wait - Folder size: 647.5 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-47-51) - Restore is still running, please wait - Folder size: 665.0 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-48-13) - Restore is still running, please wait - Folder size: 669.1 M (target: 669.3 M)
 - Checking status of PID 5754 (2020-01-08_12-48-35) - Backup job has finished
Changing ownership of Hadoop HDFS files
Updating location of hive warehouse
Updating location of caspida analytics database
Flushing existing Redis data
Stopping Redis for Redis restore
Restoring Redis database (parallel mode)
Triggering Redis restore of node 1 to 3
 - Performing restore of data from UBA node 1 to UBA node 3 (ubanode3)
 - Performing rsync of database from UBA node 1 to UBA node 3
 - Waiting for pid 12992 to finish
 - Process finished successfully
Starting Redis after Redis restore
Retrieving Redis information (pre-fixup)
Performing Redis restore fixup (attempt 1)
Performing Redis restore rebalance (attempt 1)
Successfully finished Redis fixup/rebalance
Retrieving Redis information (post-fixup)
Determining number of Redis keys (post-restore)
 - Retrieving keys from ubanode3 (ubanode3)
Redis key counts match (3207 vs 3207)
Comparing Redis keys
All Redis keys were found
Tuning configuration
Stopping UBA (partial)
Syncing UBA cluster
Copying /opt/caspida/conf to hdfs /user/caspida/config/etc/caspida/conf
Configuring containerization
Starting UBA (full)
Testing Impala
Logging PostgreSQL sparkserverregistry table (post-restore)
Comparing PostgreSQL backup/restore counts/stats
Migrating datasources
 - Deleting file-based datasource: 0_resolution-rainbow.infoblox
 - Deleting file-based datasource: HR-rainbow.csv
 - Migrating datasource: fileaccess
 - Deleting file-based datasource: rainbow.ad_multiline
 - Deleting file-based datasource: rainbow.ad_snare_flat
 - Deleting file-based datasource: rainbow.box
 - Deleting file-based datasource: rainbow.box_events
 - Deleting file-based datasource: rainbow.brivo
 - Deleting file-based datasource: rainbow.cef
 - Deleting file-based datasource: rainbow.ciscosa
 - Deleting file-based datasource: rainbow.o365_msg_trace
 - Deleting file-based datasource: rainbow.pan
 - Deleting file-based datasource: rainbow.splunk_cs
 - Deleting file-based datasource: rainbow.symantecdlp_dmp
 - Deleting file-based datasource: rainbow.symantecdlp_endpoint
 - Deleting file-based datasource: rainbow.webgateway
 - Deleting file-based datasource: rainbow.weblog
Restore completed successfully
Time taken: 0 hour(s), 29 minute(s), 19 second(s)
You can review the log file in /var/log/caspida/uba-restore-<timestamp>.log.
- If you are integrating Splunk UBA with Splunk Enterprise Security (ES), install the Splunk ES SSL certificates in the restored deployment. See Configure the Splunk platform to receive data from the Splunk UBA output connector in the Send and Receive Data from the Splunk Platform manual.
Verify Splunk UBA is up and running
See Verify successful installation in Install and Upgrade Splunk User Behavior Analytics for information about how to verify that Splunk UBA is up and running properly.
You can also run the uba_pre_check.sh
script as part of this verification. See Check system status before and after installation in Install and Upgrade Splunk User Behavior Analytics.
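For example, assuming the script lives in the same /opt/caspida/bin/utils directory as the backup and restore scripts (check the linked topic for the exact location and arguments in your version):

# Illustrative assumption: run the pre-check script from the utils directory on the master node.
/opt/caspida/bin/utils/uba_pre_check.sh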