Back up indexed data
To decide how to back up indexed data, it helps to understand first how Splunk stores data and how the data ages once it's in Splunk. Then you can decide on a backup strategy.
Before you read this topic, you really should look at "How Splunk stores indexes" to get familiar with the structure of indexes and the options for configuring them. But if you want to jump right in, the next section here attempts to summarize the key points from that topic.
How data ages
Indexed data resides in database directories consisting of subdirectories called buckets. Each index has its own set of databases.
As data ages, it moves through several types of buckets. You determine how the data ages by configuring attributes in indexes.conf. Read "How Splunk stores indexes" for a detailed description of buckets and of the settings in indexes.conf that control how and when data ages.
Briefly, here's a somewhat simplified version of how data ages in a Splunk index:
1. When Splunk first indexes data, it goes into a "hot" bucket. Depending on your configuration, there can be several hot buckets open at one time. Hot buckets cannot be backed up because Splunk is actively writing to them, but you can take a snapshot of them.
2. The data remains in the hot bucket until the policy conditions are met for it to be reclassified as "warm" data. This is called "rolling" the data into the warm bucket. This happens when a hot bucket reaches a specified size or age, or whenever splunkd gets restarted. When a hot bucket is rolled, its directory is renamed, and it becomes a warm bucket. (You can also manually roll a bucket from hot to warm, as described below.) It is safe to back up the warm buckets.
3. When the index reaches one of several possible configurable limits, usually a specified number of warm buckets, the oldest bucket becomes a "cold" bucket. Splunk moves the bucket to the colddb directory. The default number of warm buckets is 300.
4. Finally, at a time based on your defined policy requirements, the bucket rolls from cold to "frozen". Splunk deletes frozen buckets. However, if you need to preserve the data, Splunk can archive it before deleting the bucket. See "Archive indexed data" for more information.
You can set retirement and archiving policy by controlling several different parameters, such as the size of indexes or buckets or the age of the data.
- hot buckets - Currently being written to; do not back these up.
- warm buckets - Rolled from hot; can be safely backed up.
- cold buckets - Rolled from warm; buckets are moved to another location.
- frozen buckets - Splunk deletes these, but you can archive their contents first.
You set the locations of index databases in indexes.conf. (See below for detailed information on the database locations for the default index.) You also specify numerous other attributes there, such as the maximum size and age of hot buckets. See "How Splunk stores indexes" for detailed information on buckets.
Locations of the index database directories
Here's the directory structure for the default index:
|Bucket type||Default location||Notes|
|Hot||||There can be multiple hot subdirectories. Each hot bucket occupies its own subdirectory, which uses this naming convention:|
|Warm||||There are multiple warm subdirectories. Each warm bucket occupies its own subdirectory, which uses this naming convention: The timestamps are expressed in UTC epoch time (in seconds). For example:|
|Cold||||There are multiple cold subdirectories. When warm buckets roll to cold, they get moved into this directory, but are not renamed.|
|Frozen||N/A: Data deleted or archived into a directory location you specify.||Deletion is the default; see "Archive indexed data" for information on how to archive the data instead.|
|Thawed||||Location for data that has been archived and later thawed. See "Restore archived data" for information on restoring archived data to a "thawed" state.|
The paths for hot/warm and cold directories are configurable, so you can store cold buckets in a separate location from hot/warm buckets. See "Use multiple partitions for index data".
Caution: All index locations must be writable.
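A quick pre-flight check along the following lines can confirm that each index location exists and is writable. This is only a minimal sketch: the function name `check_index_dirs` is hypothetical, and the paths you pass in should be whatever hot/warm, cold, and thawed locations your own indexes.conf defines. Run it as the user that Splunk runs as, since writability depends on who is asking.

```shell
#!/bin/sh
# check_index_dirs DIR...
# Hypothetical helper: reports (on stderr) each path that is not an existing
# directory writable by the current user, and returns non-zero if any such
# path was found.
check_index_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "not writable: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_index_dirs /path/to/hot_warm /path/to/cold /path/to/thawed` (placeholder paths) exits successfully only if all three directories pass.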
Choose your backup strategy
There are two basic backup scenarios to consider:
- Ongoing, incremental backups of warm data
- Backup of all data - for example, before upgrading Splunk
How you actually perform the backup depends on the tools and procedures in place at your organization, but this section provides the guidelines you need to proceed.
The general recommendation is to schedule backups of any new warm buckets regularly, using the incremental backup utility of your choice. If you're rolling buckets frequently, you should also include the cold database directory in your backups, to ensure that you don't miss any buckets that have rolled to cold before they've been backed up. Since bucket directory names don't change when they roll from warm to cold, you can just filter by name.
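The following is a minimal sketch of such an incremental pass, not a definitive implementation. It assumes that warm and cold bucket directories have names beginning with `db_` (verify this against the bucket names on your own system), and the function name `backup_new_buckets` is hypothetical. It copies every matching bucket directory that is not yet present in the backup location, so a bucket that has already rolled to cold is still picked up by name.

```shell
#!/bin/sh
# backup_new_buckets DEST SRC_DIR...
# Copies each bucket directory found directly under the given source
# directories (for example, the warm and cold database directories) into
# DEST, skipping buckets whose names already exist there.
# Assumption: warm and cold bucket directory names start with "db_".
backup_new_buckets() {
    dest=$1
    shift
    status=0
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        for bucket in "$src"/db_*; do
            [ -d "$bucket" ] || continue        # no match, or not a directory
            name=$(basename "$bucket")
            [ -e "$dest/$name" ] && continue    # already backed up
            # Copy under a temporary name first so that an interrupted copy
            # is never mistaken for a finished backup on the next run.
            rm -rf "$dest/.partial_$name"
            if cp -Rp "$bucket" "$dest/.partial_$name" &&
               mv "$dest/.partial_$name" "$dest/$name"; then
                echo "backed up $name"
            else
                status=1
            fi
        done
    done
    return "$status"
}
```

A scheduled job could call it as `backup_new_buckets /path/to/backup /path/to/warm /path/to/colddb`, where all three paths are placeholders. A dedicated incremental backup utility is likely a better fit for production; the sketch only illustrates the filter-by-name idea.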
To back up hot buckets as well, you need to take a snapshot of the files, using a tool like VSS (on Windows/NTFS), ZFS snapshots (on ZFS), or a snapshot facility provided by the storage subsystem. If you do not have a snapshot tool available, you can manually roll a hot bucket to warm and then back it up, as described below. However, this is not generally recommended, for reasons also discussed below.
Back up all data
It is recommended that you back up all your data before upgrading Splunk. This means the hot, warm, and cold buckets.
There are obviously a number of ways to do this, depending on the size of your data and how much downtime you can afford. Here are some basic guidelines:
- For smaller amounts of data, shut down Splunk and just make a copy of your database directories before performing the upgrade.
- For larger amounts of data, you will probably instead want to snapshot your hot buckets prior to upgrade.
In any case, if you have been doing incremental backups of your warm buckets as they've rolled from hot, you should only need to back up your hot buckets at this time.
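For the smaller-data case, the shut-down-and-copy approach might look like the sketch below. It relies on the `splunk stop` and `splunk start` CLI commands and on `tar`; the function name `cold_copy_backup`, its argument order, and the paths in the usage line are assumptions made for illustration. Splunk is restarted whether or not the archive step succeeds.

```shell
#!/bin/sh
# cold_copy_backup SPLUNK_CMD DB_DIR ARCHIVE
# Stops Splunk, archives DB_DIR (hot, warm, and cold buckets alike, since
# nothing is written while Splunk is down), then starts Splunk again.
cold_copy_backup() {
    splunk_cmd=$1
    db_dir=$2
    archive=$3
    status=0
    "$splunk_cmd" stop || return 1
    # Archive relative to the parent directory so the tarball stores
    # "<dir name>/..." rather than an absolute path.
    tar -czf "$archive" -C "$(dirname "$db_dir")" "$(basename "$db_dir")" ||
        status=1
    "$splunk_cmd" start || status=1
    return "$status"
}
```

Usage, with placeholder paths: `cold_copy_backup /path/to/splunk/bin/splunk /path/to/index_databases /path/to/pre-upgrade.tar.gz`.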
Rolling buckets manually from hot to warm
To roll the buckets of an index manually from hot to warm, use the following CLI command, replacing <index_name> with the name of the index you want to roll:
./splunk _internal call /data/indexes/<index_name>/roll-hot-buckets -auth <admin_username>:<admin_password>
Important: It is ordinarily not advisable to roll hot buckets manually, as each forced roll permanently decreases search performance over the data. As a general rule, larger buckets are more efficient to search. By prematurely rolling buckets, you're producing smaller, less efficient buckets. In cases where hot data needs to be backed up, a snapshot backup is the preferred method.
Recommendations for recovery
If you experience a non-catastrophic disk failure (for example, you still have some of your data, but Splunk won't run), Splunk recommends that you move the index directory aside and restore from a backup rather than restoring on top of a partially corrupted datastore. Splunk will automatically create hot directories on startup as necessary and resume indexing. Monitored files and directories will pick up where they were at the time of the backup.
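The move-aside-and-restore sequence can be sketched as follows, assuming Splunk is stopped first and that the backup is a plain directory copy of the index directory. The function name `restore_index` and the `.damaged.<timestamp>` suffix are hypothetical choices for this illustration.

```shell
#!/bin/sh
# restore_index INDEX_DIR BACKUP_DIR
# Moves the (possibly corrupted) index directory aside, then copies the
# backup into its place. Run this only while Splunk is stopped.
restore_index() {
    index_dir=$1
    backup_dir=$2
    if [ ! -d "$backup_dir" ]; then
        echo "no backup at $backup_dir" >&2
        return 1
    fi
    if [ -e "$index_dir" ]; then
        # Keep the damaged copy for later inspection instead of overwriting it.
        aside="$index_dir.damaged.$(date +%Y%m%d%H%M%S)"
        mv "$index_dir" "$aside" || return 1
        echo "moved damaged index to $aside"
    fi
    cp -Rp "$backup_dir" "$index_dir"
}
```

Keeping the damaged directory around costs disk space, but it leaves the partially intact data available if the backup turns out to be older than expected.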
Index backup strategy
For an end-to-end procedure that ensures all data in your index gets backed up on a daily basis, read this Splunk blog: Index backup strategy.