Splunk® Enterprise

Admin Manual

Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.
This documentation does not apply to the most recent version of Splunk.

Archive indexed data

You can set up Splunk to archive your data automatically as it ages; specifically, at the point when it rolls to "frozen". To do this, you configure indexes.conf.

Edit a copy of indexes.conf in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. Do not edit the copy in $SPLUNK_HOME/etc/system/default. For information on configuration files and directory locations, see "About configuration files".

Caution: By default, Splunk deletes all frozen data. It removes the data from the index at the moment it becomes frozen. If you need to keep the data around, you must configure Splunk to archive the data before removing it. You do this by either setting the coldToFrozenDir attribute or specifying a valid coldToFrozenScript in indexes.conf.

For detailed information on data storage in Splunk, see "How Splunk stores indexes".

Sign your archives

Splunk supports archive signing; configuring this allows you to verify integrity when you restore an archive.

Note: To use archive signing, you must specify a custom archiving script; you cannot perform archive signing if you choose to let Splunk perform the archiving automatically.

How Splunk archives data

Splunk rotates old data out of the index based on your data retirement policy. Data moves through several stages, which correspond to file directory locations. Data starts out in the hot database, located as subdirectories ("buckets") under $SPLUNK_HOME/var/lib/splunk/defaultdb/db/. It then moves to the warm database, also located as subdirectories under $SPLUNK_HOME/var/lib/splunk/defaultdb/db. Eventually, data is aged into the cold database $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb.

Finally, data reaches the frozen state. This can happen for a number of reasons, the main one being that the data becomes older than frozenTimePeriodInSecs (set in indexes.conf). At this point, Splunk erases the data from the index. If you want Splunk to archive the frozen data before erasing it from the index, you must specify that behavior in indexes.conf.

The archiving behavior depends on which of these attributes you set:

  • coldToFrozenDir. This attribute specifies a location where Splunk will automatically archive frozen data.
  • coldToFrozenScript. This attribute specifies a script that Splunk will run when the data is frozen. Typically, this will be a script that archives the frozen data, but the script can serve some other purpose altogether. Splunk ships with one example archiving script that you can edit and use ($SPLUNK_HOME/bin/coldToFrozenExample.py), but you can specify any script you want Splunk to run.

Note: Set only one of these attributes. If both are set, the coldToFrozenDir attribute takes precedence over coldToFrozenScript.

If you don't specify either of these attributes, Splunk runs a default script that simply writes the name of the bucket being erased to the log file $SPLUNK_HOME/var/log/splunk/splunkd_stdout.log. It then erases the bucket.
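The selection rules above can be summarized in a short Python sketch. The function name frozen_action and the use of a plain dictionary to stand in for an index stanza are hypothetical, chosen only for illustration; they are not part of Splunk.

```python
def frozen_action(stanza):
    """Describe what happens to a bucket of this index when it freezes.

    `stanza` is a dict of indexes.conf attributes for one index
    (a hypothetical representation, for illustration only).
    """
    if stanza.get("coldToFrozenDir"):
        # coldToFrozenDir takes precedence if both attributes are set.
        return "archive to directory, then erase"
    if stanza.get("coldToFrozenScript"):
        return "run script, then erase"
    # Neither attribute is set: the bucket name is logged and the bucket erased.
    return "log bucket name, then erase"
```

In every case the bucket is erased from the index afterward; the attributes only control what happens first.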

Let Splunk archive the data for you

If you set the coldToFrozenDir attribute in indexes.conf, Splunk will automatically copy frozen buckets to the specified location before erasing the data from the index.

Add this stanza to $SPLUNK_HOME/etc/system/local/indexes.conf:

[<index>]
coldToFrozenDir = "<path to frozen archive>"

Note the following:

  • <index> specifies which index contains the data to archive.
  • <path to frozen archive> specifies the directory where Splunk will put the archived buckets.
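As a worked illustration of the stanza format, the following Python sketch fills in the two placeholders. The helper name frozen_dir_stanza, the index name "main", and the path /mnt/archive/frozen are all hypothetical example values; substitute your own.

```python
def frozen_dir_stanza(index, archive_path):
    """Build the indexes.conf stanza that archives frozen buckets of `index`."""
    if not index or not archive_path:
        raise ValueError("index and archive_path are both required")
    # Same layout as the stanza shown above, with the placeholders filled in.
    return '[{0}]\ncoldToFrozenDir = "{1}"\n'.format(index, archive_path)


# Hypothetical values: an index named "main" archived under /mnt/archive/frozen.
print(frozen_dir_stanza("main", "/mnt/archive/frozen"))
```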

Note: When you use Splunk Web to create a new index, you can also specify a frozen archive path for that index. See "Set up multiple indexes" for details.

How Splunk archives the frozen data depends on whether the data was originally indexed by pre-4.2 Splunk:

  • For buckets created in version 4.2 or later, Splunk removes all files except for the rawdata file.
  • For pre-4.2 buckets, Splunk simply gzips all the .tsidx and .data files in the bucket.

This difference is due to a change in the format of rawdata. Starting with 4.2, the rawdata file contains all the information Splunk needs to reconstitute an index bucket.

For information on thawing these buckets, see "Restore archived indexed data".

Specify an archiving script

If you set the coldToFrozenScript attribute in indexes.conf, the script you specify will run just before Splunk erases the frozen data from the index.

You'll need to supply the actual script. Typically, the script will archive the data, but you can provide a script that performs any action you want.

Add this stanza to $SPLUNK_HOME/etc/system/local/indexes.conf:

[<index>]
coldToFrozenScript = ["<path to program that runs script>"] "<path to script>"

Note the following:

  • <index> specifies which index contains the data to archive.
  • <path to script> specifies the path to the archiving script. The script must be in $SPLUNK_HOME/bin or one of its subdirectories.
  • <path to program that runs script> is optional. You must set it if your script requires a program, such as python, to run it.
  • If your script is located in $SPLUNK_HOME/bin and is named myColdToFrozen.py, set the attribute like this:
        coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/myColdToFrozen.py"

Splunk ships with an example archiving script that you can edit: $SPLUNK_HOME/bin/coldToFrozenExample.py.

Note: If using the example script, edit it to specify the archive location for your installation. Also, rename the script or move it to another location to avoid having changes overwritten when you upgrade Splunk. This is an example script and should not be applied to a production instance without editing to suit your environment and testing extensively.

The example script archives the frozen data differently, depending on whether the data was originally indexed with pre-4.2 Splunk:

  • For buckets created in version 4.2 or later, the script removes all files except for the rawdata file.
  • For pre-4.2 buckets, the script simply gzips all the .tsidx and .data files.

This difference is due to a change in the format of rawdata. Starting with 4.2, the rawdata file contains all the information Splunk needs to reconstitute an index bucket.

For information on thawing these buckets, see "Restore archived indexed data".
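The behavior described above can be sketched in Python roughly as follows. This is not the shipped coldToFrozenExample.py. It rests on several assumptions: that Splunk passes the bucket path as the script's first argument, that the presence of rawdata/journal.gz identifies a 4.2-or-later bucket, and that the bucket may be pruned in place because Splunk erases it afterward. ARCHIVE_DIR and the function names are hypothetical. The sketch targets Python 3 and may need adapting for the interpreter bundled with your Splunk version.

```python
import gzip
import os
import shutil
import sys

# Hypothetical archive location; change it for your installation.
ARCHIVE_DIR = "/opt/splunk_frozen_archive"


def prune_bucket(bucket):
    """Reduce a bucket in place to the files worth archiving."""
    journal = os.path.join(bucket, "rawdata", "journal.gz")
    new_style = os.path.isfile(journal)  # assumption: marks a 4.2+ bucket
    for name in os.listdir(bucket):
        path = os.path.join(bucket, name)
        if not os.path.isfile(path):
            continue  # leave subdirectories such as rawdata untouched
        if new_style:
            os.remove(path)  # 4.2+: only rawdata is kept
        elif name.endswith((".tsidx", ".data")):
            # pre-4.2: gzip the index files, then drop the originals
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)


def archive_bucket(bucket, archive_dir=ARCHIVE_DIR):
    """Prune the bucket, copy it under archive_dir, and return the copy's path."""
    bucket = os.path.normpath(bucket)
    prune_bucket(bucket)
    dest = os.path.join(archive_dir, os.path.basename(bucket))
    shutil.copytree(bucket, dest)
    return dest


def main(argv):
    """Entry point; assumes the bucket path arrives as the first argument."""
    if len(argv) != 2:
        print("usage: myColdToFrozen.py <bucket_path>", file=sys.stderr)
        return 1
    archive_bucket(argv[1])
    return 0

# When installed as a script, end the file with:
#     sys.exit(main(sys.argv))
```

Returning a nonzero exit status on bad input is a deliberate choice in this sketch, so that a failed archive attempt is not reported as a success.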

As a best practice, make sure the script you create completes as quickly as possible, so that Splunk doesn't end up waiting for the return indicator. For example, if you want to archive to a slow volume, set the script to copy the buckets to a temporary location on the same (fast) volume as the index. Then use a separate script, outside Splunk, to move the buckets from the temporary location to their destination on the slow volume.
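The two-step approach above might look like the following Python sketch. The function names, the ".partial" suffix, and the directory arguments are hypothetical. stage_bucket stands for the fast step that the archiving script performs, and drain_staging for the separate job that runs outside Splunk, for example from a scheduler.

```python
import os
import shutil


def stage_bucket(bucket, staging_dir):
    """Fast step: copy the bucket to a staging directory on the same
    (fast) volume as the index, then return promptly."""
    os.makedirs(staging_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(bucket))
    partial = os.path.join(staging_dir, name + ".partial")
    shutil.copytree(bucket, partial)
    # Rename only after the copy completes, so the mover never picks up
    # a half-copied bucket.
    final = os.path.join(staging_dir, name)
    os.rename(partial, final)
    return final


def drain_staging(staging_dir, slow_archive_dir):
    """Slow step: move completed buckets from staging to the slow volume.
    Returns the names of the buckets that were moved."""
    os.makedirs(slow_archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(staging_dir)):
        if name.endswith(".partial"):
            continue  # still being copied by the archiving script
        shutil.move(os.path.join(staging_dir, name),
                    os.path.join(slow_archive_dir, name))
        moved.append(name)
    return moved
```

Keeping the slow transfer out of the archiving script means the time Splunk spends waiting depends only on the local copy.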

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7


Comments

Why aren't these kind of actions configurable from the web interface? This seems like a very common thing people would want to configure...

Enkrypter
November 5, 2012
