About archiving Splunk indexes

Archive Splunk indexed data into HDFS or S3 so that you can:

Search archived data that is no longer available in Splunk.
Search across archived buckets, virtual indexes, and Splunk Enterprise indexes.
Perform batch processing analysis in Hunk that includes Splunk Enterprise archived data.
Archive Splunk Enterprise indexer data to meet your data retention policies without using valuable Splunk indexer space.

The Archive feature provides a user-friendly way for you to copy warm and cold Splunk indexer data to Hunk as archived data.

Setting it up

To configure archiving, you tell Hunk:

Which Splunk Enterprise indexes to archive into Hunk.
Where to put the archived data in HDFS.
At what age Splunk Enterprise buckets should be copied to the archive in HDFS.
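
As a rough illustration of the three settings listed above, the following Python sketch models them as a small data structure and applies the age threshold to one bucket. The class, field, and function names are hypothetical and are not Hunk configuration keys, and measuring a bucket's age from its latest event time is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical model of the three archiving settings. These names are
# illustrative only; they are not Hunk configuration attributes.
@dataclass(frozen=True)
class ArchiveSettings:
    source_indexes: Tuple[str, ...]  # which Splunk Enterprise indexes to archive
    hdfs_path: str                   # where the archived data goes in HDFS
    older_than_seconds: int          # age at which buckets are copied


def is_old_enough(bucket_latest_time: int, now: int,
                  settings: ArchiveSettings) -> bool:
    """Return True when the bucket's newest event is older than the threshold.

    Assumption: age is measured from the bucket's latest event time,
    with both times given as epoch seconds.
    """
    return now - bucket_latest_time > settings.older_than_seconds


settings = ArchiveSettings(
    source_indexes=("main",),
    hdfs_path="/archive/main_archive",  # placeholder path
    older_than_seconds=5 * 24 * 3600,   # five days, an arbitrary example value
)
```

With these example values, a bucket whose newest event is six days old qualifies for archiving, while one whose newest event is four days old does not.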

Hunk provides two ways for you to configure the above information:

How archiving works

Once you configure a Splunk index as a Hunk Archive:

The splunk_archiver app uses Bundle Replication to distribute your configuration information to all relevant Splunk Enterprise indexers.
At 17 minutes past every hour, Hunk automatically runs the command | archivebuckets, which starts the archiving process on each indexer.
Hunk copies cold and warm bucket data from the Splunk Enterprise indexers to a Hadoop supported file system, such as HDFS or S3.
Archived buckets are ready to be searched in Hunk.
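
To make the timing of the steps above concrete, this Python sketch computes when the next archiving run would start. It assumes the schedule means minute 17 of each hour; the function name and constant are hypothetical and are not part of Hunk.

```python
from datetime import datetime, timedelta

# Assumption: the archiving job fires at minute 17 of every hour.
RUN_MINUTE = 17


def next_archive_run(now: datetime) -> datetime:
    """Return the first minute-17 mark strictly after `now`."""
    candidate = now.replace(minute=RUN_MINUTE, second=0, microsecond=0)
    if candidate <= now:
        # This hour's run time has already passed, so move to the next hour.
        candidate += timedelta(hours=1)
    return candidate
```

Under that assumption, a bucket that reaches the configured age shortly after one run waits up to an hour before it is copied to the archive.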

Searching archived indexes

You can search archived buckets as you normally search HDFS or S3 in Hunk, simply by including the archive virtual index in your searches. See Using search commands on a virtual index.

For example, you can create a single search that covers:

Data in a Splunk Enterprise index, including data that has not been archived.
Archived data copied into HDFS or S3 (which may or may not still be stored in Splunk Enterprise).
Virtual index data (Hunk).

Here's an example search that gets the event count by source from a Splunk Enterprise index, "_internal", an archived index, "main_archive", and a virtual index, "vix".

index=_internal OR index=main_archive OR index=vix | stats count by source

Search performance

When you search archives, Hunk performs batch searches on the archived data, which are usually much slower than searches of indexed data in Splunk. Because Splunk deletes cold data based on your Splunk Enterprise indexes.conf settings, archived data may or may not also be present in Splunk Enterprise indexes. Be familiar with your archive and Splunk indexer retention policies and settings so that, when the data you are looking for is still in Splunk, you can run more efficient searches.

To improve search time when searching archives, you can use dates to limit the buckets that are searched. The storage path in which Hunk archives the Splunk indexer data includes the earliest time and the latest time of each bucket. When you search within a specific time range, Hunk can use that information to narrow the search to the relevant buckets, rather than searching through the entire virtual index.
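
The narrowing described above amounts to an interval-overlap test between the search time range and each bucket's time range. The following Python sketch shows that idea under stated assumptions: the ArchivedBucket type, the buckets_in_range function, and the path strings are hypothetical placeholders, not Hunk's actual path layout or implementation.

```python
from typing import List, NamedTuple


class ArchivedBucket(NamedTuple):
    """Hypothetical record of one archived bucket (times in epoch seconds)."""
    path: str      # placeholder storage path, not Hunk's real layout
    earliest: int  # earliest event time in the bucket
    latest: int    # latest event time in the bucket


def buckets_in_range(buckets: List[ArchivedBucket],
                     search_earliest: int,
                     search_latest: int) -> List[ArchivedBucket]:
    """Keep only the buckets whose time range overlaps the search range."""
    return [
        b for b in buckets
        if b.earliest <= search_latest and b.latest >= search_earliest
    ]


example_buckets = [
    ArchivedBucket("bucket_a", earliest=100, latest=200),
    ArchivedBucket("bucket_b", earliest=201, latest=300),
    ArchivedBucket("bucket_c", earliest=301, latest=400),
]
```

In this sketch, a search bounded to the range 250 through 350 only needs to read bucket_b and bucket_c; bucket_a can be skipped without opening it.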
