Archive Splunk indexes using the configuration files

Before you begin, note the following:

Splunk must be installed as the same user on all indexers and on the Hunk Splunk Enterprise instances. This user connects to HDFS for archiving, so the user and its permissions must be consistent across all of them.
Data in the source Splunk index must be in a warm or cold bucket before it can be archived.
The Hunk Search Head and each Splunk indexer must have Hadoop client libraries and Java libraries in the same location. See System and software requirements in this manual for updated information about the required versions.
The Splunk user associated with the Splunk indexer must have permission to write to the HDFS node.
Hunk cannot currently archive buckets with raw data larger than 5GB to S3. You can configure your Splunk Enterprise bucket sizes in indexes.conf. See Archiving Splunk indexes to S3 in this manual for known issues when archiving to S3.
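
As a rough sketch of the bucket size setting mentioned above, the following indexes.conf fragment caps the size of hot buckets for one index. The index name is the example name used in this topic and the value is illustrative only. This topic does not state how bucket size relates to the 5GB raw data limit, so treat the number as an assumption and check it against your own data. Other required index settings are omitted.

```ini
# Example index name taken from this topic; substitute your own.
[splunk_index]
# Maximum size, in MB, that a hot bucket reaches before it rolls to warm.
# 750 is an illustrative value, not a recommendation.
maxDataSize = 750
```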

Configure Splunk Index archiving in the configuration file

In indexes.conf, configure the following stanza:

[splunk_index_archive]
vix.output.buckets.from.indexes = <the exact name of the Splunk index you want to copy into archives, for example: "splunk_index">
vix.output.buckets.older.than = <the age (in seconds) at which bucket data in the Splunk index should be archived into Hunk, for example: 432000 (5 days). Splunk eventually deletes data based on index settings, so make sure this setting archives the buckets before Splunk deletes them>
vix.output.buckets.path = <an absolute path to the directory in HDFS where the archived bucket will be stored, for example: /user/root/archive/splunk_index_archive. For S3, prefix this path with s3n://<s3-bucket>/>
vix.provider = <the virtual index provider for the new Hunk archive, for example: "hunkprovider">

Where:

vix.output.buckets.from.indexes is the exact name of the Splunk index you want to copy into an archive. For example: "splunk_index". To archive multiple Splunk indexes, list them separated by commas.

vix.output.buckets.older.than is the age, in seconds, at which bucket data in the Splunk index should be archived into Hunk. For example, if you specify 432000 seconds (5 days), data is copied into the Hunk archive when it is five days old. Splunk eventually deletes data based on index settings, so make sure that this setting copies the Splunk data before the Splunk Enterprise indexer deletes it.

vix.output.buckets.path is the absolute path to the directory in HDFS where the archived buckets should be stored. For example: "/user/root/archive/splunk_index_archive". If you are using S3, prefix this value with s3n://<s3-bucket>/ and add the S3 access key attributes to the provider stanza.

vix.provider is the virtual index provider for the new Hunk archive.
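
For illustration, here is what the stanza might look like once filled in with the example values used in this topic. It is a sketch that assumes a provider named "hunkprovider" is already configured and that the HDFS directory exists and is writable by the Splunk user. Comments are kept on their own lines.

```ini
[splunk_index_archive]
# Source Splunk index (example name from this topic).
vix.output.buckets.from.indexes = splunk_index
# 5 days x 86400 seconds/day = 432000 seconds.
vix.output.buckets.older.than = 432000
# Absolute HDFS directory for the archived buckets.
vix.output.buckets.path = /user/root/archive/splunk_index_archive
# Existing virtual index provider (assumed to be defined elsewhere in indexes.conf).
vix.provider = hunkprovider
```

When choosing the age, compare it with the retention settings of the source index (for example, frozenTimePeriodInSecs in indexes.conf) so that buckets are archived well before they would be deleted.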

For S3 directories you must prefix vix.output.buckets.path with s3n://<s3-bucket>/ and then add the following additional attributes to the provider stanza:

vix.fs.s3n.awsAccessKeyId = <your aws access key ID>
vix.fs.s3n.awsSecretAccessKey = <your aws secret access key>
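
Put together, an S3 configuration might look like the following sketch. The provider stanza name follows the [provider:<name>] convention and reuses the example name "hunkprovider". The path segment after the bucket name is hypothetical, and all other provider settings are omitted. Keep the placeholders until you have your own bucket name and keys.

```ini
# Provider stanza: only the S3 credential attributes are shown here.
[provider:hunkprovider]
vix.fs.s3n.awsAccessKeyId = <your aws access key ID>
vix.fs.s3n.awsSecretAccessKey = <your aws secret access key>

[splunk_index_archive]
vix.output.buckets.from.indexes = splunk_index
vix.output.buckets.older.than = 432000
# "archive/splunk_index_archive" is a hypothetical directory inside the bucket.
vix.output.buckets.path = s3n://<s3-bucket>/archive/splunk_index_archive
vix.provider = hunkprovider
```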

Limit the bandwidth Hunk uses for archiving

You can set bandwidth throttling to limit the transfer rate of your archives.

You set throttling for a provider, and that limit is then applied across all archives assigned to that provider. To configure throttling, add the following attribute under the stanza of the virtual index provider you want to throttle.

vix.output.buckets.max.network.bandwidth = <bandwidth in bits/second>
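
Because the value is expressed in bits per second, a limit that you think of in megabytes per second must be converted first. The sketch below assumes a provider named "hunkprovider" and an illustrative target of 10 megabytes per second, with 1 megabyte taken as 1,000,000 bytes.

```ini
[provider:hunkprovider]
# 10 megabytes/second x 8 bits/byte x 1,000,000 bytes/megabyte = 80,000,000 bits/second
vix.output.buckets.max.network.bandwidth = 80000000
```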

For more information about configuring a provider in indexes.conf, see Configure a virtual index.
