Configure Splunk index archiving to Hadoop using the configuration files

Before you begin, note the following:

You must configure a Hadoop provider.
Splunk must be installed under the same user on all indexers and Splunk Enterprise instances. This user connects to HDFS for archiving, so the user and its permissions must be consistent across instances.
The data in the index being archived must be in warm, cold, or frozen buckets only.
The Hadoop client libraries must be in the same location on each indexer. Likewise, the Java Runtime Environment must be installed in the same location on each indexer. See System and software requirements for updated information about the required versions.
The Splunk user associated with the Splunk indexer must have permission to write to the HDFS node.
Splunk cannot currently archive buckets with more than 5 GB of raw data to S3. You can configure your Splunk Enterprise bucket sizes in indexes.conf. See Archiving Splunk indexes to S3 in this manual for known issues when archiving to S3.

Set bundle deletion parameters

Use the following attribute to specify how many bundles may accrue before Splunk Enterprise deletes them:

vix.splunk.setup.bundle.reap.limit = 5

The default value is 5, which means that when there are more than five bundles, Splunk Enterprise will delete the oldest one.
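
As a minimal sketch, the setting might look like this in indexes.conf. It assumes the attribute belongs in the provider stanza, and "MyHadoopProvider" is a placeholder provider name, not one defined in this topic:

```ini
# indexes.conf
# "MyHadoopProvider" is a hypothetical provider name used for illustration.
[provider:MyHadoopProvider]
# Allow up to 10 bundles to accrue; when there are more, the oldest is deleted.
vix.splunk.setup.bundle.reap.limit = 10
```

Comments go on their own lines here because a trailing comment on the same line as a setting would be read as part of the value.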

Configure index archiving in the configuration file

In indexes.conf, configure the following stanza:

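[splunk_index_archive]
vix.output.buckets.from.indexes = <comma-separated list of Splunk index names>
vix.output.buckets.older.than = <age in seconds>
vix.output.buckets.path = <archive directory in HDFS>
vix.provider = <virtual index provider name>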

Where:

vix.output.buckets.from.indexes is the exact name of the Splunk index you want to copy into an archive. For example: "splunk_index". You can list multiple Splunk indexes separated by commas.

vix.output.buckets.older.than is the age, in seconds, at which bucket data in the Splunk index should be archived. For example, if you specify 432000 (5 days), data is copied into the archive when it is five days old. Splunk Enterprise eventually deletes data based on the index settings, so set this value low enough that the data is copied before the indexer deletes it.

vix.output.buckets.path is the directory in HDFS where the archive bucket should be stored. For example: "/user/root/archive/splunk_index_archive". If you are using S3, you should prefix this value with s3n://<s3-bucket>/ and add the additional attributes from the code example below.

vix.provider is the virtual index provider for the new archive.
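
Putting the four attributes together, a filled-in stanza for an HDFS archive might look like the following sketch. The index name and path reuse the examples above. The second index "web_logs" and the provider name "MyHadoopProvider" are placeholders, and the provider is assumed to be already configured:

```ini
# indexes.conf
[splunk_index_archive]
# Archive buckets from these two indexes ("web_logs" is a hypothetical second index).
vix.output.buckets.from.indexes = splunk_index,web_logs
# 5 days x 24 hours x 3600 seconds = 432000 seconds.
vix.output.buckets.older.than = 432000
# HDFS directory that receives the archived buckets.
vix.output.buckets.path = /user/root/archive/splunk_index_archive
# Hypothetical name of an existing virtual index provider.
vix.provider = MyHadoopProvider
```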

When archiving to S3, prefix vix.output.buckets.path with s3n://<s3-bucket>/ and add the following attributes to the provider stanza:

vix.fs.s3n.awsAccessKeyId = <your aws access key ID>
vix.fs.s3n.awsSecretAccessKey = <your aws secret access key>
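
For an S3 archive, the provider and archive stanzas might fit together as in the sketch below. The provider name "MyHadoopProvider" and the directory after the bucket prefix are illustrative assumptions. The angle-bracket placeholders must be replaced with your own values:

```ini
# indexes.conf
# "MyHadoopProvider" is a hypothetical provider name used for illustration.
[provider:MyHadoopProvider]
vix.fs.s3n.awsAccessKeyId = <your aws access key ID>
vix.fs.s3n.awsSecretAccessKey = <your aws secret access key>

[splunk_index_archive]
vix.output.buckets.from.indexes = splunk_index
vix.output.buckets.older.than = 432000
# The directory after the s3n://<s3-bucket>/ prefix is an example only.
vix.output.buckets.path = s3n://<s3-bucket>/archive/splunk_index_archive
vix.provider = MyHadoopProvider
```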

Limit the bandwidth used for archiving

You can set bandwidth throttling to limit the transfer rate of your archives.

You set throttling for a provider, and that limit is then applied across all archives assigned to that provider. To configure throttling, add the following attribute under the virtual index provider stanza you want to throttle:

vix.output.buckets.max.network.bandwidth = <bandwidth in bits/second>
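
For instance, a limit of 10 Mbit/s could be written as in this sketch. The provider name "MyHadoopProvider" is a placeholder, and the value is only an illustration, not a recommended setting:

```ini
# indexes.conf
# "MyHadoopProvider" is a hypothetical provider name used for illustration.
[provider:MyHadoopProvider]
# 10,000,000 bits/second = 10 Mbit/s, applied across all archives assigned to this provider.
vix.output.buckets.max.network.bandwidth = 10000000
```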

For more about configuring a provider in indexes.conf, see Add or edit a provider in Splunk Web.
