Configure Splunk index archiving to Hadoop using the configuration files
Before you begin, note the following:
- You must configure a Hadoop provider.
- Splunk must be installed under the same user on all indexers and Splunk Enterprise instances. This user connects to HDFS for archiving, so the user and its permissions must be consistent across all instances.
- The data in the index you want to archive must be in warm, cold, or frozen buckets only.
- The Hadoop client libraries must be in the same location on each indexer. Likewise, the Java Runtime Environment must be installed in the same location on each indexer. See System and software requirements for updated information about the required versions.
- The Splunk user associated with the Splunk indexer must have permission to write to the HDFS node.
- Splunk cannot currently archive buckets with raw data larger than 5 GB to S3. You can configure your Splunk Enterprise bucket sizes in indexes.conf. See Archiving Splunk indexes to S3 in this manual for known issues when archiving to S3.
Set bundle deletion parameters
Use the following attribute to specify how many bundles may accrue before Splunk Enterprise deletes them:
vix.splunk.setup.bundle.reap.limit = 5
The default value is 5, which means that when there are more than five bundles, Splunk Enterprise will delete the oldest one.
Configure index archiving in the configuration file
In indexes.conf, configure the following stanza:
[splunk_index_archive]
vix.output.buckets.from.indexes
vix.output.buckets.older.than
vix.output.buckets.path
vix.provider
vix.output.buckets.from.indexes is the exact name of the Splunk index you want to copy into an archive. For example: "splunk_index". You can list multiple Splunk indexes separated by commas.
vix.output.buckets.older.than is the age, in seconds, at which bucket data in the Splunk index is archived. For example, if you specify 432000 seconds (5 days), data is copied into the archive when it is five days old. Splunk Enterprise eventually deletes data based on index settings, so make sure that this setting copies the data before the Splunk Enterprise indexer deletes it.
vix.output.buckets.path is the directory in HDFS where the archive bucket should be stored. For example: "/user/root/archive/splunk_index_archive". If you are using S3, prefix this value with s3n://<s3-bucket>/ and add the additional attributes from the code example below.
vix.provider is the virtual index provider for the new archive.
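As a rough illustration, a filled-in stanza might look like the following sketch. The index name, age, and path reuse the example values given above. The provider name MyHadoopProvider is hypothetical and stands in for the provider you have already configured.

```ini
[splunk_index_archive]
# Splunk index (or comma-separated list of indexes) to copy into the archive
vix.output.buckets.from.indexes = splunk_index
# Archive buckets older than 5 days: 5 * 24 * 60 * 60 = 432000 seconds
vix.output.buckets.older.than = 432000
# HDFS directory that receives the archived buckets
vix.output.buckets.path = /user/root/archive/splunk_index_archive
# Hypothetical provider name; replace with the name of your own provider
vix.provider = MyHadoopProvider
```

Comments are placed on their own lines because Splunk configuration files treat only lines that begin with # as comments.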
For S3 directories, you must prefix the path with s3n://<s3-bucket>/ and then add the following additional attributes to the provider stanza:
vix.fs.s3n.awsAccessKeyId = <your aws access key ID>
vix.fs.s3n.awsSecretAccessKey = <your aws secret access key>
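The following sketch shows how the S3 settings might fit together across the two stanzas. It assumes that the provider stanza header takes the form [provider:<name>]. The provider name MyHadoopProvider and the directory after the bucket name are hypothetical, and the angle-bracket placeholders are left for you to fill in.

```ini
# Provider stanza (stanza header format and provider name are assumptions)
[provider:MyHadoopProvider]
vix.fs.s3n.awsAccessKeyId = <your aws access key ID>
vix.fs.s3n.awsSecretAccessKey = <your aws secret access key>

# Archive stanza that writes to S3 through that provider
[splunk_index_archive]
# The s3n:// prefix replaces a plain HDFS directory; the trailing path is illustrative
vix.output.buckets.path = s3n://<s3-bucket>/archive/splunk_index_archive
vix.provider = MyHadoopProvider
```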
Limit the bandwidth used for archiving
You can set bandwidth throttling to limit the transfer rate of your archives.
When you set throttling for a provider, that limit is applied across all archives assigned to that provider. To configure throttling, add the following attribute under the virtual index provider stanza you want to throttle:
vix.output.buckets.max.network.bandwidth = <bandwidth in bits/second>
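For example, a limit of 10 megabits per second could be sketched as follows. The value is in bits per second, so it has to be converted from the unit you have in mind. The provider name MyHadoopProvider and the [provider:<name>] stanza header are assumptions for illustration.

```ini
# Provider stanza to throttle (stanza header format and name are assumptions)
[provider:MyHadoopProvider]
# 10 Mbit/s = 10 * 1000 * 1000 = 10000000 bits/second
vix.output.buckets.max.network.bandwidth = 10000000
```

Because the limit is shared across every archive assigned to the provider, the value should reflect the total bandwidth you want archiving to use, not a per-archive share.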
For more about configuring a provider in indexes.conf, see Add or edit a provider in Splunk Web.
This documentation applies to the following versions of Splunk® Enterprise: 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.2.0