Manage report acceleration

Splunk Analytics for Hadoop reaches End of Life on January 31, 2025.

You can use report acceleration with any search that uses the Hadoop ERP. For more information about report acceleration, see Use report acceleration in this manual.

The resulting report acceleration summaries exist in Splunk Analytics for Hadoop as files that you can manage and maintain.

Locating report acceleration files

The report acceleration files for the Hadoop ERP are stored on the configured Hadoop file system. By default, the files are stored under the path configured for vix.splunk.home.hdfs, in a child directory named cache.

You can change this location by setting the vix.splunk.search.cache.path parameter for a virtual index provider to an absolute path. A child directory named cache is automatically added under the configured path.

Here are two examples of where the files would end up, with and without the cache path configuration.

# Example 1, no cache path configuration:
# indexes.conf, virtual index provider:
vix.splunk.home.hdfs = /user/sarah/hadoopanalytics_files

# Resulting report acceleration file location
/user/sarah/hadoopanalytics_files/cache

# Example 2, using cache path configuration:
# indexes.conf, virtual index provider:
vix.splunk.home.hdfs = /user/sarah/hadoopanalytics_files
vix.splunk.search.cache.path = /var/everyone/hadoopanalytics

# Resulting report acceleration file location
/var/everyone/hadoopanalytics/cache
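
The location rule shown in the two examples can be sketched in Python. This is only an illustration of the documented behavior, not the Splunk implementation, and the function name resolve_cache_dir is hypothetical.

```python
import posixpath
from typing import Optional


def resolve_cache_dir(home_hdfs: str, cache_path: Optional[str] = None) -> str:
    """Return the directory that holds report acceleration files.

    Hypothetical helper. It mirrors the documented rule: a child
    directory named "cache" is added under vix.splunk.search.cache.path
    when that parameter is set, and under vix.splunk.home.hdfs otherwise.
    """
    base = cache_path if cache_path else home_hdfs
    # HDFS paths use forward slashes, so join with posixpath.
    return posixpath.join(base, "cache")


# Example 1: no cache path configured.
print(resolve_cache_dir("/user/sarah/hadoopanalytics_files"))
# Example 2: cache path configured.
print(resolve_cache_dir("/user/sarah/hadoopanalytics_files",
                        "/var/everyone/hadoopanalytics"))
```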

The report acceleration file structure includes a file that contains information about the cache. This file can be useful for debugging and maintenance. It currently resides on the file system at:

cache/<index>/<search_hash>/info.json

where <index> is the index that the summary was created from and <search_hash> is a hash of the summary ID.
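
When you debug, it can help to recover the index and search hash from a list of info.json paths. The following Python sketch splits a path according to the layout described above. The function name, the index name my_vix, and the hash value 0a1b2c are made up for illustration.

```python
from typing import NamedTuple, Optional


class SummaryInfo(NamedTuple):
    index: str
    search_hash: str


def parse_info_path(path: str) -> Optional[SummaryInfo]:
    """Split a cache/<index>/<search_hash>/info.json path into its parts.

    Hypothetical helper. Returns None when the path does not end with
    the documented layout.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 4 or parts[-1] != "info.json" or parts[-4] != "cache":
        return None
    return SummaryInfo(index=parts[-3], search_hash=parts[-2])


# The index name and hash below are invented sample values.
print(parse_info_path("/user/sarah/hadoopanalytics_files/cache/my_vix/0a1b2c/info.json"))
```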

Manage file cleanup

Splunk Analytics for Hadoop periodically cleans your report acceleration files of expired search summary data. For example, if you use report acceleration on a search that goes back one year, summary data covering data older than a year is deleted as it expires.

In most situations, this will be adequate storage maintenance. However, summary data might not be deleted exactly as it expires because summaries are kept until all the summaries in the same file have expired. If you find that Splunk Analytics for Hadoop is not optimally maintaining your storage, you can tune your configuration to be more efficient.

You can help Splunk Analytics for Hadoop clean up more efficiently by breaking the data into buckets (for example, based on date) that match parts of your virtual index path. Once the summaries are grouped by these buckets, Splunk Analytics for Hadoop deletes whole buckets as they fall outside the time range that the summary covers.

Below is an example of a scenario where data is structured by year, month, and day:

/path/to/data/20131230/...
/path/to/data/20131231/...
/path/to/data/20140101/...
/path/to/data/20140102/...
...
/path/to/data/20140209/...
...

The virtual index paths for this data would be configured something like this:

vix.input.1.path = /path/to/data/...
vix.input.1.et.regex = /path/to/data/(\d+)
vix.input.1.et.format = yyyyMMdd
...
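
To see what the et.regex and et.format pair extracts from a path, here is a rough Python equivalent. It assumes that the text captured by the regex group is parsed with the date format, and it translates the Java-style pattern yyyyMMdd to Python's %Y%m%d. The function name earliest_time is hypothetical, and this is not the Splunk implementation.

```python
import re
from datetime import datetime
from typing import Optional

# Same pattern as vix.input.1.et.regex in the configuration above.
ET_REGEX = re.compile(r"/path/to/data/(\d+)")
# Python strptime equivalent of the yyyyMMdd format.
ET_FORMAT = "%Y%m%d"


def earliest_time(path: str) -> Optional[datetime]:
    """Parse the date captured from the path, or return None."""
    match = ET_REGEX.search(path)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), ET_FORMAT)
    except ValueError:
        # The captured digits do not form a valid yyyyMMdd date.
        return None


print(earliest_time("/path/to/data/20131230/foo.txt"))
```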

To split your summaries by year and month, you can specify a bucket like this:

vix.input.1.bucket.regex = /path/to/data/(\d{6}).*

or with multiple groups:

vix.input.1.bucket.regex = /path/to/data/(\d{4})(\d{2}).*

vix.input.N.bucket.regex uses the groups in the regex to decide which bucket a path belongs to. The regex in the first example captures the first six digits of the directory whose name is the date, so the summaries are split by year and month. The second example uses multiple groups to derive a bucket from the path. In that case, the bucket value is formed by joining the captured groups with a dash as the separator.

Here are some examples of the bucket values that files receive for a given regex:

# Regex:
vix.input.1.bucket.regex = /path/to/data/(\d{6}).*

# Paths - Assigned bucket value:
/path/to/data/20131230/foo.txt - 201312
/path/to/data/123456789/bar.csv - 123456

Here's an example using multiple groups:

# Regex:
vix.input.1.bucket.regex = /path/to/data/(\d{4})(\d{2}).*

# Paths - Assigned bucket value:
/path/to/data/20131230/foo.txt - 2013-12
/path/to/data/123456789/bar.csv - 1234-56
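
You can reproduce the assignments above with a short Python sketch before you commit a regex to your configuration. It assumes that the regex is matched from the start of the path and that the captured groups are joined with a dash, as described above. The function name assign_bucket is hypothetical, and the regex dialect used by Splunk Analytics for Hadoop might differ from Python's for more complex patterns.

```python
import re
from typing import Optional


def assign_bucket(bucket_regex: str, path: str) -> Optional[str]:
    """Return the bucket value for a path, or None if the regex does not match.

    Hypothetical helper that joins all captured groups with a dash.
    """
    match = re.match(bucket_regex, path)
    if match is None:
        return None
    # Skip optional groups that did not take part in the match.
    return "-".join(group for group in match.groups() if group is not None)


single = r"/path/to/data/(\d{6}).*"
multiple = r"/path/to/data/(\d{4})(\d{2}).*"

for path in ("/path/to/data/20131230/foo.txt", "/path/to/data/123456789/bar.csv"):
    print(path, assign_bucket(single, path), assign_bucket(multiple, path))
```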

How you bucket your data depends on how much data you want to keep in a summary. Too fine a grouping creates too many summary file folders. Too broad a grouping defeats the purpose of bucketing summaries to begin with.

If you are unsure where to start, try grouping by year and month first, and then fine-tune from there to see what works best for you.
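
To get a feel for how bucket size affects cleanup, the following Python sketch models year-and-month buckets such as 201312. It treats a bucket as removable only when its entire month lies before a cutoff date, which reflects the rule that summaries are kept until everything grouped with them has expired. The function name and the cutoff handling are assumptions made for illustration, not the actual cleanup logic.

```python
from datetime import date
from typing import Iterable, List


def expired_month_buckets(buckets: Iterable[str], cutoff: date) -> List[str]:
    """Return the yyyyMM buckets whose whole month ends before the cutoff.

    Hypothetical model of bucket-level cleanup.
    """
    expired = []
    for bucket in buckets:
        year, month = int(bucket[:4]), int(bucket[4:6])
        # First day of the following month marks the end of this bucket.
        if month == 12:
            bucket_end = date(year + 1, 1, 1)
        else:
            bucket_end = date(year, month + 1, 1)
        if bucket_end <= cutoff:
            expired.append(bucket)
    return expired


# With a cutoff in mid-January 2014, only December 2013 is fully expired.
print(expired_month_buckets(["201312", "201401", "201402"], date(2014, 1, 15)))
```

In this model, a coarser bucket such as a whole year would keep December 2013 data until all of 2013 has expired, which is the storage cost of too broad a grouping.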
