Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Search indexed data archived to Hadoop

Once you properly install and configure your archive indexes, you can create reports and visualize data as you would against data in a traditional Splunk index. Using virtual indexes alongside traditional Splunk Enterprise indexes, you can gather data from the virtual index alone, or you can query both local and virtual indexes in a single report.

For the most part, you can create reports for virtual indexes much as you would for local indexes. For more information about creating reports, see the Splunk Enterprise Search Manual.

Because events in a virtual index are not sorted, any search command that depends on implicit time order (for example, head, delta, or transaction) does not work exactly the way you might expect. A few search commands therefore operate differently when used on virtual indexes, mostly because of the way Hadoop reports timestamps.

You can still use these commands, and you might particularly want to when creating a single report for local and virtual indexes, but be aware that they operate and return data differently.

Search language

For the most part, you can use the Splunk Enterprise search language to create your reports. However, because Hadoop does not enforce a strict order of events, there are a few differences.

The following commands are not supported when the search includes an archived index:

  • transaction
  • localize
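
As a rough illustration of how you might check a search string against this list before running it on an archived index, here is a minimal Python sketch. The function name and the sample index name are hypothetical, and the pipe-splitting is deliberately naive: it does not handle pipes inside quoted strings or subsearches.

```python
from typing import List

# Commands the list above names as unsupported on archived indexes.
UNSUPPORTED_ON_ARCHIVE = {"transaction", "localize"}


def unsupported_commands(search: str) -> List[str]:
    """Return the unsupported command names found in a search pipeline.

    Hypothetical helper: splits on "|" and looks at the first word of
    each segment, which is where a command name normally appears.
    """
    found = []
    for segment in search.split("|"):
        tokens = segment.split()
        if tokens and tokens[0].lower() in UNSUPPORTED_ON_ARCHIVE:
            found.append(tokens[0].lower())
    return found


# "my_archive" is a made-up index name used only for illustration.
flagged = unsupported_commands("index=my_archive error | transaction host | stats count")
```

A check like this only flags the command names. It cannot tell whether a given index is actually an archived index.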

The following commands work on archived indexes, but their results may differ from the results you get on local Splunk indexes. This is because Hadoop does not guarantee that events are returned in descending time order:

  • streamstats
  • head
  • delta
  • tail
  • reverse
  • eventstats
  • dedup (Because the command cannot distinguish order within an HDFS directory to pick the item to remove, Splunk Analytics for Hadoop chooses the item to remove based on modified time or file order.)
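
To see why order-dependent commands can return different results, consider this small Python simulation. It is only a sketch of the principle, not a description of how Splunk Analytics for Hadoop implements these commands. The event data and the helper functions `head`, `dedup`, and `newest_first` are invented for illustration.

```python
def head(events, n):
    """Keep the first n events in whatever order they arrive."""
    return events[:n]


def dedup(events, field):
    """Keep the first event seen for each distinct value of `field`."""
    seen = set()
    kept = []
    for event in events:
        key = event[field]
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


def newest_first(events):
    """Impose descending time order explicitly."""
    return sorted(events, key=lambda e: e["_time"], reverse=True)


# Made-up events in arbitrary (not time-sorted) arrival order.
archive_events = [
    {"_time": 100, "host": "a", "status": "ok"},
    {"_time": 300, "host": "a", "status": "error"},
    {"_time": 200, "host": "b", "status": "ok"},
]

unsorted_head = head(archive_events, 1)               # first arrived, not newest
sorted_head = head(newest_first(archive_events), 1)   # newest event

unsorted_dedup = dedup(archive_events, "host")               # keeps the older event for host "a"
sorted_dedup = dedup(newest_first(archive_events), "host")   # keeps the newest event per host
```

The same input yields different "first" events and different surviving duplicates depending on arrival order, which is the kind of difference to expect when these commands run against data without a guaranteed time order.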

Distributable and non-distributable commands in archives

Distributable search commands are the most effective commands for Hadoop Data Roll reports because they can be distributed to search peers and archive indexes. Generally, non-distributable commands only work on local indexes and are not as effective on archived indexes.

You can create searches across different index types that use both distributable and non-distributable commands, as long as you keep in mind that such a search returns all data from the local indexes but only limited data from the virtual indexes.

Header extractions to avoid when working with virtual indexes

Archived indexes do not support configuration of index-time fields. Therefore, properties specific to index-time field extractions do not apply to archive indexes. This includes the following properties:

  • INDEXED_EXTRACTIONS
  • HEADER_FIELD_LINE_NUMBER
  • PREAMBLE_REGEX
  • FIELD_HEADER_REGEX
  • FIELD_DELIMITER
  • FIELD_QUOTE
  • HEADER_FIELD_DELIMITER
  • HEADER_FIELD_QUOTE
  • TIMESTAMP_FIELDS = field1,field2,...,fieldn
  • FIELD_NAMES
  • MISSING_VALUE_REGEX
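
If you want to audit an existing configuration for these properties, a short script can help. The following Python sketch scans props.conf-style text and reports which of the listed properties appear in which stanza. The function name is hypothetical, and the parser is a simplification: it assumes one `key = value` pair per line and does not handle line continuations.

```python
# Properties from the list above that do not apply to archive indexes.
INDEX_TIME_ONLY = {
    "INDEXED_EXTRACTIONS",
    "HEADER_FIELD_LINE_NUMBER",
    "PREAMBLE_REGEX",
    "FIELD_HEADER_REGEX",
    "FIELD_DELIMITER",
    "FIELD_QUOTE",
    "HEADER_FIELD_DELIMITER",
    "HEADER_FIELD_QUOTE",
    "TIMESTAMP_FIELDS",
    "FIELD_NAMES",
    "MISSING_VALUE_REGEX",
}


def ignored_properties(props_text):
    """Return (stanza, property) pairs for listed properties found in the text.

    Hypothetical helper: a simplified line-based scan, not a full
    parser for Splunk configuration files.
    """
    hits = []
    stanza = None
    for raw in props_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]  # remember the current stanza name
            continue
        if "=" in line:
            key = line.split("=", 1)[0].strip()
            if key in INDEX_TIME_ONLY:
                hits.append((stanza, key))
    return hits
```

For example, given a stanza `[my_csv_sourcetype]` (a made-up name) that sets `INDEXED_EXTRACTIONS = csv` and `FIELD_DELIMITER = ,`, the function returns both properties paired with that stanza name.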

This documentation applies to the following versions of Splunk® Enterprise: 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4


