Search indexed data archived to Hadoop

Once you have installed and configured your archive indexes, you can create reports and visualize data as you would against data in a traditional Splunk index. When you use virtual indexes alongside traditional Splunk Enterprise indexes, you can gather data from the virtual index alone, or you can query both local and virtual indexes for a single report.

For the most part, you can create reports for virtual indexes much as you would for local indexes. For more information about creating reports, see the Splunk Enterprise Search Manual.

Because events in a virtual index are not sorted, any search command that depends on implicit time order (for example, head, delta, or transaction) does not work exactly the way you would expect. As a result, a few search commands operate differently when used on virtual indexes, mostly because of the way Hadoop reports timestamps.

You can still use these commands, and you might particularly want to when creating a single report for local and virtual indexes, but be aware that they operate and return data differently.

Search language

For the most part, you can use Splunk Enterprise search language to create your reports. However, because Hadoop does not support strict requirements on the order of events, there are a few differences.

The following commands are not supported when the search includes an archived index:

transaction
localize

The following commands work on archived indexes, but their results might differ from the results of the same search on a traditional Splunk index. This is because Hadoop does not guarantee descending time order of events:

streamstats
head
delta
tail
reverse
eventstats
dedup (Because the command cannot distinguish order within an HDFS directory to pick the item to remove, Splunk Analytics for Hadoop chooses the item to remove based on modified time or file order.)
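
To illustrate why order-dependent commands can return different results when events are not in descending time order, here is a minimal Python sketch. It is a toy model, not Splunk's implementation: the functions head and dedup below are simplified stand-ins written for this example, and the sample events and field values are hypothetical. It assumes that both commands act on events in the order they are received.

```python
from typing import Dict, List

Event = Dict[str, object]


def head(events: List[Event], n: int) -> List[Event]:
    """Toy stand-in for 'head': return the first n events as received."""
    return events[:n]


def dedup(events: List[Event], field: str) -> List[Event]:
    """Toy stand-in for 'dedup': keep the first event seen for each field value."""
    seen = set()
    kept = []
    for event in events:
        value = event[field]
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept


# Hypothetical events in arbitrary (for example, file) order.
events_unsorted = [
    {"_time": 100, "host": "a", "status": "old"},
    {"_time": 300, "host": "a", "status": "new"},
    {"_time": 200, "host": "b", "status": "mid"},
]

# The same events in descending time order, as a traditional index returns them.
events_sorted = sorted(events_unsorted, key=lambda e: e["_time"], reverse=True)

head_sorted = head(events_sorted, 1)        # most recent event
head_unsorted = head(events_unsorted, 1)    # whichever event happens to come first

dedup_sorted = dedup(events_sorted, "host")      # keeps the newest event per host
dedup_unsorted = dedup(events_unsorted, "host")  # keeps an arbitrary event per host

print("head, sorted:   ", head_sorted)
print("head, unsorted: ", head_unsorted)
print("dedup, sorted:  ", dedup_sorted)
print("dedup, unsorted:", dedup_unsorted)
```

In this sketch, the sorted input yields the most recent event for each host, while the unsorted input yields whichever event appears first. The same effect explains why results from an archived index can differ from results from a local index.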

Distributable and non-distributable commands in archives

Distributable search commands are the most effective commands for Hadoop Data Roll reports because they can be distributed to search peers and archive indexes. Generally, non-distributable commands only work on local indexes and are not as effective on archived indexes.

You can create searches across different index types that use both distributable and non-distributable commands, as long as you keep in mind that such a search returns all data from the local indexes but limited data from the virtual indexes.

Header extractions to avoid when working with virtual indexes

Archived indexes do not support configuration of index-time fields. Therefore, properties specific to index-time field extractions do not apply to archive indexes. This includes the following properties:

INDEXED_EXTRACTIONS
HEADER_FIELD_LINE_NUMBER
PREAMBLE_REGEX
FIELD_HEADER_REGEX
FIELD_DELIMITER
FIELD_QUOTE
HEADER_FIELD_DELIMITER
HEADER_FIELD_QUOTE
TIMESTAMP_FIELDS = field1,field2,...,fieldn
FIELD_NAMES
MISSING_VALUE_REGEX
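
As a rough aid, the following Python sketch scans the text of a configuration file for the properties listed above and reports which stanzas set them. It is not a Splunk tool: the function find_unsupported_properties, the sample stanza name, and the sample settings are hypothetical, and the sketch assumes a simple "[stanza]" and "KEY = value" layout.

```python
from typing import List, Optional, Tuple

# Properties from the list above that do not apply to archive indexes.
INDEX_TIME_PROPERTIES = {
    "INDEXED_EXTRACTIONS",
    "HEADER_FIELD_LINE_NUMBER",
    "PREAMBLE_REGEX",
    "FIELD_HEADER_REGEX",
    "FIELD_DELIMITER",
    "FIELD_QUOTE",
    "HEADER_FIELD_DELIMITER",
    "HEADER_FIELD_QUOTE",
    "TIMESTAMP_FIELDS",
    "FIELD_NAMES",
    "MISSING_VALUE_REGEX",
}


def find_unsupported_properties(conf_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (stanza, property) pairs for listed properties found in conf_text."""
    findings = []
    stanza = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]  # remember the current stanza name
            continue
        if "=" in line:
            key = line.split("=", 1)[0].strip()
            if key in INDEX_TIME_PROPERTIES:
                findings.append((stanza, key))
    return findings


# Hypothetical configuration text used only for illustration.
sample_conf = """
[my_csv_sourcetype]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,
SHOULD_LINEMERGE = false
"""

flagged = find_unsupported_properties(sample_conf)
for stanza_name, prop in flagged:
    print(f"[{stanza_name}] sets {prop}, which does not apply to archive indexes")
```

A check like this can help identify settings that you might otherwise expect to take effect on archived data.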
