Search indexed data archived to Hadoop
Once you have properly installed and configured your archive indexes, you can create reports and visualize data as you would against data in a traditional Splunk index. When you use virtual indexes alongside traditional Splunk Enterprise indexes, you can either gather data from the virtual index alone or query both local and virtual indexes for a single report.
For the most part, you can create reports for virtual indexes much as you would for local indexes. For more information about creating reports, see the Splunk Enterprise Search Manual.
Because archived events are not sorted, any search command that depends on implicit time order, such as head, delta, or transaction, will not work exactly the way you might expect. As a result, a few search commands operate differently on virtual indexes, mostly because of the way Hadoop reports timestamps.
You can still use these commands, and you may particularly want to when creating a single report for both local and virtual indexes, but be aware that they operate and return data differently.
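To illustrate why order-dependent commands can surprise you, the following Python sketch models a command like head taking the first N events. On a local index, Splunk returns events newest first, so the first N events are the most recent ones. Events read from archive files may instead arrive in file order. The `Event` class, the file names, and the arrival order are all hypothetical. This is a simplified model, not a description of how Splunk Analytics for Hadoop actually reads archives.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: int    # epoch seconds, standing in for _time
    source: str  # hypothetical archive file the event was read from
    raw: str

def head_as_returned(events, n):
    """Take the first n events in the order they arrive, with no sorting."""
    return events[:n]

def head_latest(events, n):
    """Take the n most recent events by explicitly sorting on time first."""
    return sorted(events, key=lambda e: e.time, reverse=True)[:n]

# Hypothetical archive read file by file. Each file is internally ordered,
# but the files are not interleaved by time.
events = [
    Event(100, "part-0001", "a"),
    Event(101, "part-0001", "b"),
    Event(300, "part-0002", "c"),
    Event(301, "part-0002", "d"),
]

print([e.raw for e in head_as_returned(events, 2)])  # ['a', 'b']
print([e.raw for e in head_latest(events, 2)])       # ['d', 'c']
```

Sorting on the timestamp before taking the first N events restores the "most recent" meaning. However, every matching event must be collected before anything can be returned.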
For the most part, you can use the Splunk Enterprise search language to create your reports. However, because Hadoop does not guarantee the order of events, there are a few differences.
The following commands are not supported when the search includes an archived index:
The following commands work on archived indexes, but their results may differ from the results returned for local Splunk indexes, because Hadoop does not guarantee descending time order of events:
dedup (Because the command cannot determine the order of events within an HDFS directory, and therefore which duplicate to remove, Splunk Analytics for Hadoop chooses the item to remove based on modified time or file order.)
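As a rough illustration of why dedup results can differ, the following Python sketch keeps the first event seen for each value of a field. On a local index, the first event seen is normally the most recent one. When events arrive in archive file order, an older duplicate can survive instead. This simplified model covers only file order. The `Event` class, `dedup_keep_first`, and the file names are hypothetical and do not reflect Splunk Analytics for Hadoop internals.

```python
from dataclasses import dataclass

@dataclass
class Event:
    host: str    # field being deduplicated
    time: int    # event timestamp in epoch seconds (stand-in for _time)
    source: str  # hypothetical archive file the event was read from

def dedup_keep_first(events, field):
    """Keep only the first event seen for each value of `field`, in arrival order."""
    seen = set()
    kept = []
    for event in events:
        value = getattr(event, field)
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept

# Archive-style arrival: files are read in listing order, so the older
# file's event comes first even though a newer duplicate exists.
archive_order = [
    Event("web01", 100, "part-0001"),
    Event("web01", 500, "part-0002"),
]

# Local-index-style arrival: newest events first.
local_order = sorted(archive_order, key=lambda e: e.time, reverse=True)

print(dedup_keep_first(archive_order, "host")[0].time)  # 100
print(dedup_keep_first(local_order, "host")[0].time)    # 500
```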
Distributable and non-distributable commands in archives
Distributable search commands are the most effective commands for Hadoop Data Roll reports because they can be distributed to search peers and archive indexes. Generally, non-distributable commands work only on local indexes and are less effective on archived indexes.
You can create searches across different index types that use both distributable and non-distributable commands, as long as you keep in mind that such a search returns all data from the local indexes but only limited data from the virtual indexes.
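The following Python sketch shows the general idea behind a distributable operation. A per-host count can be computed separately wherever each slice of data lives, such as a local index or an archive split, and the partial results can be merged afterward. The data layout and the function names `partial_count` and `merge_counts` are hypothetical, and real Splunk distributed search is considerably more involved.

```python
from collections import Counter

def partial_count(events):
    """Runs where the data lives: count events per host in one partition."""
    return Counter(e["host"] for e in events)

def merge_counts(partials):
    """Runs centrally: add the per-partition counts into one total."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical partitions: one local index and two archive splits.
local_index = [{"host": "web01"}, {"host": "web02"}]
archive_split_a = [{"host": "web01"}, {"host": "web01"}]
archive_split_b = [{"host": "web02"}]

partials = [partial_count(p) for p in (local_index, archive_split_a, archive_split_b)]
print(merge_counts(partials))  # Counter({'web01': 3, 'web02': 2})
```

A command that needs every event in one place, for example to reason about global time order, cannot be split into independent partial results this way. This is one reason non-distributable commands are less effective against archives.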
Header extractions to avoid when working with virtual indexes
Archived indexes do not support the configuration of index-time fields. Therefore, properties specific to index-time field extractions do not apply to archived indexes. These include the following properties:
- TIMESTAMP_FIELDS = field1,field2,...,fieldn
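If you want to check existing props.conf stanzas for these properties before relying on them against archived data, the following Python sketch flags them. It treats the file as INI-style text using the standard `configparser` module. This is only an approximation, because props.conf syntax is not guaranteed to parse cleanly as INI. The stanza name `my_csv_sourcetype` and the `INDEX_TIME_ONLY` set are hypothetical. Only TIMESTAMP_FIELDS is taken from the list above, so extend the set as needed.

```python
import configparser

# Properties that do not apply to archived indexes. Only TIMESTAMP_FIELDS
# comes from the documentation above; add others as appropriate.
INDEX_TIME_ONLY = {"TIMESTAMP_FIELDS"}

def flag_index_time_props(conf_text):
    """Return (stanza, property) pairs that won't take effect on an archived index."""
    parser = configparser.ConfigParser(interpolation=None)  # '%' in values is literal
    parser.optionxform = str  # keep Splunk's upper-case property names as written
    parser.read_string(conf_text)
    flagged = []
    for stanza in parser.sections():
        for key in parser[stanza]:
            if key in INDEX_TIME_ONLY:
                flagged.append((stanza, key))
    return flagged

sample = """
[my_csv_sourcetype]
TIMESTAMP_FIELDS = date,time
TIME_FORMAT = %Y-%m-%d %H:%M:%S
"""

print(flag_index_time_props(sample))  # [('my_csv_sourcetype', 'TIMESTAMP_FIELDS')]
```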
This documentation applies to the following versions of Splunk® Enterprise: 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.3.0, 7.3.1, 7.3.2, 8.0.0