Splunk® Enterprise

Splunk Analytics for Hadoop

About virtual indexes

Splunk Analytics for Hadoop reaches End of Life on January 31, 2025.

Virtual indexes let Splunk Analytics for Hadoop address data stored in external systems and push computations to those systems. With virtual indexes you can access and report on structured, unstructured and polystructured data residing within your Hadoop cluster.

Splunk Analytics for Hadoop leverages the MapReduce framework to execute report-generating searches on Hadoop nodes. Data does not need to be pre-processed before it is accessed because Splunk Analytics for Hadoop lets you run analytics searches against the data where it rests in Hadoop.

Splunk Analytics for Hadoop treats virtual indexes as read-only data stores and binds a schema to the data at search time. This means the data you report on remains accessible in the same format as before to other systems and tools that use it, such as Hive and Pig.
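As a conceptual illustration only (this is not Splunk code), the following Python sketch shows what binding a schema at search time means: the raw lines stay untouched, and fields are extracted only while a search runs. The sample lines, the `key=value` pattern, and the `search` helper are all hypothetical and chosen for the example.

```python
import re

# Raw lines as they might sit in a file; nothing is parsed ahead of time.
# (Sample data invented for this sketch.)
RAW_LINES = [
    "2023-10-30 12:00:01 status=200 bytes=512",
    "2023-10-30 12:00:02 status=404 bytes=0",
]

# Hypothetical "schema": a key=value pattern applied only when a search runs.
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")


def search(lines, **criteria):
    """Yield the extracted fields of each line that matches all criteria."""
    for line in lines:
        # Fields are derived on the fly; the stored line is never modified.
        fields = dict(FIELD_PATTERN.findall(line))
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


results = list(search(RAW_LINES, status="404"))
```

Because the extraction happens on read, the underlying lines are exactly what they were before the search, which is why other tools can keep using the same files.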

Configuring virtual indexes

Before you set up a virtual index, you set up providers. When you configure a provider, you give Splunk Analytics for Hadoop details about your Hadoop cluster, which the ERP process uses to carry out reporting tasks. An ERP (external results provider) is a search helper process that carries out searches on Hadoop data.

You then configure virtual indexes by giving Splunk Analytics for Hadoop information about your Hadoop data, such as the data location and a set of allowed and blocked files or directories. When properly configured, virtual indexes recognize certain directory structures, then extract and use that information to optimize searches. For example, if your data is partitioned into a directory structure by date, Splunk Analytics for Hadoop can reduce the amount of data it processes by reading only the paths that are relevant to the search time range.
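The date-partition idea can be sketched in Python as follows. This is a minimal illustration of the concept, not how Splunk Analytics for Hadoop implements it: the `/data/weblogs/YYYY/MM/DD/` layout, the `partition_date` and `prune` helpers, and the sample paths are assumptions made for the example.

```python
from datetime import date

# Hypothetical root directory; the layout <root>/YYYY/MM/DD/<file> is assumed.
ROOT = "/data/weblogs"


def partition_date(path, root=ROOT):
    """Return the date encoded in the path, or None if the path does not fit the layout."""
    if not path.startswith(root + "/"):
        return None
    parts = path[len(root) + 1:].split("/")
    if len(parts) < 4:
        return None
    try:
        year, month, day = (int(p) for p in parts[:3])
        return date(year, month, day)
    except ValueError:
        return None


def prune(paths, earliest, latest):
    """Keep only paths that could hold data in the range [earliest, latest]."""
    kept = []
    for path in paths:
        d = partition_date(path)
        # A path with no recognizable date cannot be ruled out, so it is kept.
        if d is None or earliest <= d <= latest:
            kept.append(path)
    return kept


paths = [
    "/data/weblogs/2023/10/29/access.log",
    "/data/weblogs/2023/10/30/access.log",
    "/data/weblogs/2023/10/31/access.log",
    "/data/weblogs/misc/readme.txt",
]
selected = prune(paths, date(2023, 10, 30), date(2023, 10, 30))
```

In this sketch a one-day search touches a single dated directory instead of all three. Paths that do not match the layout are kept rather than dropped, a conservative choice so that no data is skipped by mistake.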

Last modified on 30 October, 2023

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.1.0, 9.1.1, 9.1.2
