About virtual indexes
Virtual indexes let Splunk Analytics for Hadoop address data stored in external systems and push computations to those systems. With virtual indexes you can access and report on structured, unstructured and polystructured data residing within your Hadoop cluster.
Splunk Analytics for Hadoop leverages the MapReduce framework to execute report-generating searches on Hadoop nodes. Data does not need to be pre-processed before it is accessed because Splunk Analytics for Hadoop lets you run analytics searches against the data where it rests in Hadoop.
Splunk Analytics for Hadoop treats virtual indexes as read-only data stores and binds a schema to the data at search time. This means the data you report on remains accessible in the same format as before to other systems and tools that use it, such as Hive and Pig.
Configuring virtual indexes
Before you set up a virtual index, you set up providers. When you configure a provider, you give Splunk Analytics for Hadoop details about your Hadoop cluster, which the external results provider (ERP) process uses to carry out reporting tasks. An ERP is a search helper process that carries out searches on Hadoop data.
You then configure virtual indexes by giving Splunk Analytics for Hadoop information about your Hadoop data, such as the data location and the sets of allowed and blocked files or directories. When properly configured, virtual indexes recognize certain directory structures and use that information to optimize searches. For example, if your data is partitioned into a directory structure by date, Splunk Analytics for Hadoop can reduce the amount of data it processes by reading only the paths that are relevant to the search time range.
- To configure your providers and virtual indexes using the CLI, see Set up a provider and virtual index.
- To set up new providers in Splunk Web, see Add or edit an HDFS provider.
- To set up new virtual indexes in Splunk Web, see Add or edit a virtual index in Splunk Web.
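The date-partitioning optimization described above can be illustrated with a small sketch. This is not how Splunk Analytics for Hadoop implements it. It is a minimal Python illustration of the general idea, assuming a hypothetical layout of /data/weblogs/YYYYMMDD/ with one directory per day. The function names, the paths, and the rule that undated paths are always kept are assumptions made for this example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical layout (an assumption for this sketch): /data/weblogs/YYYYMMDD/<file>
PARTITION_RE = re.compile(r"/data/weblogs/(\d{8})/")


def partition_window(path):
    """Return the (start, end) time window a path covers, or None if it has no date."""
    match = PARTITION_RE.search(path)
    if match is None:
        return None
    start = datetime.strptime(match.group(1), "%Y%m%d")
    return start, start + timedelta(days=1)


def prune_paths(paths, search_earliest, search_latest):
    """Keep only the paths whose partition window overlaps the search time range."""
    kept = []
    for path in paths:
        window = partition_window(path)
        if window is None:
            # No date in the path, so it cannot be ruled out and must be scanned.
            kept.append(path)
            continue
        start, end = window
        if start < search_latest and end > search_earliest:
            kept.append(path)
    return kept


paths = [
    "/data/weblogs/20240101/access.log.gz",
    "/data/weblogs/20240102/access.log.gz",
    "/data/weblogs/20240103/access.log.gz",
    "/data/weblogs/misc/unknown.log",
]

# A search covering 06:00 to 18:00 on 2 January 2024.
relevant = prune_paths(paths, datetime(2024, 1, 2, 6, 0), datetime(2024, 1, 2, 18, 0))
print(relevant)
```

In this sketch only the 20240102 directory and the undated file are kept, so two of the four paths are never read. The benefit grows with the number of partitions that fall outside the search time range.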
This documentation applies to the following versions of Splunk® Enterprise: 6.5.7, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4