Splunk® Enterprise

Splunk Analytics for Hadoop

Splunk Enterprise version 7.1 is no longer supported as of October 31, 2020. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.
This documentation does not apply to the most recent version of Splunk® Enterprise. For documentation on the most recent version, go to the latest release.

How Splunk Analytics for Hadoop returns reports on Hadoop data

Splunk Analytics for Hadoop reaches End of Life on January 31, 2025.

When a search is initiated, Splunk Analytics for Hadoop uses the Hadoop MapReduce framework to process the data in place. All of the data parsing, including source typing, event breaking, and time stamping, that is normally done at index time is performed in Hadoop at search time. Splunk Analytics for Hadoop does not index this data; instead, it processes it on every request. Here's an overview of how Splunk Analytics for Hadoop searches against Hadoop virtual indexes:
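This schema-on-read behavior, where event breaking and timestamp extraction happen at search time rather than index time, can be illustrated with a minimal sketch. The function, regex, and timestamp format below are illustrative assumptions, not Splunk internals:

```python
# Hypothetical sketch of search-time parsing: raw data stays unindexed,
# and events are broken out and timestamped only when a search runs.
import re
from datetime import datetime

LINE_BREAKER = re.compile(r"\n")                    # break events on newlines
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"                   # assumed timestamp layout
TIME_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def parse_at_search_time(raw_chunk):
    """Break a raw chunk into events and stamp each one, per request."""
    events = []
    for line in LINE_BREAKER.split(raw_chunk):
        if not line.strip():
            continue
        match = TIME_PATTERN.search(line)
        ts = datetime.strptime(match.group(), TIME_FORMAT) if match else None
        events.append({"_time": ts, "_raw": line})
    return events

chunk = "2023-10-30 12:00:01 GET /index\n2023-10-30 12:00:02 POST /login\n"
for event in parse_at_search_time(chunk):
    print(event["_time"], event["_raw"])
```

Because nothing is persisted, the same parsing cost is paid again on the next search, which is the trade-off the steps below are designed around.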

1. The user initiates a report-generating search on a virtual index. See Search a virtual index for more information about report-generating searches.

2. Splunk Analytics for Hadoop recognizes that the request is for a virtual index and spawns an External Results Provider (ERP) process to help with the request. An ERP is a search helper process that carries out searches on Hadoop data. See About virtual indexes.

3. Based on your configuration, Splunk Analytics for Hadoop passes configuration and run-time data, including the parsed search string, to the ERP in JSON format.

4. If this is the first time a search is executed for a particular provider family, the ERP process sets up the necessary environment in HDFS by copying a Splunk Enterprise package and the knowledge bundles to your HDFS or NoSQL database.

5. The ERP process analyzes the request from the search. It identifies the relevant data to be processed and generates tasks to be executed on Hadoop. It then spawns a MapReduce job to perform the computation.

6. For each task, the MapReduce job first makes sure that the environment is up-to-date by checking for the correct Splunk package and knowledge bundle.

7. If the correct package and knowledge bundle are not found, the task copies the Splunk package from HDFS (see step 4) then extracts it into the configured directory. It then copies the bundles from HDFS (see step 4) and expands them in the correct directory within the TaskTracker.

8. The map task spawns a search process on the TaskTracker node to handle all the data processing.

9. The map task feeds data to the search process and consumes the search process's output, which becomes the output of the map task. This output is stored in HDFS.

10. The ERP process on the search head continuously polls HDFS to pick up the results and feeds them to the search process running on the search head.

11. The ERP search process on the search head uses these results to create the reports. The report is constantly updated as new data arrives.
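The hand-off in steps 9 through 11 can be sketched as a polling loop: map tasks deposit partial result files, and an ERP-style poller repeatedly picks up files it has not yet consumed. This is a simplified illustration using a local directory to stand in for HDFS; the paths, file names, and function names are assumptions, not Splunk or Hadoop APIs:

```python
# Hypothetical sketch: map tasks write partial results to a results
# directory (standing in for HDFS), and a poller streams each new file
# to the consumer exactly once, so reports can update as data arrives.
import os
import tempfile

def poll_results(results_dir, seen):
    """Return result lines from files not yet consumed."""
    new_lines = []
    for name in sorted(os.listdir(results_dir)):
        if name in seen:
            continue
        seen.add(name)
        with open(os.path.join(results_dir, name)) as f:
            new_lines.extend(f.read().splitlines())
    return new_lines

# Simulate two map tasks finishing at different times.
with tempfile.TemporaryDirectory() as hdfs_results:
    seen = set()
    with open(os.path.join(hdfs_results, "part-00000"), "w") as f:
        f.write("count=42\n")
    print(poll_results(hdfs_results, seen))   # picks up the first part only
    with open(os.path.join(hdfs_results, "part-00001"), "w") as f:
        f.write("count=58\n")
    print(poll_results(hdfs_results, seen))   # picks up only the new part
```

Tracking which files have been consumed is what lets the report update incrementally instead of waiting for the whole MapReduce job to finish.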

Last modified on 30 October, 2023

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2

