Splunk® Enterprise

Splunk Analytics for Hadoop

Splunk Enterprise version 7.1 is no longer supported as of October 31, 2020. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.
This documentation does not apply to the most recent version of Splunk® Enterprise. For documentation on the most recent version, go to the latest release.

Working with Hive and Parquet data

Splunk Analytics for Hadoop reaches End of Life on January 31, 2025.

Data Preprocessors

When Splunk Analytics for Hadoop initializes a search for non-HDFS input data, it uses the FileSplitGenerator class to determine how to split the data for parallel processing.

The default FileSplitGenerator contains the same data split logic defined in Hadoop's FileInputFormat. This means that it works for any data format that can be read by a Hadoop InputFormat implementation that has the same split logic as FileInputFormat.
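To make "file-based split logic" concrete, the following is a minimal sketch of how Hadoop's FileInputFormat divides a single splittable file into byte ranges. It is an illustration in Python, not Splunk or Hadoop code: the function names are hypothetical, and the sketch ignores block locations, non-splittable (for example, compressed) files, and zero-length files.

```python
from typing import List, Tuple

# FileInputFormat lets the final split be up to 10% larger than the
# target split size instead of creating a tiny trailing split.
SPLIT_SLOP = 1.1


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Target split size: the block size clamped to [min_size, max_size]."""
    return max(min_size, min(max_size, block_size))


def file_splits(
    file_length: int,
    block_size: int,
    min_size: int = 1,
    max_size: int = 2**63 - 1,
) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges for one splittable file.

    Hypothetical helper for illustration only; zero-length files simply
    yield no splits here.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    splits: List[Tuple[int, int]] = []
    remaining = file_length
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (at most 1.1 x split_size) becomes the last split.
    if remaining != 0:
        splits.append((file_length - remaining, remaining))
    return splits
```

With a block size of 128, a 300-byte file yields three ranges, while a 140-byte file stays in a single range because it is within the 10% slop. Formats whose readers cannot start at an arbitrary byte offset chosen this way need different split logic, which is why Hive and Parquet have their own split generators.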

Because the default FileSplitGenerator does not work for Hive or Parquet files, Splunk Analytics for Hadoop provides HiveSplitGenerator for Hive files and ParquetSplitGenerator for Parquet files. Any custom Hive files with file-based split logic (such as files created with Hadoop FileOutputFormat and its subclasses) work with HiveSplitGenerator. If you have custom Hive file formats that do not use file-based data split logic, you can implement a custom SplitGenerator that uses your split logic.

Parquet files created by any tool (including Hive) work with ParquetSplitGenerator, and only with ParquetSplitGenerator.
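The selection rules in this topic can be summarized as a small decision helper. This is a sketch only: the function and its parameters are hypothetical and are not part of Splunk Analytics for Hadoop. It returns the name of the split generator described in this topic, or None where a custom SplitGenerator is needed.

```python
from typing import Optional


def choose_split_generator(
    data_format: str, file_based_splits: bool = True
) -> Optional[str]:
    """Map a data format to the split generator named in this topic.

    Hypothetical helper: None means no provided generator applies and a
    custom SplitGenerator is required.
    """
    fmt = data_format.lower()
    if fmt == "parquet":
        # Parquet files work only with ParquetSplitGenerator,
        # regardless of which tool (including Hive) wrote them.
        return "ParquetSplitGenerator"
    if fmt == "hive":
        # Hive files need file-based split logic to use HiveSplitGenerator.
        return "HiveSplitGenerator" if file_based_splits else None
    # Other formats use the default generator if they follow
    # FileInputFormat-style split logic.
    return "FileSplitGenerator" if file_based_splits else None
```

For example, `choose_split_generator("parquet")` returns `"ParquetSplitGenerator"`, and `choose_split_generator("hive", file_based_splits=False)` returns `None`.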

Last modified on 30 October, 2023

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2

