System and software requirements
Make sure you have access to at least one Hadoop cluster (with data in it) and the ability to run MapReduce jobs on that data.
Splunk Analytics for Hadoop is supported on the following Hadoop distributions and versions:
- Apache Hadoop 3.2.1
- Open Apache 3.1.2
- Cloudera Distribution including Apache Hadoop v6.3
- Hortonworks Data Platform (HDP) 3.1.4
- MapR 6.1
What you need on your Hadoop nodes
On each Hadoop TaskTracker node, you need a directory on the node's *nix file system that meets the following requirements:
- 1 GB of free disk space for a copy of Splunk.
- 5-10 GB of free disk space for temporary storage. This storage is used by the search processes.
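The disk space requirements above can be checked before installation. The following is a minimal sketch, not an official Splunk tool. It assumes both requirements apply to the same directory and are therefore added together, and it treats "GB" as GiB. The function names and the "/tmp" path are hypothetical placeholders.

```python
import shutil

GIB = 1024 ** 3

# Thresholds taken from the requirements above; treating "GB" as GiB is an assumption.
SPLUNK_COPY_BYTES = 1 * GIB
TEMP_STORAGE_MIN_BYTES = 5 * GIB
TEMP_STORAGE_MAX_BYTES = 10 * GIB


def required_free_bytes(temp_storage_bytes=TEMP_STORAGE_MAX_BYTES):
    """Total free space needed: the Splunk copy plus temporary search storage."""
    return SPLUNK_COPY_BYTES + temp_storage_bytes


def has_enough_space(path, temp_storage_bytes=TEMP_STORAGE_MAX_BYTES):
    """Return (ok, free_bytes) for the file system that holds `path`."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_free_bytes(temp_storage_bytes), free_bytes


if __name__ == "__main__":
    # "/tmp" is a placeholder; substitute the directory you plan to use on the node.
    ok, free_bytes = has_enough_space("/tmp")
    print(f"free: {free_bytes / GIB:.1f} GiB, sufficient: {ok}")
```

Checking against the upper end of the 5-10 GB range by default is a conservative choice. Pass TEMP_STORAGE_MIN_BYTES to check against the lower bound instead.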
What you need on your Hadoop file system
On your Hadoop file system (HDFS or otherwise) you will need:
- A subdirectory under jobtracker.staging.root.dir (usually /user/) with the name of the user account under which Splunk Analytics for Hadoop is running on the search head. For example, if Splunk Analytics for Hadoop is started by user "BigDataUser" and jobtracker.staging.root.dir=/user/, you need a directory /user/BigDataUser that is accessible by user "BigDataUser".
- A subdirectory under the above directory that can be used by this server for intermediate storage, such as
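
The user directory described in this list is named after the user account and sits under the staging root. A minimal sketch of how it might be prepared is shown here. It only builds the standard hdfs dfs -mkdir and hdfs dfs -chown command lines and does not run them. The function names are hypothetical, and the sketch assumes the hdfs command-line client is available and that HDFS is the file system in use.

```python
import posixpath
import shlex


def staging_user_dir(staging_root, user):
    """Join the staging root and the user name into a single HDFS path."""
    return posixpath.join(staging_root, user)


def hdfs_setup_commands(staging_root, user):
    """Build (but do not run) the `hdfs dfs` commands that create the directory."""
    path = staging_user_dir(staging_root, user)
    return [
        # Create the directory, including any missing parents.
        ["hdfs", "dfs", "-mkdir", "-p", path],
        # Hand ownership to the user that runs Splunk Analytics for Hadoop.
        ["hdfs", "dfs", "-chown", user, path],
    ]


if __name__ == "__main__":
    # Values taken from the example above.
    for argv in hdfs_setup_commands("/user/", "BigDataUser"):
        print(shlex.join(argv))
```

Changing ownership in HDFS normally requires superuser rights, so the printed commands would likely be run by an HDFS administrator. The intermediate storage subdirectory can be created under this path in the same way.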
This documentation applies to the following versions of Splunk® Enterprise: 7.3.4, 7.3.5, 7.3.6, 7.3.7, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.1.0