Install Hadoop CLI
Splunk Hadoop Connect communicates with Hadoop clusters through the Hadoop Distributed File System (HDFS) command-line interface, or Hadoop CLI. Before you deploy Hadoop Connect, install the Hadoop CLI on each Splunk instance on which you want to run Hadoop Connect.
For information about the Hadoop CLI, see "Hadoop Commands Guide" in the Apache Hadoop documentation.
You can configure Splunk Hadoop Connect to communicate with multiple Hadoop clusters of different distributions and versions. To support this, you can install multiple Hadoop CLI packages on a single Splunk instance.
Collect Hadoop environment information
For each Hadoop cluster you want to connect to, gather the following information:
- Hadoop distribution and version.
- HDFS NameNode Uniform Resource Identifier (URI).
- NameNode HTTP port.
- NameNode IPC port.
- Whether the cluster requires secure authentication.
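One way to keep these details at hand is to record them per cluster in a small shell file. The sketch below is a hypothetical layout: the variable names, host name, and port numbers are example placeholders (50070 and 8020 are common defaults, but confirm the actual ports for your cluster). The helper namenode_uri, also hypothetical, builds the hdfs:// URI from the NameNode host and IPC port.

```shell
#!/bin/sh
# Hypothetical per-cluster record. All names and values are placeholders;
# replace them with the details you collected for your cluster.
CLUSTER_DISTRIBUTION="apache"              # Hadoop distribution
CLUSTER_VERSION="1.0.3"                    # Hadoop version
CLUSTER_NAMENODE_HOST="namenode.example.com"
CLUSTER_NAMENODE_HTTP_PORT=50070           # NameNode HTTP port (example value)
CLUSTER_NAMENODE_IPC_PORT=8020             # NameNode IPC port (example value)
CLUSTER_SECURE=false                       # true if the cluster requires Kerberos

# Build the HDFS NameNode URI from a host name and an IPC port.
namenode_uri() {
    printf 'hdfs://%s:%s\n' "$1" "$2"
}

namenode_uri "$CLUSTER_NAMENODE_HOST" "$CLUSTER_NAMENODE_IPC_PORT"
```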
Verify Java version
Hadoop CLI requires Oracle Java 6u31 or later. Before you install the Hadoop CLI, verify that Oracle Java 6u31 or later is installed on each Splunk instance on which you plan to run Splunk Hadoop Connect.
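The check can be scripted. The following is a minimal sketch, assuming the version string is read from the output of java -version (which writes to standard error). It accepts legacy strings such as 1.6.0_31 and newer strings such as 11.0.2. The function names java_version_ok and installed_java_version are hypothetical helpers, not part of Splunk or Hadoop.

```shell
#!/bin/sh
# Return success if a Java version string is 6u31 or later.
# Legacy form: 1.MAJOR.0_UPDATE (for example 1.6.0_31).
# Newer form:  MAJOR.MINOR.PATCH (for example 11.0.2).
java_version_ok() {
    ver=$1
    update=0
    case $ver in
        1.*)
            rest=${ver#1.}
            major=${rest%%[!0-9]*}          # digits after "1."
            case $ver in
                *_*)
                    update=${ver##*_}       # text after the last underscore
                    update=${update%%[!0-9]*}
                    ;;
            esac
            ;;
        *)
            major=${ver%%[!0-9]*}           # leading digits
            ;;
    esac
    [ -n "$major" ] || return 1
    [ -n "$update" ] || update=0
    if [ "$major" -gt 6 ]; then
        return 0
    fi
    [ "$major" -eq 6 ] && [ "$update" -ge 31 ]
}

# Print the quoted version from `java -version` (stderr is redirected).
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, java_version_ok "$(installed_java_version)" && echo "Java version OK" reports whether the java on the current PATH meets the minimum.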
Download and install the Oracle Java Development Kit (JDK)
If the required Java version is not installed, install it:
1. Download the recommended Oracle Standard Edition (SE) JDK from the Oracle Java SE downloads site (http://www.oracle.com/technetwork/java/javase/downloads/index.html).
For Java SE 6 releases, use the archive page at http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html.
Important: Download the JDK, which includes the JRE.
2. Follow the Oracle installation instructions for your Java SE version to install the JDK on your system:
- For Java SE version 6, use "Java SE 6 Platform Installation."
- For Java SE version 7, use "JDK 7 and JRE 7 Installation Guide."
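The Hadoop CLI scripts locate Java through the JAVA_HOME environment variable, so after installing the JDK you typically point JAVA_HOME at it. This is a sketch only: the function set_java_home is a hypothetical helper, and the JDK path in the example comment is a placeholder for wherever your JDK was installed.

```shell
#!/bin/sh
# Point JAVA_HOME at a JDK directory and put its bin/ first on PATH.
# Refuses to change anything if bin/java is missing or not executable.
set_java_home() {
    jdk_dir=$1
    if [ ! -x "$jdk_dir/bin/java" ]; then
        echo "No executable java under $jdk_dir/bin" >&2
        return 1
    fi
    JAVA_HOME=$jdk_dir
    PATH=$JAVA_HOME/bin:$PATH
    export JAVA_HOME PATH
}

# Example (placeholder path):
# set_java_home /usr/java/jdk1.6.0_31
```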
Download the Hadoop package
After you know the specific Hadoop distribution and version for each Hadoop cluster in your environment, determine the correct Hadoop CLI tar file to download.
Download Apache Hadoop
1. Go to the Apache download archive site:
2. Select the correct tar file for your version of Apache Hadoop. For example, version 1.0.3:
Cloudera Distribution including Apache Hadoop (Cloudera CDH)
1. Go to the CDH downloads site: https://www.cloudera.com/downloads/cdh/5-12-1.html
Or use the CDH archives:
2. Locate the correct CDH version, and click the Tarball Download link.
3. Find the correct hadoop-<version> in the Component column, and click Download.
Hortonworks Data Platform (HDP)
1. Go to the Hortonworks Data Platform Release Repository:
2. Select the correct HDP version.
3. Navigate to and download the associated tar file (tar.gz). For example:
Note: If your system has the wget binary installed, you can also download Hadoop CLI packages directly to the Splunk instance with the wget command. For example:
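A wget download can be wrapped so that each package lands in a known staging directory. This is a minimal sketch: download_hadoop_cli is a hypothetical helper, and <tarball_url> and the staging path are placeholders for the tarball link you located on your distribution's download site. The -P option sets the directory where wget saves the file.

```shell
#!/bin/sh
# Download one Hadoop CLI tarball into a staging directory with wget.
download_hadoop_cli() {
    url=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # -P sets the directory prefix for the saved file.
    wget -P "$dest_dir" "$url"
}

# Example (placeholder URL and directory):
# download_hadoop_cli "<tarball_url>" /opt/hadoop/downloads
```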
Extract the Hadoop package
To install the Hadoop CLI package, extract the archive you downloaded. For example, run the command:
tar -xvzf <archive_name>.tar.gz
Download and extract the correct Hadoop CLI package for each Hadoop cluster that Splunk Hadoop Connect communicates with. If your environment has multiple Hadoop distributions and versions, install one Hadoop CLI package for each on the same Splunk instance.
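To keep several Hadoop CLI packages side by side, you can extract each one under a shared base directory and use the resulting folder as HADOOP_HOME for that cluster. The sketch below assumes a base directory such as /opt/hadoop (an example, not a requirement) and that the first entry in the tarball is its top-level directory, which is usual for Hadoop tarballs such as hadoop-1.0.3.tar.gz. The function extract_hadoop_cli is hypothetical.

```shell
#!/bin/sh
# Extract a Hadoop CLI tarball under a base directory and print the
# path of the extracted top-level folder (suitable as HADOOP_HOME).
extract_hadoop_cli() {
    tarball=$1
    base_dir=$2
    mkdir -p "$base_dir" || return 1
    # Assumes the first archive entry is the top-level directory.
    top_dir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$top_dir" ] || return 1
    tar -xzf "$tarball" -C "$base_dir" || return 1
    printf '%s\n' "$base_dir/$top_dir"
}

# Example (placeholder paths):
# HADOOP_HOME=$(extract_hadoop_cli hadoop-1.0.3.tar.gz /opt/hadoop)
```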
Test the Hadoop setup
Test your Hadoop CLI installation to make sure that:
- There is network connectivity between your Splunk instance and your Hadoop environment.
- The Hadoop utilities are unpacked and installed correctly.
- The CLI can properly run Java.
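A quick way to cover the last two points before contacting the cluster is to run hadoop version, which prints the Hadoop build information and typically fails with an error if the CLI cannot find Java (for example, when JAVA_HOME is not set). The wrapper check_hadoop_cli below is a hypothetical sketch.

```shell
#!/bin/sh
# Confirm that bin/hadoop exists and is executable, then run
# `hadoop version`, which needs a working Java installation.
check_hadoop_cli() {
    hadoop_home=$1
    if [ ! -x "$hadoop_home/bin/hadoop" ]; then
        echo "Missing or non-executable $hadoop_home/bin/hadoop" >&2
        return 1
    fi
    "$hadoop_home/bin/hadoop" version
}

# Example: check_hadoop_cli "$HADOOP_HOME"
```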
Test network connectivity
To test that your Hadoop CLI is set up properly and can connect to your Hadoop cluster, run the following command, where $HADOOP_HOME is the directory into which you extracted the Hadoop CLI package:
$HADOOP_HOME/bin/hadoop fs -ls hdfs://<namenode>:<ipc_port>/
- <namenode> is the host name of the HDFS NameNode of your Hadoop cluster.
- <ipc_port> is the inter-process communication (IPC) port that the NameNode listens on.
If the Hadoop CLI returns a directory listing without an error message, your setup is correct and the Splunk instance can connect to the cluster.
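When you check several clusters, it can help to wrap the listing command in a function that reports the result. This sketch assumes HADOOP_HOME points at the extracted Hadoop CLI and uses the fully qualified hdfs://<namenode>:<ipc_port>/ form; test_hdfs_connectivity and the example host are hypothetical.

```shell
#!/bin/sh
# List the HDFS root of one cluster and report success or failure.
# Requires HADOOP_HOME to point at the extracted Hadoop CLI.
test_hdfs_connectivity() {
    namenode=$1
    ipc_port=$2
    uri="hdfs://$namenode:$ipc_port/"
    if "$HADOOP_HOME/bin/hadoop" fs -ls "$uri"; then
        echo "Connected to $uri"
    else
        echo "Could not list $uri" >&2
        return 1
    fi
}

# Example (placeholder host and port):
# test_hdfs_connectivity namenode.example.com 8020
```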
Test write access to the Hadoop cluster
To test write access to your Hadoop cluster, run these commands against the HDFS directory where you want to export data:
$HADOOP_HOME/bin/hadoop fs -touchz hdfs://<namenode>:<ipc_port>/<dir_path>/foo.txt
$HADOOP_HOME/bin/hadoop fs -rm hdfs://<namenode>:<ipc_port>/<dir_path>/foo.txt
If neither command returns an error message, the Splunk instance has write access to that directory.
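The two write-test commands can be combined so that the test file is removed only after it was created. In this sketch, test_hdfs_write is a hypothetical helper, and the test file name splunk_write_test_<pid>.txt is an arbitrary choice (the $$ process ID suffix lowers the chance of clashing with an existing file). It assumes HADOOP_HOME points at the extracted Hadoop CLI.

```shell
#!/bin/sh
# Create and then delete an empty file in an HDFS directory.
# Returns non-zero if either step fails.
test_hdfs_write() {
    namenode=$1
    ipc_port=$2
    dir_path=${3#/}                  # drop a leading slash, if present
    test_file="hdfs://$namenode:$ipc_port/$dir_path/splunk_write_test_$$.txt"
    "$HADOOP_HOME/bin/hadoop" fs -touchz "$test_file" || return 1
    "$HADOOP_HOME/bin/hadoop" fs -rm "$test_file"
}

# Example (placeholder host, port, and directory):
# test_hdfs_write namenode.example.com 8020 /user/splunk/export
```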
This documentation applies to the following versions of Splunk® Hadoop Connect (Legacy): 1.0, 1.1, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5