Splunk® Hadoop Connect

Deploy and Use Splunk Hadoop Connect

Install Hadoop CLI

Splunk Hadoop Connect communicates with Hadoop clusters through the Hadoop Distributed File System (HDFS) Command-Line Interface, or Hadoop CLI. Before you deploy Hadoop Connect, install Hadoop CLI on each Splunk instance on which you want to run Hadoop Connect.

For information about the Hadoop CLI, see "Hadoop Commands Guide" in the Apache Hadoop documentation.

You can configure Splunk Hadoop Connect to communicate with multiple Hadoop clusters of differing distributions and versions. To support this, you can install multiple Hadoop CLI packages on a single Splunk instance.

Collect Hadoop environment information

For each Hadoop cluster that you want to connect to, gather the following information:

  • Hadoop distribution and version.
  • HDFS Namenode Uniform Resource Identifier (URI).
  • Namenode HTTP port.
  • Namenode IPC port.
  • Whether the cluster requires secure authentication.
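One convenient way to keep this information at hand is to record it as shell variables on the Splunk instance, so that later verification commands can reuse it. This is an optional sketch; all hostnames and values below are placeholders, not values from this topic (50070 and 8020 are common Hadoop 1.x defaults for the NameNode HTTP and IPC ports).

```shell
# Placeholder values: record one cluster's details for reuse in later steps.
export HADOOP_DISTRO="Apache Hadoop 1.0.3"
export NAMENODE_HOST="namenode.example.com"
export NAMENODE_HTTP_PORT=50070
export NAMENODE_IPC_PORT=8020
export KERBEROS_REQUIRED=false

# The <namenode>:<ipc_port> pair used by the test commands later in this topic.
echo "${NAMENODE_HOST}:${NAMENODE_IPC_PORT}"
```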

Verify Java version

Hadoop CLI requires Oracle Java 6u31 or later. Before you install Hadoop CLI, verify that Oracle Java 6u31 or later is installed on each Splunk instance on which you plan to run Splunk Hadoop Connect.
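One way to check the installed level, sketched here against a sample version banner so the snippet runs anywhere: parse the quoted version string from the `java -version` output and compare the update number against 31.

```shell
# Sample banner; on a real host replace this line with:
#   banner=$(java -version 2>&1 | head -n 1)
banner='java version "1.6.0_31"'

# Extract the quoted version string, for example 1.6.0_31.
ver=$(printf '%s\n' "$banner" | sed -n 's/.*"\(.*\)".*/\1/p')
update=${ver##*_}    # the update number, for example 31

echo "$ver (update $update)"
```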

Download and install the Oracle Java Development Kit (JDK)

Install the correct Java version.

1. Download the recommended Oracle Standard Edition (SE) JDK from the Oracle Java SE downloads site (http://www.oracle.com/technetwork/java/javase/downloads/index.html). For Java 6 releases, go to the archive page at http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html

Important: Download the JDK, which includes the JRE.

2. Follow the installation instructions on the Oracle web site for Java SE to install the JDK onto your system.
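Because the JDK bundles a JRE but not the other way around, you can distinguish the two by checking for the javac compiler. The helper below is a hypothetical sketch, not part of this product, and assumes java and javac are on the PATH when installed.

```shell
# Hypothetical helper: a JDK ships the javac compiler, a bare JRE does not.
has_jdk() {
    if command -v javac >/dev/null 2>&1; then
        echo "JDK"
    else
        echo "JRE or none"
    fi
}

has_jdk
```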

Download the Hadoop package

After you know the specific Hadoop distribution and version for each Hadoop cluster in your environment, determine the correct Hadoop CLI tar file to download.

Download Apache Hadoop

1. Go to the Apache download archive site:

http://archive.apache.org/dist/hadoop/core/

2. Select the correct tar file for your version of Apache Hadoop. For example, version 1.0.3:

http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz

Cloudera Distribution including Apache Hadoop (Cloudera CDH)

1. Go to the CDH downloads site: https://www.cloudera.com/downloads/cdh/5-12-1.html

Or use the CDH archives:

For CDH3, use http://archive.cloudera.com/cdh/3/
For CDH4, use http://archive.cloudera.com/cdh4/

2. Locate the correct CDH version, and click the Tarball Download link.

3. Find the correct hadoop-<version> in the Component column, and click Download.

For example, http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.0.1.tar.gz

Hortonworks Data Platform (HDP)

1. Go to the Hortonworks Data Platform Release Repository:

http://s3.amazonaws.com/public-repo-1.hortonworks.com/index.html

2. Select the correct HDP version.

3. Navigate to and download the associated tar file (tar.gz). For example:

http://s3.amazonaws.com/public-repo-1.hortonworks.com/HDP-1.0.3/hadoop-1.0.1.tar.gz

Note: If the wget utility is installed on your system, you can also pull Hadoop CLI packages directly from the Splunk instance with the wget command. For example:

wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u4.tar.gz

Extract the Hadoop package

To install the Hadoop CLI package, extract the archive that you downloaded. For example, run the command:

tar -xvzf <archive_name>.tar.gz

Download and extract the correct Hadoop CLI for each Hadoop cluster that Splunk Hadoop Connect communicates with. If you have multiple distributions and versions of Hadoop in your environment, you can install multiple Hadoop CLI packages on a single Splunk instance.
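The extract step above can be sketched end to end as follows. This demonstration runs in a scratch directory and fabricates a stand-in archive so it is self-contained; in practice, substitute the tar file you downloaded (hadoop-1.0.3.tar.gz is only an example name) and a real install path such as /opt.

```shell
# Work in a scratch directory so the demonstration touches nothing real.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a downloaded archive; on a real host you would skip these
# three lines and use the tar.gz you downloaded in the previous section.
mkdir -p hadoop-1.0.3/bin
tar -czf hadoop-1.0.3.tar.gz hadoop-1.0.3
rm -r hadoop-1.0.3

# The extract step from this topic.
tar -xzf hadoop-1.0.3.tar.gz

# Point HADOOP_HOME at the extracted directory so the test commands
# in the next section can find bin/hadoop.
export HADOOP_HOME="$workdir/hadoop-1.0.3"
ls -d "$HADOOP_HOME/bin"
```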

Test the Hadoop setup

Test your Hadoop CLI installation to make sure that:

  • There is network connectivity between your Splunk instance and your Hadoop environment.
  • The Hadoop utilities are unpacked and installed correctly.
  • The CLI can properly run Java.

Test network connectivity

To test that your Hadoop CLI is set up properly and can connect to your Hadoop cluster, run the command:

$HADOOP_HOME/bin/hadoop fs -ls <namenode>:<ipc_port>/

where:

  • <namenode> is the HDFS NameNode host of your Hadoop cluster.
  • <ipc_port> is the inter-process communication (IPC) port that the NameNode listens on.

If Hadoop CLI returns a directory listing and does not present an error message, then your setup is correct and you have a successful connection.

Test write access to the Hadoop cluster

To test write access to your Hadoop cluster, run these commands against the path where you want to export data:

$HADOOP_HOME/bin/hadoop fs -touchz <namenode>:<ipc_port>/<dir_path>/foo.txt
$HADOOP_HOME/bin/hadoop fs -rm <namenode>:<ipc_port>/<dir_path>/foo.txt

If Hadoop CLI does not return an error message, then your setup is correct.
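The two commands above can be wrapped in a small helper so that a script can branch on the result. This is a hypothetical sketch, not part of Hadoop Connect: hdfs_write_check and the probe filename are made-up names, and HADOOP_HOME must already point at your extracted Hadoop CLI.

```shell
# Hypothetical helper: create and then remove a scratch file to confirm
# write access to a <namenode>:<ipc_port>/<dir_path> target.
hdfs_write_check() {
    target=$1                                   # e.g. namenode.example.com:8020/tmp
    probe="${target}/splunk_write_probe.txt"
    "$HADOOP_HOME/bin/hadoop" fs -touchz "$probe" &&
    "$HADOOP_HOME/bin/hadoop" fs -rm "$probe" &&
    echo "write access OK: ${target}"
}
```

For example, `hdfs_write_check namenode.example.com:8020/tmp` prints a confirmation on success and returns a nonzero status if either command fails.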


This documentation applies to the following versions of Splunk® Hadoop Connect: 1.0, 1.1, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5

