Splunk® Hadoop Connect

Deploy and Use Splunk Hadoop Connect


Configure Splunk Hadoop Connect

After you install Splunk Hadoop Connect, configure it to begin collecting data from your Hadoop cluster or mounted file system. You can configure it from within Splunk Web or with configuration files.

Splunk Hadoop Connect lets users export data to disk. Give access to this app only to trusted administrators. To update sharing settings, edit default.meta as described in the topic "Security responsibilities with custom commands" in the Splunk Enterprise Search Manual.

Configure the app from within Splunk Web

The easiest and most common way to configure Splunk Hadoop Connect is from within the application itself. If you have not already done so, install the Hadoop CLI.

1. After you install Splunk Hadoop Connect, log into the Splunk platform and select App > Hadoop Connect in the Splunk system navigation bar.

2. In the Welcome to Hadoop Connect page, click Configure.

3. In the HDFS Clusters section, click Add Cluster.

4. Select whether you want to map to a remote HDFS cluster or to a mounted file system.

To map to a remote HDFS cluster, select Remote HDFS and see "Map to a remote HDFS cluster."

To map to a mounted file system, select Locally mounted Hadoop and see "Map to a mounted file system."

Map to a remote HDFS cluster

1. In the HDFS URI field, specify the Hadoop Distributed File System (HDFS) Uniform Resource Identifier (URI) in the format namenode:port, for example hdfs.example.com:8020.

2. In the HADOOP_HOME field, specify the location of the Hadoop command-line utilities that Splunk Hadoop Connect uses to communicate with the cluster.

3. In the JAVA_HOME field, specify the location of the Java installation that Splunk Hadoop Connect uses to communicate with the cluster.

4. In the Namenode HTTP Port field, specify the HTTP port of the namenode (the master node that manages the HDFS file system). The default is 50070.
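
If you are not sure of the URI or port, your Hadoop cluster's own configuration files usually contain both values. An illustration of where to look, with placeholder host names (enter only the host:port portion in the HDFS URI field; on older clusters the URI property is fs.default.name):

   <!-- core-site.xml: the HDFS URI -->
   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://hdfs.example.com:8020</value>
   </property>
   <!-- hdfs-site.xml: the namenode HTTP address and port -->
   <property>
     <name>dfs.namenode.http-address</name>
     <value>hdfs.example.com:50070</value>
   </property>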

5. (Optional) To communicate with a secure HDFS cluster, select the Set Security check box. Three additional fields appear.

The first field, HDFS Service Principal, requires the name of a valid Kerberos service principal. You can find this value in your Hadoop configuration as the dfs.namenode.kerberos.principal property in hdfs-site.xml. Enter the fully qualified service principal name.

In Kerberos Principals to Short Names Mapping, if your Hadoop configuration maps Kerberos principals to short names, add those mappings here. The format is identical to the mapping in your Hadoop configuration, that is, the hadoop.security.auth_to_local property. The first rule that matches a principal name maps that principal name to a short name; any later rules that match the same principal name are ignored.
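
For reference, here is a short, hypothetical hadoop.security.auth_to_local rule set; the realm EXAMPLE.COM and the short name hdfs are placeholders, so copy the rules from your own core-site.xml rather than this sketch:

   <property>
     <name>hadoop.security.auth_to_local</name>
     <value>
       RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
       RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
       DEFAULT
     </value>
   </property>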

In the Select Export Kerberos Principal drop-down, choose an existing Kerberos principal, or select "Add New Principal" to create one.

  • If you selected an existing Kerberos principal, proceed to step 6.
  • If you selected "Add New Principal," two additional fields appear.

In the Name field, type in the fully qualified name of the Kerberos principal you want to create in the Splunk platform.

In the Keytab Location on server field, specify the fully qualified path of the keytab file for this Kerberos principal name, meaning the location on the server where you wrote the keytab file.
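
Before you save, you can confirm from a shell on the Splunk Enterprise server that the keytab actually authenticates. A quick check with the MIT Kerberos client utilities, using a placeholder principal and keytab path (substitute your own):

# Obtain a ticket with the keytab, then list the ticket cache to confirm it worked.
kinit -k -t /etc/security/keytabs/splunk.service.keytab splunk/hadoop-server1.example.com@EXAMPLE.COM
klist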

6. Check HA enabled if the cluster you are configuring is a high-availability cluster. In the HDFS site text box, provide:

  • The cluster logical name
  • A comma-separated list of NameNode IDs
  • The HTTP and RPC addresses of each NameNode
  • The failover proxy provider class name

For example:

<configuration>
   <property>
     <name>dfs.nameservices</name>
     <value>hdfs-ha</value>
   </property>
   <property>
     <name>dfs.ha.namenodes.hdfs-ha</name>
     <value>nn1,nn2</value>
   </property>
   <property>
     <name>dfs.namenode.rpc-address.hdfs-ha.nn1</name>
     <value>namenode-1.example.com:8020</value>
   </property>
   <property>
     <name>dfs.namenode.rpc-address.hdfs-ha.nn2</name>
     <value>namenode-2.example.com:8020</value>
   </property>
   <property>
     <name>dfs.namenode.http-address.hdfs-ha.nn1</name>
     <value>namenode-1.example.com:50070</value>
   </property>
   <property>
     <name>dfs.namenode.http-address.hdfs-ha.nn2</name>
     <value>namenode-2.example.com:50070</value>
   </property>
   <property>
     <name>dfs.client.failover.proxy.provider.hdfs-ha</name>
     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>
 </configuration>

You can find this information in the hdfs-site.xml file on your Hadoop servers.

7. Click Save. The server verifies that the information you entered is correct.

Map to a mounted file system

1. Enter a unique identifying name for the locally mounted file system.

2. Enter the path to the locally mounted file system.

3. Click Save. The server verifies that the information you entered is correct.

Configure Splunk Hadoop Connect with configuration files

If you deploy Splunk Hadoop Connect on an instance where Splunk Web is disabled, you must set up the app with configuration files.

See "About configuration files" in the Admin Manual before you begin.

Note: You might need to speak with your HDFS administrator about specific HDFS Namenode IP addresses and HTTP and inter-process communication (IPC) ports.

1. Using a shell prompt, go to $SPLUNK_HOME/etc/apps/HadoopConnect.

2. Create the following files in the local directory.

  • clusters.conf: Defines the HDFS clusters and mounted file systems that Splunk Hadoop Connect connects to.
  • export.conf: Defines export jobs for the HDFS clusters or mounted file systems that Splunk Hadoop Connect is configured to connect to.
  • inputs.conf: Defines the HDFS clusters that you want to import data from into Splunk Enterprise.

The HDFS import feature requires Splunk 5.0 or later.

3. Edit each configuration file to meet your environment's needs.

4. Save each file in the local directory.

5. Restart Splunk Enterprise for the changes to take effect.

Note: See "Configuration file reference" in this manual for details on the valid attributes and values for each of the configuration files listed.
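
If you manage this instance from a shell, you can apply the changes with the Splunk CLI:

# Restart Splunk Enterprise so that it loads the new configuration files.
$SPLUNK_HOME/bin/splunk restart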

Example configuration files

Here are example configuration files to show you how to configure Splunk Hadoop Connect.

Edit these configuration files to match your environment before saving them. Do not copy and paste them verbatim.

clusters.conf

For an HDFS cluster:

[hadoop-server1.example.com:8020]
hadoop_home = /opt/hadoop
java_home = /usr/java/latest

For a mounted file system:

[my local file system]
uri = file://mylocalfilesystem

export.conf

[export1]
base_path = /usr/hadoop/export
partition_fields = date,host,hour
search = index=os sourcetype=interfaces host=hadoop-server1.example.com
starttime = 1353474000
uri = hdfs://hadoop-server1.example.com:8020

inputs.conf

[hdfs://hadoop-server1.example.com:8020]
sourcetype=hadoop

Validate the configuration

After you configure Splunk Hadoop Connect, validate it to make sure it is connecting to the HDFS cluster or local file system properly.

In the upper-right of the HDFS Clusters panel, under Actions, select Test.

Hadoop Connect does a file listing (ls) of the HDFS root to make sure it can connect to and communicate with the Hadoop cluster.
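
If the Test action fails, you can run the equivalent check manually from a shell on the Splunk Enterprise server to separate Hadoop connectivity problems from app configuration problems. A minimal sketch, reusing the hypothetical paths and URI from the examples above:

# Use the same Hadoop CLI and Java installation that the app is configured with.
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop
# List the HDFS root, which is the same operation that the Test action performs.
$HADOOP_HOME/bin/hadoop fs -ls hdfs://hadoop-server1.example.com:8020/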


This documentation applies to the following versions of Splunk® Hadoop Connect: 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5

