Configure Splunk Hadoop Connect
After you install Splunk Hadoop Connect, configure it to begin collecting data from your Hadoop cluster or mounted file system. You can configure it from within Splunk Web or with configuration files.
Splunk Hadoop Connect allows users to export data on disk. Give access to this app only to trusted administrators. To update sharing settings, edit default.meta as described in the topic "Security responsibilities with custom commands" in the Splunk Enterprise Search Manual.
Configure the app from within Splunk Web
The easiest and most common way to configure Splunk Hadoop Connect is from within the application itself. If you have not already done so, install the Hadoop CLI first.
1. After you install Splunk Hadoop Connect, log into the Splunk platform and select App > Hadoop Connect in the Splunk system navigation bar.
2. In the Welcome to Hadoop Connect page, click Configure.
3. In the HDFS Clusters section, click Add Cluster.
4. Select whether you want to map to a remote HDFS cluster or to a mounted file system.
To map to a remote HDFS cluster, select Remote HDFS and see "Map to a remote HDFS cluster."
To map to a mounted file system, select Locally mounted Hadoop and see "Map to a mounted file system."
Map to a remote HDFS cluster
1. In the HDFS URI field, specify the Hadoop Distributed File System (HDFS) Uniform Resource Identifier (URI) in the format hdfs://<host>:<port>.
2. In the HADOOP_HOME field, specify the location of the Hadoop command-line utilities that Splunk Hadoop Connect uses to communicate with the cluster.
3. In the JAVA_HOME field, specify the location of the Java installation that Splunk Hadoop Connect uses to communicate with the cluster.
4. In the Namenode HTTP Port field, specify the namenode (the HDFS file system centerpiece) HTTP port. The default is 50070.
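As a sketch, using the hypothetical host name that also appears in the examples later in this topic, the fields above might be filled in as follows:

```
HDFS URI:           hdfs://hadoop-server1.example.com:8020
HADOOP_HOME:        /opt/hadoop
JAVA_HOME:          /usr/java/latest
Namenode HTTP Port: 50070
```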
5. (Optional) To communicate with a secure HDFS cluster, select the Set Security check box. Three additional fields appear.
The first, HDFS Service Principal, requires the name of a valid Kerberos service principal. You can find the value for this field in your Hadoop configuration file, core-site.xml. Enter the fully qualified service principal name.
In Kerberos Principals to Short Names Mapping, if your Hadoop configuration maps Kerberos principals to short names, add those mappings here. The format is identical to the mapping in your Hadoop configuration's hadoop.security.auth_to_local property. The first rule that matches a principal name maps that principal name to a short name; any later rules in the list that match the same principal name are ignored.
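For example, a hypothetical rule set in the same syntax as the hadoop.security.auth_to_local property (the realm EXAMPLE.COM and the short name hdfs are assumptions) might look like this:

```
RULE:[2:$1@$0](hdfs/.*@EXAMPLE.COM)s/.*/hdfs/
DEFAULT
```

Because the first matching rule wins, a principal such as hdfs/namenode-1.example.com@EXAMPLE.COM is handled by the RULE line, and DEFAULT never applies to it.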
In the Select Export Kerberos Principal drop-down, choose an existing Kerberos principal, or select "Add New Principal" to create one.
- If you selected an existing Kerberos principal, proceed to Step 6.
- If you selected "Add New Principal," two additional fields appear.
In the Name field, type in the fully qualified name of the Kerberos principal you want to create in the Splunk platform.
In the Keytab Location on server field, specify the fully qualified path of the keytab file for this Kerberos principal name, meaning the location on the server where you wrote the keytab file.
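For instance, both values below are hypothetical and only illustrate the expected shape of each field:

```
Name:                      splunk/hadoop-server1.example.com@EXAMPLE.COM
Keytab Location on server: /etc/security/keytabs/splunk.service.keytab
```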
6. Check HA enabled if the cluster you are configuring is a high-availability cluster. In the HDFS site text box, provide:
- Cluster logical name
- Comma-separated list of NameNode IDs
- HTTP and RPC addresses of each NameNode
- Failover proxy provider class name
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hdfs-ha</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfs-ha</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-ha.nn1</name>
    <value>namenode-1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-ha.nn2</name>
    <value>namenode-2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfs-ha.nn1</name>
    <value>namenode-1.example.com:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfs-ha.nn2</name>
    <value>namenode-2.example.com:50070</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfs-ha</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
This information is found in your server-side hdfs-site.xml file.
7. Click Save. The server verifies that the information you entered is correct.
Map to a mounted file system
1. Enter a unique identifying name for the locally mounted file system.
2. Enter the path to the locally mounted file system.
3. Click Save. The server verifies that the information you entered is correct.
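As a sketch (the name matches the configuration-file example later in this topic; the mount path is an assumption):

```
Name: my local file system
Path: /mnt/hdfs
```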
Configure Splunk Hadoop Connect with configuration files
If you deploy Splunk Hadoop Connect onto an instance where Splunk Web is disabled, you must set up the app with configuration files.
See "About configuration files" in the Admin Manual before you begin.
Note: You might need to speak with your HDFS administrator about specific HDFS Namenode IP addresses and HTTP and inter-process communication (IPC) ports.
1. From a shell prompt, go to the app's local configuration directory.
2. Create the following files in that directory:
- A configuration file that defines the HDFS clusters and mounted file systems that Splunk Hadoop Connect should connect to.
- A configuration file that defines export jobs for the HDFS clusters or mounted file systems that Splunk Hadoop Connect is configured to connect to.
- A configuration file that defines the HDFS clusters you want to import HDFS data from into Splunk Enterprise.
The HDFS import feature requires Splunk 5.0 or later.
3. Edit each configuration file to meet your environment's needs.
4. Save each file in that same directory.
5. Restart Splunk Enterprise for the changes to take effect.
Note: See "Configuration file reference" in this manual for details on the valid attributes and values for each of the configuration files listed.
Example configuration files
Here are example configuration files to show you how to configure Splunk Hadoop Connect.
Edit these example files to match your environment before saving them; do not copy and paste them verbatim.
For an HDFS cluster:
[hadoop-server1.example.com:8020]
hadoop_home = /opt/hadoop
java_home = /usr/java/latest
For a mounted file system:
[my local file system]
file://mylocalfilesystem
For an export job:
[export1]
base_path = /usr/hadoop/export
partition_fields = date,host,hour
search = index=os sourcetype=interfaces host=hadoop-server1.example.com
starttime = 1353474000
uri = hdfs://hadoop-server1.example.com:8020
Validate the configuration
After you configure Splunk Hadoop Connect, validate it to make sure it is connecting to the HDFS cluster or local file system properly.
In the upper-right of the HDFS Clusters panel, under Actions, select Test.
Hadoop Connect does a file listing (ls) of the HDFS root to make sure it can connect to and communicate with the Hadoop cluster.
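The Test action is roughly equivalent to running a manual root listing with the Hadoop CLI, for example (the host name is hypothetical):

```
$HADOOP_HOME/bin/hadoop fs -ls hdfs://hadoop-server1.example.com:8020/
```

If this command fails from the Splunk server's shell, the Test action in Splunk Web is likely to fail for the same reason.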
This documentation applies to the following versions of Splunk® Hadoop Connect: 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5