Add or edit an HDFS provider in Splunk Web
You can set up multiple providers, and each provider can have multiple indexes. Have the following information at hand:
- The host name and port for the NameNode of the Hadoop cluster.
- The host name and port for the JobTracker of the Hadoop cluster.
- Installation directories of Hadoop command line libraries and Java installation.
- Path to a writable directory on the DataNode/TaskTracker *nix filesystem for which the Hadoop user account has read and write permission.
- Path to a writable directory in HDFS that can be used exclusively by Splunk on this search head.
You can also add HDFS providers by editing indexes.conf.
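As a rough illustration of the configuration file route, here is a minimal sketch of a provider stanza in indexes.conf. The provider name, host names, ports, and paths are hypothetical placeholders, and the queue setting is assumed to be passed through to Hadoop via the vix. prefix. Check the setting names against Provider Configuration Variables in the reference section for your version.

```ini
# Hypothetical provider name; replace with your own.
[provider:MyHadoopProvider]
vix.family = hadoop

# Environment variables: Java and Hadoop client directories (placeholder paths).
vix.env.JAVA_HOME = /usr/lib/jvm/java
vix.env.HADOOP_HOME = /opt/hadoop

# Default file system: NameNode host and port (placeholders).
vix.fs.default.name = hdfs://namenode.example.com:8020

# JobTracker host and port (placeholders).
vix.mapred.job.tracker = jobtracker.example.com:8021

# Working directory in HDFS used exclusively by Splunk on this search head.
vix.splunk.home.hdfs = /user/splunk/workdir

# Writable directory on the DataNode/TaskTracker local filesystem.
vix.splunk.home.datanode = /tmp/splunk/$SPLUNK_SERVER_NAME/

# Job queue for MapReduce jobs (assumed pass-through of the Hadoop property).
vix.mapred.job.queue.name = default
```

Each setting corresponds to one of the items listed above, so collecting that information first makes the stanza straightforward to fill in.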
Add a provider
1. In the top menu, select Settings > Virtual Indexes.
2. Select the Providers tab and click New Provider or the name of the provider you want to edit.
3. On the Add New/Edit Provider page, give your provider a Name.
4. Select the Provider Family from the drop-down list. Note that this field cannot be edited.
5. Provide the following Environment Variables:
- Java Home: Provide the path to your Java instance.
- Hadoop Home: Provide the path to your Hadoop client directory.
6. Provide the following Hadoop Cluster Information:
- Hadoop Version: Specify which version of Hadoop the cluster is running. Choose one of: Hadoop 1.0, Hadoop 2.0 with MRv1, or Hadoop 2.0 with Yarn.
- JobTracker: Provide the path to the JobTracker.
- File System: Provide the path to the default file system.
7. Provide the following Settings:
- HDFS working directory: This is a path in HDFS (or whatever the default file system is) that you want to use as a working directory.
- Job queue: This is the job queue to which you want the MapReduce jobs for this provider to be submitted.
8. Click Add Secure Cluster to configure security for the cluster and provide your Kerberos Server configuration.
9. The Additional Settings fields specify your provider configuration variables. Hadoop Data Roll populates these preset configuration variables for each provider you create. You can leave the preset variables in place or edit them as needed. If you want to learn more about these settings, see Provider Configuration Variables in the reference section of this manual.
Note: If you are configuring Splunk Analytics for Hadoop to work with YARN, you must add new settings. See "Required configuration variables for YARN" in this manual.
10. Click Save.
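For the YARN case mentioned in the note above, the additional provider settings might look like the following sketch. The provider name, host names, and ports are hypothetical placeholders, and the exact set of required variables is an assumption here. Treat "Required configuration variables for YARN" in this manual as the authoritative list.

```ini
# Hypothetical provider name; replace with your own.
[provider:MyYarnProvider]
vix.family = hadoop

# Default file system: NameNode host and port (placeholders).
vix.fs.default.name = hdfs://namenode.example.com:8020

# Run MapReduce jobs on YARN instead of a JobTracker.
vix.mapreduce.framework.name = yarn

# ResourceManager and scheduler addresses (placeholder hosts and ports).
vix.yarn.resourcemanager.address = resourcemanager.example.com:8032
vix.yarn.resourcemanager.scheduler.address = resourcemanager.example.com:8030
```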
This documentation applies to the following versions of Splunk® Enterprise: 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4