Import from HDFS
If you are running Splunk Enterprise 5.0 or later, you can import files from Hadoop Distributed File System (HDFS) into Splunk Enterprise for indexing.
You can import any files or directories that reside in the Hadoop clusters that you configured for the Splunk platform. The Splunk platform monitors the directories that you import and, when it detects changes in a directory, imports the changed information into the indexers. See "Configure Splunk Hadoop Connect".
Note: You can add mounted file systems as inputs by using the input configuration features that the Splunk platform provides.
After Splunk Enterprise indexes an imported HDFS file, it does not monitor that file for changes.
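The interaction between directory monitoring and already-indexed files can be modeled as a simple polling loop. The following Python sketch is a conceptual model only, not the Hadoop Connect implementation: the `poll` and `files_to_import` helpers are hypothetical, and the directory listing is an in-memory dictionary standing in for an HDFS listing.

```python
from typing import Dict, List, Set


def files_to_import(listing: Dict[str, int], already_indexed: Set[str]) -> List[str]:
    """Return the paths in `listing` that have not been indexed yet.

    `listing` maps each file path to a modification time (a hypothetical
    stand-in for an HDFS directory listing). Paths in `already_indexed` are
    skipped even if their modification time changed, which models the rule
    that an indexed HDFS file is not monitored for changes.
    """
    return sorted(path for path in listing if path not in already_indexed)


def poll(listing: Dict[str, int], already_indexed: Set[str]) -> List[str]:
    """Run one polling pass: select new files and record them as indexed."""
    new_files = files_to_import(listing, already_indexed)
    already_indexed.update(new_files)
    return new_files


indexed: Set[str] = set()
print(poll({"/data/a.log": 100, "/data/b.log": 100}, indexed))
# ['/data/a.log', '/data/b.log']
print(poll({"/data/a.log": 200, "/data/b.log": 100, "/data/c.log": 150}, indexed))
# ['/data/c.log']  (a.log changed, but it is not imported again)
```

Keying the bookkeeping on the path alone, and ignoring the modification time, is what makes a changed file invisible to later passes in this model.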
1. In the dashboard, click Manage HDFS Inputs.
2. In the Data Inputs page, click New.
3. Complete the following fields:
- Resource name: Enter a fully qualified path (without the leading hdfs://) to the data that you want to index. For example
- Whitelist regex: Enter a regular expression that matches files that you want to index.
- Blacklist regex: Enter a regular expression that matches files that you do not want to index.
4. Set the source type for the imported data.
- Automatic: The Splunk platform classifies the imported data and assigns a source type to it. Data with an unknown source type gets a placeholder name.
- Manual: Enter the source type in the field provided.
- From list: Choose from the list of source types.
5. (Optional) Click More settings to change the host or index values:
- Host field value: The host value assigned to your imported data. The default value is the Splunk host where the app is running.
- Index: Select the index where you want to store your imported data. If you do not select an index, the imported data is stored in the default index.
6. Click Save.
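To check a whitelist or blacklist pattern before saving the input, you can test it against sample file paths. The following Python sketch assumes that each pattern is matched with `re.search` against the full file path and that the blacklist takes precedence when a path matches both; these are assumptions, so confirm the actual matching behavior against your own data. The `select_files` helper and the sample paths are hypothetical. Python's `re` syntax is close to, but not identical to, other regular expression flavors, so complex patterns may behave differently.

```python
import re
from typing import Iterable, List, Optional


def select_files(paths: Iterable[str],
                 whitelist: Optional[str] = None,
                 blacklist: Optional[str] = None) -> List[str]:
    """Keep paths that match the whitelist (if given) and not the blacklist (if given).

    Assumptions: patterns are tested with re.search against the full path,
    and the blacklist wins when a path matches both patterns.
    """
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if allow and not allow.search(path):
            continue  # not whitelisted
        if deny and deny.search(path):
            continue  # explicitly blacklisted
        selected.append(path)
    return selected


# Hypothetical sample paths.
files = [
    "/user/logs/app.log",
    "/user/logs/app.log.gz",
    "/user/logs/debug.log",
    "/user/logs/readme.txt",
]
print(select_files(files, whitelist=r"\.log(\.gz)?$", blacklist=r"debug"))
# ['/user/logs/app.log', '/user/logs/app.log.gz']
```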
This documentation applies to the following versions of Splunk® Hadoop Connect: 1.1, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5