Splunk® Hadoop Connect

Deploy and Use Splunk Hadoop Connect


Search command reference

Splunk Hadoop Connect includes the following new search commands.



hdfs

Execute a Hadoop Distributed File System (HDFS) directive and return search results.


hdfs <directive> [<hdfs-location>] [hdfs-location...] [<directive-options>]


Syntax: read [hdfs-location] [hdfs-location...] [delim=<string>] [fields=<string list>]
Description: Reads the contents of the given files. You can supply more than one hdfs-location.
Note: The read directive can only read files, not directories. Also, hdfs-location cannot contain wildcards (*).
Syntax: ls [hdfs-location] [hdfs-location...]
Description: Lists the contents of the given HDFS locations and brings each entry back as a search result.
Syntax: lsr [hdfs-location] [hdfs-location...]
Description: Recursively lists the contents of the given HDFS locations and brings each entry back as a search result.

Directive options, available for the read directive only:

Syntax: delim=<string>
Description: The delimiter character used to split each line that is read into fields.
Syntax: fields=<string list>
Description: A comma-delimited list of field names to assign to the segments created by splitting on the delimiter.
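As a rough illustration of how delim and fields interact, here is a minimal Python sketch. The split_line helper is hypothetical, not part of the app, and the handling of mismatched segment counts (zip-style pairing) is an assumption:

```python
def split_line(line, delim="\t", fields=None):
    """Split one line read from HDFS on delim and, if field names are
    given, pair each segment with its field name. This mirrors how the
    read directive's delim and fields options work together."""
    segments = line.rstrip("\n").split(delim)
    if fields is None:
        return segments
    # Pair field names with segments; like zip, any surplus on either
    # side is dropped (an assumption about edge-case behavior).
    return dict(zip(fields, segments))

# Example: one tab-delimited MapReduce result line
row = split_line("Boston\t42.36\t-71.06\t120",
                 delim="\t",
                 fields=["city", "lat", "long", "count"])
```

With the fields option supplied, each line becomes a set of named fields instead of anonymous segments, which is what makes the sort example below possible.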


Syntax: hdfs-location=<string>
Description: The location of a file or directory in your HDFS cluster, for example hdfs://mynamenode:9000/path/to/file.


The hdfs command lets you explore an HDFS cluster. It executes an HDFS directive and returns the results to the Splunk platform as search results.


Example 1: Read tab-delimited MapReduce result output containing the fields city, lat, long, and count, then sort the results in descending order of count and display only the top 10.

| hdfs read "hdfs://mynamenode:9000/path/to/mr/results" delim="\t" fields="city,lat,long,count" | sort 10 -count

Example 2: List all files and directories in the given HDFS paths.

| hdfs ls "hdfs://mynamenode:9000/some/path" "hdfs://mynamenode:9000/some/other/path"

Example 3: Recursively list all files and directories in the given path.

| hdfs lsr "hdfs://mynamenode:9000/some/path" "hdfs://mynamenode:9000/some/other/path"



runexport

Executes a Hadoop Distributed File System (HDFS) export. Triggers and manages export jobs, cursors, and failure conditions.

Note: This command is for internal use only.


runexport name=<string> [forcerun=<bool>] [roll_size=<number>] [maxspan=<number>] [minspan=<number>] [starttime=<epoch-time>] [endtime=<epoch-time>] [parallel_searches=<number>|max] [kerberos_principal=<string>] [format=raw|json|xml|csv|tsv] [fields=<string list>]


Syntax: forcerun=<bool>
Description: A toggle that determines whether to bypass scheduler checking. The default is 0, meaning not set.
Syntax: roll_size=<number>
Description: Minimum file size, in megabytes, at which the Splunk platform stops writing to the file and the file becomes a candidate for transfer to HDFS. The default is 63.
Syntax: maxspan=<number>
Description: Maximum number of index-time seconds that an export job can cover. Defaults to 864000, which is 10 days.
Syntax: minspan=<number>
Description: Minimum number of index-time seconds that an export job can cover. Defaults to 1800, which is 30 minutes.
Syntax: starttime=<epoch-time>
Description: If no cursor is available, the minimum index time that the app should use, in seconds since 0:00:00 UTC on January 1, 1970 (Unix epoch time). Defaults to the value specified in endtime minus the value specified in minspan.
Syntax: endtime=<epoch-time>
Description: The maximum index time for which to export data, in seconds since 0:00:00 UTC on January 1, 1970 (Unix epoch time). The default is the current time, in Unix epoch time.
Syntax: format=<raw|json|xml|csv|tsv>
Description: The data output format that the app should use. Supported values are raw, csv, tsv, json and xml.
Syntax: fields=<string list>
Description: A comma-delimited list of Splunk event fields to export. The Splunk platform ignores invalid fields.
Syntax: parallel_searches=[<number>|max]
Description: The number of parallel searches to spawn. Each search targets a subset of indexers. This argument must be either a number greater than 0 or the word 'max'. When set to max, it spawns a number of parallel searches equal to the number of active search peers on the Splunk instance.
Syntax: kerberos_principal=<string>
Description: The Kerberos principal name required to access an HDFS destination that uses Kerberos authentication.
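How the parallel_searches argument resolves to a concrete number of searches can be sketched in Python. The resolve_parallel_searches helper and its signature are illustrative, not part of the app:

```python
def resolve_parallel_searches(value, active_search_peers):
    """Resolve the parallel_searches argument: either a number greater
    than 0, or the word 'max', which means one parallel search per
    active search peer on the Splunk instance."""
    if value == "max":
        return active_search_peers
    n = int(value)
    if n <= 0:
        # Mirrors the documented constraint: must be > 0 or 'max'.
        raise ValueError("parallel_searches must be > 0 or 'max'")
    return n

# Example: an instance with 8 active search peers
searches = resolve_parallel_searches("max", active_search_peers=8)
```

Each spawned search then targets its own subset of indexers, as described above.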


This command executes, triggers, and manages an HDFS export job. It also handles cursor conditions and failures, and renames files to their final destination.

When this command runs, it generates a search query that searches events within a certain index time range, known as the index time disjunction.

The lower bound of the index time disjunction is the greater of the last successful export cursor time stamp and the value specified in the starttime argument. The upper bound is the lesser of the lower bound plus the value specified in the maxspan argument and the value specified in the endtime argument.
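The bound selection described above can be sketched in Python. The disjunction_bounds helper is illustrative, not part of the app:

```python
def disjunction_bounds(cursor_ts, starttime, endtime, maxspan):
    """Compute the index time disjunction for an export run.

    cursor_ts: epoch time of the last successful export cursor,
               or None if no cursor exists yet.
    Returns a (lower, upper) pair of epoch-time bounds.
    """
    # Lower bound: the greater of the cursor time stamp and starttime
    # (just starttime when there is no cursor yet).
    lower = starttime if cursor_ts is None else max(cursor_ts, starttime)
    # Upper bound: the lesser of lower + maxspan and endtime.
    upper = min(lower + maxspan, endtime)
    return lower, upper

# Example: cursor at t=1000, starttime=500, endtime=10000, maxspan=3600
bounds = disjunction_bounds(1000, 500, 10000, 3600)  # (1000, 4600)
```

Because the upper bound is capped by maxspan, a single run never exports more than maxspan seconds of index time; repeated scheduled runs advance the cursor through the remaining range.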

Note: Only a scheduler triggers the runexport command.


Example 1: Run an export named MyData with forced run enabled and a roll size of 128 megabytes.

| runexport name=MyData forcerun=1 roll_size=128


dump

dump is an internal search command that is new in Splunk Enterprise 6.0 and Splunk Hadoop Connect.

See the dump command in the Splunk Search Reference manual.

Note: Internal commands are experimental search commands. They might be removed, updated, or reimplemented differently in future versions, and they are not supported.

Last modified on 29 July, 2015

This documentation applies to the following versions of Splunk® Hadoop Connect: 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5
