Splunk® Hadoop Connect

Deploy and Use Splunk Hadoop Connect


Search command reference

Splunk Hadoop Connect includes the following new search commands.

hdfs

Synopsis

Execute a Hadoop Distributed File System (HDFS) directive and return search results.

Syntax

hdfs <directive> [<hdfs-location>] [<hdfs-location>...] [<directive-options>]

Directives

read
Syntax: read [<hdfs-location>] [<hdfs-location>...] [delim=<string>] [fields=<string list>]
Description: Reads the contents of the given files. You can supply more than one hdfs-location.
Note: The read directive can only read files, not directories. Also, hdfs-location cannot contain wildcards (*).
ls
Syntax: ls [<hdfs-location>] [<hdfs-location>...]
Description: Lists the contents of the given HDFS locations and returns each entry as a search result.
lsr
Syntax: lsr [<hdfs-location>] [<hdfs-location>...]
Description: Recursively lists the contents of the given HDFS locations and returns each entry as a search result.

Directive options, available for the read directive only:

delim
Syntax: delim=<string>
Description: The delimiter character used to split each line that is read into fields.
fields
Syntax: fields=<string list>
Description: A comma-delimited list of field names to assign to the segments produced by splitting on the delimiter.

Arguments

hdfs-location
Syntax: hdfs-location=<string>
Description: The URI of a file or directory in your HDFS cluster, for example hdfs://mynamenode:9000/path/to/file.

Description

The hdfs command allows you to explore an HDFS cluster. It executes an HDFS directive and returns the results to the Splunk platform as search results.

Examples

Example 1: Read tab-delimited MapReduce result output containing the fields city, lat, long, and count, then sort the results by count in descending order and display only the top 10.

| hdfs read "hdfs://mynamenode:9000/path/to/mr/results" delim="\t" fields="city,lat,long,count" | sort 10 -count

Example 2: List all files and directories in the given HDFS paths.

| hdfs ls "hdfs://mynamenode:9000/some/path" "hdfs://mynamenode:9000/some/other/path"

Example 3: Recursively list all files and directories in the given paths.

| hdfs lsr "hdfs://mynamenode:9000/some/path" "hdfs://mynamenode:9000/some/other/path"

runexport

Synopsis

Executes a Hadoop Distributed File System (HDFS) export. Triggers and manages export jobs, cursors, and failure conditions.

Note: This command is for internal use only.

Syntax

runexport name=<string> [forcerun=<bool>] [roll_size=<number>] [maxspan=<number>] [minspan=<number>] [starttime=<epoch-time>] [endtime=<epoch-time>] [parallel_searches=<int>|max] [kerberos_principal=<string>] [format=raw|json|xml|csv|tsv] [fields=<string list>]

Arguments

forcerun
Syntax: forcerun=<bool>
Description: A toggle that determines whether to bypass scheduler checking. The default is 0 (not set).
roll_size
Syntax: roll_size=<number>
Description: Minimum file size, in megabytes, at which point the Splunk platform no longer writes to the file and it becomes a candidate for HDFS transfer. Default is 63.
maxspan
Syntax: maxspan=<number>
Description: The maximum index time span, in seconds, that an export job covers. Defaults to 864000, which is 10 days.
minspan
Syntax: minspan=<number>
Description: The minimum index time span, in seconds, that an export job covers. Defaults to 1800, which is 30 minutes.
starttime
Syntax: starttime=<epoch-time>
Description: If there is no cursor available, the minimum index time that the app should use, in seconds since 0:00:00 UTC on January 1, 1970 (unix epoch time). Defaults to the value specified in endtime minus the value specified in minspan.
endtime
Syntax: endtime=<epoch-time>
Description: The maximum index time for which to export data, in seconds since 0:00:00 UTC on January 1, 1970 (unix epoch time). The default is the current time, in unix epoch time.
format
Syntax: format=<raw|json|xml|csv|tsv>
Description: The data output format that the app should use. Supported values are raw, csv, tsv, json, and xml.
fields
Syntax: fields=<string list>
Description: A list of Splunk event fields to include in the exported data. The Splunk platform ignores invalid fields.
parallel_searches
Syntax: parallel_searches=[<number>|max]
Description: The number of parallel searches to spawn. Each search targets a subset of indexers. This argument must be either a number greater than 0 or the word 'max'. When set to max, it spawns a number of parallel searches equal to the number of active search peers on the Splunk instance.
kerberos_principal
Syntax: kerberos_principal=<string>
Description: The Kerberos principal name required to access an HDFS destination that uses Kerberos authentication.

Description

This command executes, triggers, and manages an HDFS export job. It also handles cursor conditions and failures, and renames files to their final destination.

When this command runs, it generates a search query that searches events within a certain index time range, known as the index time disjunction.

The lower bound of the index time disjunction is the greater of the last successful export cursor timestamp and the value specified in the starttime argument. The upper bound is the lesser of the lower bound plus the value specified for the maxspan argument and the value specified in the endtime argument.
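For illustration only, the following search sketches that bounds arithmetic with eval, using hypothetical epoch values. The field names cursor_time, lower_bound, and upper_bound are illustrative and are not fields that runexport writes.

| makeresults | eval cursor_time=1500000000, starttime=1499990000, maxspan=864000, endtime=1500500000 | eval lower_bound=max(cursor_time, starttime) | eval upper_bound=min(lower_bound + maxspan, endtime)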

Note: Only a scheduler triggers the runexport command.

Examples

Example 1: Run an export named MyData with a roll size of 128 megabytes.

| runexport name=MyData forcerun=1 roll_size=128
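As a further illustration only (the export name HourlyData and the field list shown here are hypothetical, and in practice the scheduler, not an ad hoc search, triggers runexport), an export that writes JSON output, restricts the exported fields, and spawns one search per active search peer might look like this:

| runexport name=HourlyData forcerun=1 format=json fields="host,source,sourcetype,_raw" parallel_searches=max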

dump

dump is an internal search command that is new in Splunk Enterprise 6.0 and Splunk Hadoop Connect.

See the dump command reference in the Splunk Search Reference manual.

Note: Internal commands are search commands that are experimental. They might be removed, updated, or reimplemented differently in future versions, and they are not supported.
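As a rough sketch only (the basefilename value shown here is hypothetical, and the Search Reference manual is the authoritative source for the syntax), dump is typically invoked at the end of a search pipeline, for example:

index=_internal | dump basefilename=myexport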


This documentation applies to the following versions of Splunk® Hadoop Connect: 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5

