Splunk® Hadoop Connect

Deploy and Use Splunk Hadoop Connect

About supported file types

Hadoop Connect can index and read the following file types:

  • SequenceFile
  • gzip
  • text
  • bzip2
  • Snappy: Install relevant native libraries on the native library path.
  • LZO: Install relevant native libraries on the native library path.
  • Avro: Apply the following patch to your Hadoop cluster: https://issues.apache.org/jira/browse/HADOOP-9740.

Verify native libraries for Snappy and LZO

To use the Snappy and LZO formats, install the relevant native libraries on the native library path.
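As a quick pre-check before the steps below, you can look for the library files in the directories you expect them in. The following is a minimal sketch, not part of Hadoop Connect: the function names are hypothetical, and the library file name prefix (for example `libsnappy`) is an assumption that you should adjust to match the files your installation actually ships.

```shell
# Return 0 if any file whose name starts with the given prefix
# exists in the given directory.
has_native_lib() {
    lib_dir=$1
    lib_prefix=$2
    for candidate in "$lib_dir"/"$lib_prefix"*; do
        # If the glob matched nothing it stays literal and -e is false.
        [ -e "$candidate" ] && return 0
    done
    return 1
}

# Print every directory from the argument list that contains the library.
# Usage: dirs_with_lib <file-name-prefix> <dir>...
dirs_with_lib() {
    wanted=$1
    shift
    for d in "$@"; do
        if has_native_lib "$d" "$wanted"; then
            printf '%s\n' "$d"
        fi
    done
}
```

For example, `dirs_with_lib libsnappy /usr/lib "$HADOOP_HOME/lib/native"` prints whichever of the two default native library directories contains a file whose name starts with `libsnappy`. Empty output means the library was not found in either directory.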

1. To verify that your Hadoop CLI works with the file formats you want to read/index in Hadoop Connect, run the following command in a shell terminal:

$HADOOP_HOME/bin/hadoop fs -text hdfs://<namenode.host:port>/path/to/your/file

2. Check whether the hadoop-env.sh file in your Hadoop client has been modified. If it has not been modified, you are ready to read and index these file formats using Hadoop Connect. If it has been modified, go to step 3.

3. Make sure that you have correctly installed your libraries and that the hadoop-env.sh file for your Hadoop client is in place.

  • If you modified hadoop-env.sh (for example, because your Snappy native library is not on the default native library path, /usr/lib/ or $HADOOP_HOME/lib/native/, and you pointed java.library.path to where your libraries are), copy those libraries to the default native library path and restore the original hadoop-env.sh.
  • If you do not have permission to copy the libraries, or if the Hadoop CLI text command cannot read the file with the default hadoop-env.sh, copy your updated hadoop-env.sh file to the directory that contains your Hadoop Connect cluster configuration files. For example:

$SPLUNK_HOME/etc/apps/HadoopConnect/local/clusters/<your-clusters-name>/

This way, your Hadoop client picks up hadoop-env.sh at run time, along with core-site.xml and, if present, hdfs-site.xml from your cluster's configuration directory.

  • Different Hadoop distributions and versions look in different locations for native libraries, and their $HADOOP_HOME/bin/hadoop scripts may have different requirements for setting the native library path. To determine where to put your native libraries on your Hadoop client, check the java.library.path property on your Hadoop server by running this command:

ps -ef | grep java.library.path
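The read check in step 1 and the java.library.path lookup above can be scripted. The following is a minimal sketch, not part of Hadoop Connect. The function names are hypothetical. It assumes that HADOOP_HOME is set and that the property appears in the process listing as a `-Djava.library.path=...` JVM argument with colon-separated directories.

```shell
# Step 1: return 0 if the Hadoop CLI can decode the given file.
# Assumes HADOOP_HOME is set. Output is discarded, so the whole file is read.
can_read_with_hadoop_cli() {
    "$HADOOP_HOME/bin/hadoop" fs -text "$1" >/dev/null 2>&1
}

# Read `ps -ef` output on stdin and print each directory listed in a
# -Djava.library.path=... JVM argument, one per line, without duplicates.
extract_library_paths() {
    grep -o -e '-Djava\.library\.path=[^ ]*' |
        sed 's/^-Djava\.library\.path=//' |
        tr ':' '\n' |
        sed '/^$/d' |
        sort -u
}
```

On a Hadoop server, `ps -ef | extract_library_paths` lists the directories that the running Hadoop JVMs search for native libraries. On the client, `can_read_with_hadoop_cli hdfs://<namenode.host:port>/path/to/your/file` succeeds only if the CLI can decode the file.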

Supported output formats for export

You can export Splunk events into the following formats:

  • CSV
  • TSV
  • JSON
  • XML
  • RAW

Hadoop Connect also supports GzipCodec for file compression. To use file compression, set the Compression Level of your scheduled export job to a value greater than 0. For information about setting up your export job through the user interface, see "Export to HDFS or a mounted file system" in this manual.
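To confirm that an export job actually wrote gzip-compressed output, you can check an exported file for the gzip magic bytes (0x1f 0x8b). The following is a minimal sketch. The function name is hypothetical, and it assumes that you have already copied an exported file to the local file system (for example, with hadoop fs -get).

```shell
# Return 0 if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip_file() {
    # od prints the first two bytes as hex; tr strips the padding.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \t\n')
    [ "$magic" = "1f8b" ]
}
```

If `is_gzip_file` fails for an export that you expected to be compressed, check the Compression Level of the export job first.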

Last modified on 28 July, 2015

This documentation applies to the following versions of Splunk® Hadoop Connect: 1.0, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5

