Splunk® Hadoop Connect

Deploy and Use Splunk Hadoop Connect


About supported file types

Hadoop Connect can index and read the following file types:

  • SequenceFile
  • gzip
  • text
  • bzip2
  • Snappy: Install the relevant native libraries on the native library path.
  • LZO: Install the relevant native libraries on the native library path.
  • Avro: Apply the following patch to your Hadoop cluster: https://issues.apache.org/jira/browse/HADOOP-9740.
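If your Hadoop distribution includes the NativeLibraryChecker tool (available in recent Hadoop 2.x releases), you can quickly confirm which native codecs your Hadoop client can load. This is a convenience check, not a Hadoop Connect requirement, and the command may not exist in older distributions:

$HADOOP_HOME/bin/hadoop checknative -a

The output reports whether native support for zlib, Snappy, lz4, and bzip2 was found. LZO support usually ships as a separate library (such as hadoop-lzo), so it may not appear in this output.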

Verify native libraries for Snappy and LZO

To use the Snappy and LZO formats, install the relevant native libraries on the native library path.

1. To verify that your Hadoop CLI works with the file formats you want to read/index in Hadoop Connect, run the following command in a shell terminal:

$HADOOP_HOME/bin/hadoop fs -text hdfs://<namenode.host:port>/path/to/your/file

2. Check whether the hadoop-env.sh file in your Hadoop client has been modified. If it has not been modified, you are ready to read/index these file formats using Hadoop Connect. If the file has been modified, go to step 3.

3. Make sure that you have correctly installed your native libraries and that the hadoop-env.sh file for your Hadoop client is in place.

  • If you modified hadoop-env.sh (for example, because your Snappy native library is not on the default native library path, /usr/lib/ or $HADOOP_HOME/lib/native/), point java.library.path to where your libraries are. Then copy those libraries to the default native library path and restore the original hadoop-env.sh.
  • If you do not have permission to copy the libraries, or if the Hadoop CLI text command cannot read the file with the default hadoop-env.sh, copy your updated hadoop-env.sh file to the location where your Hadoop Connect cluster configuration files are located. For example:

$SPLUNK_HOME/etc/apps/HadoopConnect/local/clusters/<your-clusters-name>/

This way, the hadoop-env.sh file, along with core-site.xml and, if present, hdfs-site.xml in your cluster's configuration directory, is picked up by your Hadoop client at run time. (See the example after this procedure.)

  • Different Hadoop distributions and versions look in different locations for native libraries, and their $HADOOP_HOME/bin/hadoop scripts may have different requirements for setting native libraries on the native library path. Determine where to put your native libraries on your Hadoop client, and check the java.library.path property on your Hadoop server side by running this command:

ps -ef | grep java.library.path
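The following sketch illustrates the two approaches described above. The library directory /opt/hadoop/native is a placeholder for wherever your Snappy or LZO libraries actually live, and the location of hadoop-env.sh varies by distribution (commonly $HADOOP_HOME/etc/hadoop/ or $HADOOP_HOME/conf/), so adjust the paths to match your environment:

# In hadoop-env.sh, point java.library.path at the directory that holds
# your native libraries (placeholder path shown).
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/opt/hadoop/native"

# Alternatively, copy the modified hadoop-env.sh into the Hadoop Connect
# cluster configuration directory so it is picked up at run time.
cp $HADOOP_HOME/etc/hadoop/hadoop-env.sh $SPLUNK_HOME/etc/apps/HadoopConnect/local/clusters/<your-clusters-name>/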

Supported output formats for export

You can export Splunk events into the following formats:

  • CSV
  • TSV
  • JSON
  • XML
  • RAW

Hadoop Connect also supports GzipCodec for file compression. To use file compression, set the Compression Level for your scheduled export job to a value greater than 0. For information about setting up your export job through the user interface, see "Export to HDFS or a mounted file system" in this manual.
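As a quick spot check after a compressed export job runs, you can read one of the resulting files back through the Hadoop CLI. The path and file name below are placeholders for wherever your export job writes its output:

$HADOOP_HOME/bin/hadoop fs -text hdfs://<namenode.host:port>/path/to/export/output/<your-export-file>.gz | head

Because the fs -text command understands GzipCodec, it prints the decompressed events, which lets you confirm both the export format and the compression setting.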


This documentation applies to the following versions of Splunk® Hadoop Connect: 1.0, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5


Comments

Hi SloshBurch,

Thanks for your helpful comments. We've reworked the topic a bit to try and clarify the supported files for output. If you have any more thoughts or questions, please let us know!

Cheers,
jen

Jworthington splunk
May 15, 2014

I interpreted this page to imply that we can control the output format as per the file types listed on this article. Case 167822 informed me that this is for data indexing, not exporting. Can this page be updated so as to make it more clear what data these formats related to?

SloshBurch
May 14, 2014

I may be just missing it, but I don't see where this article states how to toggle the compressed file type.

SloshBurch
February 21, 2014
