About supported file types
Hadoop Connect can index and read the following file types:
- Snappy: Install relevant native libraries on the native library path.
- LZO: Install relevant native libraries on the native library path.
- Avro: Apply the following patch to your Hadoop cluster: https://issues.apache.org/jira/browse/HADOOP-9740.
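On Hadoop 2.x and later clients, the hadoop checknative command offers a quick way to see which native compression libraries (such as Snappy) the client can load; the exact output depends on your installation, so treat the command below as a diagnostic sketch rather than a required step.

```shell
# Report which native libraries this Hadoop client can load.
# (Available in Hadoop 2.x and later; output varies by installation.)
$HADOOP_HOME/bin/hadoop checknative -a
```

Lines in the report such as "snappy: false" indicate a native library that the client cannot find on its library path.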
Verify native libraries for Snappy and LZO
To use the Snappy and LZO formats, install the relevant native libraries on the native library path.
1. To verify that your Hadoop CLI works with the file formats you want to read/index in Hadoop Connect, run the following command in a shell terminal:
$HADOOP_HOME/bin/hadoop fs -text hdfs://<namenode.host:port>/path/to/your/file
2. Make sure that the hadoop-env.sh file in your Hadoop client has not been modified. If it has not been modified, you are ready to read and index these file formats with Hadoop Connect. If the file has been modified, go to step 3.
3. Make sure that you have correctly installed your libraries and that the hadoop-env.sh file for your Hadoop client is in place.
- If you modified hadoop-env.sh (for example, because your Snappy native library is not on the default native library path, /usr/lib/ or $HADOOP_HOME/lib/native/), point java.library.path to where your libraries are. Then copy those libraries to the default native library path and restore the original hadoop-env.sh file.
- If you do not have permission to copy the libraries, or if the Hadoop CLI text command cannot read the file with the default hadoop-env.sh, copy your updated hadoop-env.sh file to the location where your Hadoop Connect cluster configuration files are located. This way, hadoop-env.sh, along with core-site.xml and possibly hdfs-site.xml in your cluster's configuration directory, is picked up by your Hadoop client at run time.
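As a sketch of what such a hadoop-env.sh change typically looks like, the fragment below appends a custom library directory to the JVM's java.library.path via HADOOP_OPTS; the directory shown is an assumed example, not a required location.

```shell
# Example hadoop-env.sh fragment (assumed path; adjust to your install).
# Appends a custom native-library directory to java.library.path so the
# Hadoop client can load Snappy/LZO native libraries at run time.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/opt/native-libs"
```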
The $HADOOP_HOME/bin/hadoop script of different Hadoop distributions or versions may have different requirements for setting native libraries on the native library path. Because different Hadoop distributions and versions look in different locations for native libraries, determine where to put your native libraries on your Hadoop client. Check the java.library.path property on your Hadoop server side by running this command:
ps -ef | grep java.library.path
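The ps output can be long, so it helps to pull out just the java.library.path value. The sketch below runs against a made-up command line of the kind ps might print; the actual value depends on how your Hadoop daemons were started.

```shell
# Hypothetical line of the kind 'ps -ef | grep java.library.path' prints;
# the real value depends on your Hadoop server's startup options.
sample='hadoop 1234 1 0 10:00 ? 00:01:02 /usr/bin/java -Djava.library.path=/usr/lib/hadoop/lib/native org.apache.hadoop.hdfs.server.namenode.NameNode'

# Extract the value of -Djava.library.path from the command line.
libpath=$(printf '%s\n' "$sample" | sed -n 's/.*-Djava\.library\.path=\([^ ]*\).*/\1/p')
echo "$libpath"    # prints /usr/lib/hadoop/lib/native
```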
Supported output formats for export
You can export Splunk events into the following formats:
Hadoop Connect also supports GzipCodec for file compression. To use file compression, set the Compression Level of your scheduled export job to a value greater than 0. For information about setting up an export job through the user interface, see "Export to HDFS or a mounted file system" in this manual.
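A compression level greater than 0 makes the export write gzip-compressed files. As a purely local sketch (the file names here are made up), you can confirm that such output is ordinary gzip data that standard tools, and the Hadoop CLI text command, can decompress:

```shell
# Create a stand-in for an exported events file (hypothetical content).
printf 'event one\nevent two\n' > events.raw
# Compress it the way GzipCodec would on export.
gzip -c events.raw > events.raw.gz
# Reading it back recovers the original events.
gunzip -c events.raw.gz    # prints the two original event lines
```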
This documentation applies to the following versions of Splunk® Hadoop Connect: 1.0, 1.2, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5