Splunk® Enterprise

Splunk Analytics for Hadoop

Splunk Enterprise version 9.0 will no longer be supported as of June 14, 2024. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.
This documentation does not apply to the most recent version of Splunk® Enterprise. For documentation on the most recent version, go to the latest release.

Add a sourcetype

Splunk Analytics for Hadoop reaches End of Life on January 31, 2025.

AFter you set up your providers and indexes, you can configure Splunk Analytics for Hadoop to search your virtual indexes by sourcetype.

Though most source types are log formats, any common data input format can be a source type. If your data is unusual, you might need to create a source type with customized event processing settings. And if your data source contains heterogeneous data, you might need to assign the source type on a per-event (rather than a per-source) basis.

See Why sourcetypes matter in the Splunk Enterprise documentation to learn more about why you might want to use sourcetyping on your HDFS data.

To add a sourcetype to an HDFS data source, you add a stanza to $SPLUNK_HOME/etc/system/local/props.conf. When defining sourcetypes for HDFS data, keep in mind that searches of HDFS data occur at search-time, not index time and that Hunk only reads the latest timestamps and not original HDFS timestamps. As a result, timestamp recognition may not always works as expected.

In the example below, we add two sourcetypes. A new sourcetype access_combined represents data from the access_combined log files. mysqld will let you search data from the specified mysqld.log file(s):

[source::.../access_combined.log]
sourcetype=access_combined
priority=100

[source::.../mysqld.log]
sourcetype=mysqld
priority=100

(You do not need to restart)

For information about searching, including searching by sourcetypes, see Use fields to search in the Splunk Enterprise Search Tutorial.

Note the following when adding a sourcetype:

  • INDEXED_TIME extractions do not work with Splunk Analytics for Hadoop.
  • While search time extractions should work with Splunk Analytics for Hadoop, it's easier to use the SimpleCSVRecordReader to do what you're looking for (if the file has a header) by adding it to the default list:

#append the SimpleCSVRecordReader to the default list:
vix.splunk.search.recordreader = ...,com.splunk.mr.input.SimpleCSVRecordReader 
vix.splunk.search.recordreader.csv.regex = <a regex to match csv files>
vix.splunk.search.recordreader.csv.dialect = tsv
Last modified on 30 October, 2023
Set up a provider and virtual index in the configuration file   Set up a virtual index in the configuration file

This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.1.7, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.2.4, 9.3.0, 9.3.1, 9.3.2


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters