Admin Manual

 



How source types work



A source type is any common format of data. sourcetype is one of Splunk's default fields; it is indexed and stored with every event. It provides an easy way to find similar types of data from any input. For example, you might search sourcetype=weblogic_stdout to find all WebLogic output, even though WebLogic might be logging from two different domains.


Source vs source type

Source is also one of Splunk's default fields, indexed and stored with every event as source. It refers to any file, stream, or other input sending data to Splunk. For data coming from files and directories, the value of source is the full path, such as /archive/server1/var/log/messages.0 or /var/log/. The value of source for network-based data sources is the protocol and port, such as UDP:514.

Different sources can have the same source type. For example, you may monitor source=/var/log/messages and also receive direct syslog input from UDP:514. Searching for sourcetype=linux_syslog finds events from both.
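
As a sketch of how two sources can share one source type, an inputs.conf along these lines would tag both inputs from the example above with linux_syslog. The [udp://514] stanza form is an assumption about your version's inputs.conf syntax, so check the inputs.conf specification before relying on it.

```ini
# inputs.conf (illustrative sketch)

# File-based source: the value of source is the full path, /var/log/messages
[monitor:///var/log/messages]
sourcetype = linux_syslog

# Network-based source: direct syslog input on UDP port 514
# (stanza form assumed; verify against your inputs.conf specification)
[udp://514]
sourcetype = linux_syslog
```

With both stanzas in place, a search for sourcetype=linux_syslog would be expected to return events from either source.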


How Splunk can set sourcetype field values

Automatic source type classification

During indexing, Splunk classifies source types automatically by calculating signatures for patterns in the first few thousand lines of any file or stream of network input. These signatures pick up things like repeating patterns of words, punctuation patterns, and line length. Once Splunk has calculated a signature, it compares it to previously seen signatures. If the signature is a radically new pattern, Splunk creates a new source type. Learned pattern information is stored in sourcetypes.conf.

To configure your own automatic source type recognition, use Splunk's rule-based source type feature. Rule-based source types are automatically assigned based on regular expressions you specify in props.conf. Learn more about how to configure rule-based source types.
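
A minimal sketch of a rule-based source type follows. It assumes the MORE_THAN_<percent> and LESS_THAN_<percent> attributes described in the props.conf specification; the rule name, source type name, and patterns are hypothetical and only illustrate the shape of a rule.

```ini
# props.conf (illustrative sketch; names and patterns are hypothetical)

[rule::bar_some]
sourcetype = source_with_lots_of_bars
# Match when more than 80% of lines contain "----" ...
MORE_THAN_80 = ----
# ... and fewer than 70% of lines contain "####"
LESS_THAN_70 = ####
```

A source whose lines satisfy both conditions would be assigned source_with_lots_of_bars, provided no higher-precedence method has already set a source type for it.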

Rename source types

To assign new source type names, edit sourcetypes.conf. However, this only changes the name of future data inputs. To change the source type for events that have already been indexed, create an alias for a source type. Aliasing source types is a cosmetic change that allows users to search for source type values that make sense.

Note: If you set indexing properties for a source type in props.conf, you must use the actual stored source type value from sourcetypes.conf.

Train the source type auto-classifier

To customize source type names, train Splunk's auto-classifier with a set of representative example files. If you train it with a wide enough range of files that you'd like to share the same source type, it learns better rules, and Splunk's recognition improves for newly indexed files of that source type. Pre-training is how Splunk ships with the ability to assign sourcetype=syslog to most syslog files.

To bypass Splunk's auto-classification, skip the training step and simply hardcode a source type for each data input. However, training may still be more effective if you plan to have Splunk index entire directories of mixed source types (such as /var/log). Learn how to train Splunk to recognize source types.

If Splunk fails to recognize a common format, or applies an incorrect source type value, you should report the problem to Splunk support and send us a sample file.

You can also anonymize your file first using Splunk's built-in anonymizer tool.

Hard-coded source type assignment

Bypass automatic source type classification entirely and set a source type when you configure a data input (see the topic on setting a source type for an input). However, this method is not very granular: all data from the same host or source is assigned the same source type name.

If you need to give different sources within a single directory input different source type names, try setting the source type for a source.
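
As a hedged sketch of per-source assignment, the fragments below monitor a whole directory without a hard-coded source type and then name individual sources in props.conf. The source:: stanza prefix follows the props.conf specification; the file paths are hypothetical, and the source type names are used only as examples.

```ini
# inputs.conf: monitor the whole directory; no sourcetype is set here,
# because a source type set on the input takes precedence over per-source settings
[monitor:///var/log]

# props.conf: assign a source type to individual sources inside that directory
[source::/var/log/httpd/access_log]
sourcetype = access_common

[source::/var/log/messages]
sourcetype = linux_syslog
```

Sources under /var/log that match neither stanza would fall through to rule-based or automatic classification.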


How Splunk applies source type values (precedence)

You can either configure how Splunk applies source type values to events or let Splunk apply them automatically. The following list shows the methods Splunk uses to apply source type values to events, in order of precedence:

1. Explicit specification of source type per input stanza in inputs.conf:

[monitor://$PATH]
sourcetype=$SOURCETYPE

2. Explicit specification of source type per source by creating a stanza in props.conf:

[source::$SOURCE]
sourcetype=$SOURCETYPE

3. Rule-based association of source types:

Allows you to match sources to source types using classification rules specified in rule:: stanzas in props.conf.

4. Intelligent matching:

Matches similar-looking files and creates a source type.

5. Delayed rules:

Works like rule-based associations, except that you create a delayedrule:: stanza in props.conf. This is a useful "catch-all" for source types, in case the earlier methods missed any.

6. Automatic source type learning:

Splunk creates new source types based on sources that don't already have source types associated with them.
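
To illustrate step 5 in the list above, a delayed rule might look like the following sketch. It assumes delayed rules accept the same MORE_THAN_<percent> and LESS_THAN_<percent> attributes as rule:: stanzas; the rule name, source type name, and pattern are hypothetical.

```ini
# props.conf (illustrative sketch; names and pattern are hypothetical)

# Consulted only after steps 1-4 have failed to assign a source type
[delayedrule::hash_heavy]
sourcetype = hash_delimited
# Match when more than 50% of lines contain "####"
MORE_THAN_50 = ####
```

Because delayed rules run after intelligent matching, a stanza like this only affects sources that Splunk could not otherwise classify, which is what makes it suitable as a catch-all.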


Configuration files for source types

Set the source type for an input in inputs.conf. Set the source type for a source, and configure custom indexing properties and rule-based associations of source types, in props.conf. Before manually modifying any configuration file, read about configuration files.

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14.

