Why source types matter
The source type is one of the default fields that Splunk software assigns to all incoming data. It tells Splunk software what kind of data you have, so that it can format the data intelligently during indexing. Source types also let you categorize your data for easier searching.
Source types determine how incoming data is formatted
Because the source type controls how Splunk software formats incoming data, it is important that you assign the correct source type to your data. That way, the indexed version of the data (the event data) looks the way you want, with appropriate timestamps and event breaks. This facilitates easier searching of the data later.
Splunk software comes with a large number of predefined source types. When consuming data, Splunk software will usually select the correct source type automatically. If your data is specialized, you might need to manually select a different predefined source type. If your data is unusual, you might need to create a new source type with customized event processing settings. And if your data source contains heterogeneous data, you might need to assign the source type on a per-event (rather than a per-source) basis.
As with any other field, you can use the source type field to search event data once the data has been indexed. Because the source type is a key way to categorize your data, you will use it often in your searches.
Typical source types
Any common data input format can be a source type. Most source types are log formats. For example, some common source types that Splunk software automatically recognizes include:
- access_combined, for NCSA combined format HTTP Web server logs.
- apache_error, for standard Apache Web server error logs.
- cisco_syslog, for the standard syslog produced by Cisco network devices (including PIX firewalls, routers, and ACS), usually via remote syslog to a central log host.
- websphere_core, a core file export from WebSphere.
For a complete list of predefined source types, see List of pretrained source types in this manual.
Configure source types
There are two basic types of configuration you can do with source types:
- Assign source types explicitly to your incoming data.
- Create new source types, either from scratch or by modifying an existing source type.
Assign source types
In most cases, Splunk software determines the best source type for your data and automatically assigns it to incoming events. In some cases, however, you might need to explicitly assign a source type to your data. You usually do this when defining the data input. For details on how to improve source type assignment, see:
- Override automatic source type assignment
- Override source types on a per-event basis
- Configure rule-based source type recognition
- Create source types
- Rename source types
For the order in which Splunk software applies these methods, see How Splunk software assigns source types, later in this topic.
Create new source types
If none of the existing source types fits the needs of your data, create a new one.
Splunk Web lets you adjust source type settings to fit your data. In essence, it is a visual source type editor. See The Set Sourcetype page.
If you have Splunk Enterprise, you can also create a new source type by directly editing props.conf and adding a source type stanza. See Create source types. If you have Splunk Cloud, use Splunk Web to define source types.
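For the props.conf route, a new source type stanza might look like the following sketch. The stanza name my_custom_log and the timestamp layout are assumptions chosen for illustration. The attributes shown are standard props.conf event-processing settings, but the right values depend entirely on your data.

```ini
# props.conf
# Hypothetical source type for single-line events that start with a
# bracketed timestamp such as: [2017-01-01 12:00:00] message text
[my_custom_log]
# Treat each line as its own event instead of merging lines.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# The timestamp begins right after the opening bracket.
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# Read no more than 19 characters past TIME_PREFIX when looking for the timestamp.
MAX_TIMESTAMP_LOOKAHEAD = 19
```

Previewing the data in Splunk Web before saving is a reasonable way to check that the event breaks and timestamps come out as intended.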
Preview data to test and modify source types
Splunk Web lets you review the effects of applying a source type to an input. It lets you preview the resulting events without actually committing them to an index. You can also edit timestamp and event breaking settings interactively and then save the modifications as a new source type. For information on how data preview functions as a source type editor, see The Set Sourcetype page.
Search on source types
sourcetype is the name of the source type search field. You can use the sourcetype field to find similar types of data from any source. For example, you could search sourcetype=weblogic_stdout to find all of your WebLogic server events, even when WebLogic is logging from more than one domain (or "host," in Splunk terms).
How Splunk software assigns source types
Splunk software employs a variety of methods to assign source types to event data at index time. As it processes event data, Splunk software steps through these methods in a defined order of precedence. It starts with hardcoded source type configurations in inputs.conf and props.conf, moves on to rule-based source type association, and then works through methods like automatic source type recognition and automatic source type learning. This range of methods enables you to configure how Splunk software applies source type values to specific kinds of events, while assigning source type values to other events automatically.
The following list shows how Splunk software determines the source type for a data input. Splunk software starts with the first method and then descends through the others as necessary, until it can determine the source type. The list also provides an overview of how you configure source type assignment for each level.
Explicit source type specification based on the data input
If Splunk software finds an explicit source type for the data input, it stops here.
You configure this in inputs.conf or in Splunk Web. In inputs.conf, set the sourcetype attribute in the stanza for the input. To assign a source type when defining an input in Splunk Web, see Monitor files and directories with Splunk Web in this manual, which covers file inputs. The process is similar for network or other types of inputs.
For more information, see Specify source type for an input.
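As an illustration of input-level assignment, a file monitor stanza in inputs.conf could set the source type directly. The path and the source type name below are placeholders, not values from this manual.

```ini
# inputs.conf
# Hypothetical file input: every event read from this file is assigned
# the my_custom_log source type, so no later method is consulted.
[monitor:///var/log/myapp/app.log]
sourcetype = my_custom_log
```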
Explicit source type specification based on the data source
If Splunk software finds an explicit source type for the particular source, it stops here.
You configure this in props.conf, using this syntax:
[source::<source>]
sourcetype=<sourcetype>
For more information, see Specify source type for a source.
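A source-based assignment might look like the following sketch. The path pattern and the source type name are assumptions for illustration only.

```ini
# props.conf
# Hypothetical source-based assignment: any source whose path matches
# this pattern is given the my_custom_log source type.
[source::/var/log/myapp/*.log]
sourcetype = my_custom_log
```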
Rule-based source type recognition
Splunk software looks next for any rules you've created for source types.
You can create source type classification rules in props.conf:
[rule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>
For information about setting up source type recognition rules, see Configure rule-based source type recognition.
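To make the rule syntax concrete, here is a sketch of one rule. The number in each MORE_THAN_ or LESS_THAN_ attribute is a percentage of lines that must (or must not) match the regular expression. The rule name and source type name are hypothetical.

```ini
# props.conf
# Hypothetical rule: classify data as my_banner_log when more than 80% of
# its lines contain "----" and fewer than 70% of its lines contain "####".
[rule::my_banner_rule]
sourcetype = my_banner_log
MORE_THAN_80 = ----
LESS_THAN_70 = ####
```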
Automatic source type matching
Splunk software next attempts to use automatic source type recognition to match similar-looking files and assign a source type.
Splunk software calculates signatures for patterns in the first few thousand lines of any file or network input stream. These signatures identify things like repeating word patterns, punctuation patterns, line length, and so on. When Splunk software calculates a signature, it compares it to its set of signatures for known, "pretrained" source types. If it identifies a match, it assigns that source type to the data.
See List of pretrained source types in this manual for a list of the source types that Splunk software can recognize out of the box.
Delayed rule-based source type association
If Splunk software hasn't identified a source type by now, it looks for any delayed rules.
This works like rule-based associations. You create a delayedrule:: stanza in props.conf. This is a useful "catch-all" for source types, in case Splunk software missed any with intelligent matching (see above).
A good use of delayed rule associations is for generic versions of very specific source types that were defined earlier with rule:: in the rule-based source type recognition step, above. For example, you could use rule:: to catch event data with specific syslog source types, such as "sendmail syslog" or "cisco syslog" and then have delayedrule:: apply the generic "syslog" source type to the remaining syslog event data.
Here is the syntax:
[delayedrule::$RULE_NAME]
sourcetype=$SOURCETYPE
MORE_THAN_[0-100] = $REGEX
LESS_THAN_[0-100] = $REGEX
For more information about setting up or removing delayed rules for source type recognition, see Configure rule-based source type recognition.
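The generic syslog catch-all described above could be sketched as follows. The rule name and the regular expression are illustrative assumptions, not the stanza that ships with Splunk software.

```ini
# props.conf
# Hypothetical catch-all: label data as generic syslog when more than 80% of
# its lines begin with a classic "Mon DD HH:MM:SS" timestamp. Because this
# is a delayed rule, it applies only if no earlier method assigned a source type.
[delayedrule::my_generic_syslog]
sourcetype = syslog
MORE_THAN_80 = ^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s
```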
Automatic source type learning
If Splunk software is unable to assign a source type for the event using the preceding methods, it creates a new source type for the event signature (see Automatic source type matching, above). Splunk software stores learned pattern information in sourcetypes.conf.
This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 7.0.0, 7.0.1