About source types
This documentation does not apply to the most recent version of Splunk.
Any common data input format can be a source type. Most source types are log formats. For example, the list of common source types that Splunk automatically recognizes includes:
- access_combined, for NCSA combined format HTTP Web server logs.
- apache_error, for standard Apache Web server error logs.
- cisco_syslog, for the standard syslog produced by Cisco network devices including PIX firewalls, routers, ACS, etc., usually via remote syslog to a central log host.
- websphere_core, which is a core file export from WebSphere.
Note: For a longer list of source types that Splunk automatically recognizes, see "Pretrained sourcetypes" in this manual.
sourcetype is the name of the source type field. Splunk extracts the sourcetype field by default for each event that it indexes. You can use the sourcetype field to find similar types of data from any source. For example, you could search sourcetype=weblogic_stdout to find all of your WebLogic server events, even when WebLogic is logging from more than one domain.
Source vs source type
Source is another default field that Splunk identifies for each event that it indexes. The source is the name of the file, stream, or other input from which a particular event originates. For data monitored from files and directories, the value of source is the full path, such as /archive/server1/var/log/messages.0 or /var/log/. The value of source for network-based data sources is the protocol and port, such as UDP:514.
Events with the same source type can come from different sources. For example, say you're monitoring source=/var/log/messages and receiving direct syslog input from udp:514. If you search sourcetype=linux_syslog, Splunk will return events from both of those sources.
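The relationship between the two fields can be sketched in a few lines of Python. This is only an illustration, not how Splunk searches: the events list, the raw text, and the search_by_sourcetype helper are hypothetical, and only the field values come from the example above.

```python
# Hypothetical in-memory events; in Splunk these fields are set at index time.
events = [
    {"source": "/var/log/messages", "sourcetype": "linux_syslog", "raw": "event A"},
    {"source": "udp:514", "sourcetype": "linux_syslog", "raw": "event B"},
    {"source": "/var/log/httpd/access_log", "sourcetype": "access_combined", "raw": "event C"},
]


def search_by_sourcetype(events, sourcetype):
    """Return every event whose sourcetype field equals the given value."""
    return [event for event in events if event["sourcetype"] == sourcetype]


# Rough equivalent of the search sourcetype=linux_syslog.
matches = search_by_sourcetype(events, "linux_syslog")

# One source type, two distinct sources.
sources = sorted({event["source"] for event in matches})
print(sources)
```

Filtering on sourcetype returns events from both the monitored file and the network input, because the source type describes the format of the data rather than where it came from.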
Methods Splunk uses for source type assignment and their precedence
Splunk employs a variety of methods to assign source types to event data at index time. As it processes event data, Splunk steps through these methods in a defined order of precedence. It starts with hardcoded source type configurations in inputs.conf and props.conf, moves on to rule-based source type association, and then works through methods like automatic source type recognition and automatic source type learning. This range of methods enables you to configure how Splunk applies source type values to specific kinds of events, while letting Splunk assign source type values to the remaining events automatically.
The following list discusses these methods in the order that Splunk typically uses them to assign source types to event data at index time:
1. Explicit source type specification based on the data input, as configured in inputs.conf stanzas:
[monitor://$PATH]
sourcetype=$SOURCETYPE
2. Explicit source type specification based on the data source, as configured in props.conf stanzas:
[source::$SOURCE]
sourcetype=$SOURCETYPE
3. Rule-based source type association:
Enables Splunk to match incoming data to source types using classification rules specified in rule:: stanzas in props.conf.
[rule::$RULE_NAME]
sourcetype=$SOURCETYPE
MORE_THAN_[0-100] = $REGEX
LESS_THAN_[0-100] = $REGEX
For more information, see "Configure rule-based source type recognition" in this manual.
4. Automatic source type matching:
Splunk uses automatic source type recognition to match similar-looking files and, through that, assign a source type. It calculates signatures for patterns in the first few thousand lines of any file or stream of network input. These signatures identify things like repeating word patterns, punctuation patterns, line length, and so on. When Splunk calculates a signature, it compares it to previously seen signatures. If the signature appears to be a radically new pattern, Splunk creates a new source type for the pattern.
Note: At this stage in the source type assignment process, Splunk just matches incoming data with source types that it has learned previously. It doesn't create new source types for unique signatures until the final stage of source typing (step 6, below).
See "Pretrained source types" in this manual, for a list of the source types that Splunk is trained to recognize out of the box. See "Train Splunk's source type autoclassifier" for more information about expanding the list of source types that Splunk can assign through automatic source type recognition.
5. Delayed rule-based source type association:
This works like rule-based associations, except you create a delayedrule:: stanza in props.conf. This is a useful "catch-all" for source types, in case Splunk missed any with intelligent matching (see above).
A good use of delayed rule associations is for generic versions of very specific source types that are defined earlier with rule:: in step 3, above. For example, you could use rule:: to catch event data with specific syslog source types, such as "sendmail syslog" or "cisco syslog" and then have delayedrule:: apply the generic "syslog" source type to the remaining syslog event data.
[delayedrule::$RULE_NAME]
sourcetype=$SOURCETYPE
MORE_THAN_[0-100] = $REGEX
LESS_THAN_[0-100] = $REGEX
For more information, see "Configure rule-based source type recognition" in this manual.
6. Automatic source type learning:
If Splunk is unable to assign a source type for the event using the preceding five methods, it creates a new source type for the event signature (see step 4, above). Splunk stores learned pattern information in sourcetypes.conf.
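The order of precedence described in the six steps above can be sketched as a simple "first method that answers wins" chain. This is a minimal illustration, not Splunk's implementation. Every name in it is hypothetical: the lookup tables stand in for inputs.conf and props.conf, the two regexes are made-up rules, signature recognition is a placeholder, and the learned source type name is invented. It also assumes that MORE_THAN_N means "more than N percent of lines match the regex", which this section does not spell out.

```python
import re

# Hypothetical stand-ins for explicit settings in inputs.conf and props.conf.
INPUT_SOURCETYPES = {"monitor:///var/log/messages": "linux_syslog"}
SOURCE_SOURCETYPES = {"/var/log/httpd/error_log": "apache_error"}


def percent_matching(lines, pattern):
    """Percentage of lines that match the regex (0.0 for no lines)."""
    if not lines:
        return 0.0
    regex = re.compile(pattern)
    hits = sum(1 for line in lines if regex.search(line))
    return 100.0 * hits / len(lines)


def from_input(event):          # step 1: explicit, per input
    return INPUT_SOURCETYPES.get(event.get("input"))


def from_source(event):         # step 2: explicit, per source
    return SOURCE_SOURCETYPES.get(event.get("source"))


def from_rule(event):           # step 3: specific rule (assumed MORE_THAN_80)
    if percent_matching(event.get("lines", []), r"sendmail\[") > 80:
        return "sendmail_syslog"
    return None


def from_recognition(event):    # step 4: placeholder, signature matching not modeled
    return None


def from_delayed_rule(event):   # step 5: generic catch-all rule
    timestamp = r"^\w{3} +\d+ \d\d:\d\d:\d\d "
    if percent_matching(event.get("lines", []), timestamp) > 80:
        return "syslog"
    return None


def learn_new(event):           # step 6: always yields a (hypothetical) new type
    return "learned_sourcetype_1"


METHODS = [from_input, from_source, from_rule,
           from_recognition, from_delayed_rule, learn_new]


def assign_sourcetype(event):
    """Try each method in order and return the first source type found."""
    for method in METHODS:
        sourcetype = method(event)
        if sourcetype is not None:
            return sourcetype
    return None


mail = ["Mar  3 10:15:01 host sendmail[123]: delivered"]
cron = ["Mar  3 10:16:00 host cron[99]: job started"]
print(assign_sourcetype({"input": "monitor:///var/log/messages", "lines": mail}))
print(assign_sourcetype({"lines": mail}))
print(assign_sourcetype({"lines": cron}))
print(assign_sourcetype({"lines": ["no recognizable structure"]}))
```

The same sendmail lines receive a different source type depending on whether an explicit input setting exists, which is the point of the ordering: explicit configuration overrides rules, specific rules override the generic delayed rule, and learning only applies when nothing else matched.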
Configuration files for source types
Set source types for inputs in inputs.conf. Set source types for sources and rule-based source type associations in props.conf. Before manually modifying any configuration file, read about configuration files.
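To check which source types a configuration sets explicitly, the stanzas can be read with a small script. The sketch below uses Python's standard configparser module as an approximation. Splunk .conf files are not guaranteed to be valid INI, so treat this as an assumption that holds only for simple stanzas like these. The file contents and the explicit_sourcetypes helper are hypothetical examples.

```python
import configparser

# Hypothetical file contents; a real script would read the files from disk.
INPUTS_CONF = """
[monitor:///var/log/messages]
sourcetype = linux_syslog
"""

PROPS_CONF = """
[source::/var/log/httpd/error_log]
sourcetype = apache_error
"""


def explicit_sourcetypes(conf_text):
    """Map each stanza name to its sourcetype setting, if it has one."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        stanza: parser[stanza]["sourcetype"]
        for stanza in parser.sections()
        if "sourcetype" in parser[stanza]
    }


inputs_map = explicit_sourcetypes(INPUTS_CONF)
props_map = explicit_sourcetypes(PROPS_CONF)
print(inputs_map)
print(props_map)
```

A listing like this only shows explicit assignments (steps 1 and 2 above). It says nothing about source types assigned by rules or by automatic recognition.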
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11