About default fields (host, source, sourcetype, and more)
This documentation does not apply to the most recent version of Splunk.
Contents
- Defining host, source, and sourcetype
- Under what conditions should you override host and sourcetype assignment?
- About hosts
- How Splunk assigns the host value
- Set a default host for a file or directory input
- Override default host values based on event data
- Tag host values
- About source types
- Source vs source type
- Methods Splunk uses for source type assignation and their precedence
If you've read the "Configure event processing" chapter in this manual, you know that Splunk automatically extracts a number of default fields for each event it processes prior to indexing. These default fields include:

- index, which identifies the index in which the event is located;
- linecount, which describes the number of lines the event contains;
- timestamp, which describes the point in time at which the event occurred.

(As discussed in the "Configure event timestamping" chapter, you have several options for changing how events are timestamped.)
Note: For a complete list of the default fields that Splunk identifies for each event prior to indexing, see "Use default fields" in the User manual.
This chapter focuses mainly on three important default fields: host, source, and sourcetype. Splunk identifies host, source, and sourcetype values for each event it processes. This chapter explains how Splunk does this, and shows you how you can override the automatic assignment of host and sourcetype values for events when it is necessary to do so.
This chapter also shows you how to have Splunk extract additional, custom fields at index time. However, this practice is strongly discouraged. Adding to the list of indexed fields can slow down both indexing and search. You may also have to reindex your entire dataset before those fields show up for previously indexed events. Whenever possible, extract fields at search time instead. For more information, see "Index time versus search time" in this manual.
For more information about index-time field extraction, see "Configure index-time field extraction" in this manual.
Defining host, source, and sourcetype
The host, source, and sourcetype fields are defined as follows:
- host: An event's host value is typically the hostname, IP address, or fully qualified domain name of the network host from which the event originated. The host value lets you easily locate data originating from a specific device. For an overview of the methods Splunk provides for overriding automatic host assignment, see "About hosts" in this topic.
- source: The source of an event is the name of the file, stream, or other input from which the event originates. For data monitored from files and directories, the value of source is the full path, such as /archive/server1/var/log/messages.0 or /var/log/. The value of source for network-based data sources is the protocol and port, such as UDP:514.
- sourcetype: The source type of an event is the format of the data input from which it originates, such as access_combined or cisco_syslog. For an overview of how Splunk sets the source type value and the ways you can override automatic source type assignment, see the "Override automatic source type assignment" topic in this manual.
Under what conditions should you override host and sourcetype assignment?
Much of the time, Splunk can automatically identify host and sourcetype values that are both correct and useful. But situations do come up that require you to intervene in this process and provide override values.
You may want to change your default host assignment when:
- you are bulk-loading archive data that was originally generated from a different host and you want those events to have that host value.
- your data is actually being forwarded from a different host (the forwarder will be the host unless you specify otherwise).
- you are working with a centralized log server environment, which means that all of the data received from that server will have the same host even though it originated elsewhere.
You may want to change your default sourcetype assignment when:
- you want to give all event data coming through a particular input or from a specific source the same source type, for tracking purposes.
- you want to apply source types to specific events coming through a particular input, such as events that originate from a discrete group of hosts, or even events that are associated with a particular IP address or userid.
There are also steps you can take to expand the range of source types that Splunk automatically recognizes, or to simply rename source types. See the "Source type field overview" section, below, for more information.
About hosts
An event's host field value is the name of the physical device from which the event originates. Host is a default field, which means that Splunk assigns it to every event it indexes. As a result, you can use it to search for all events generated by a particular host.
The host value can be an IP address, a device hostname, or a fully qualified domain name. Which one you get depends on whether the event was received through a file input, through a network input, or from the computer hosting the Splunk instance.
How Splunk assigns the host value
If no other host rules are specified for a source, Splunk assigns the host field a default value. That default applies to all data coming from inputs on a given Splunk server, and it is the hostname or IP address of that network host. When Splunk is running on the server where the event occurred (the most common case), this value is correct and no manual intervention is required.
For more information, see "Set a default host for a Splunk server" in this manual.
Set a default host for a file or directory input
If you are running Splunk on a central log archive, or you are working with files forwarded from other hosts in your environment, you may need to override the default host assignment for events coming from particular inputs.
There are two methods for assigning a host value to data received through a particular input:

- Define a static host value for all data coming through a specific input.
- Have Splunk dynamically assign a host value based on a portion of the path or filename of the source.

The second method is helpful when your directory structure keeps each host's log archive in a separate subdirectory.
For more information, see "Set a default host for a file or directory input" in this manual.
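The two per-input approaches can be modelled in a few lines of Python. This is a minimal sketch, not Splunk code, and the function names are hypothetical. The path-based variant assumes behavior like the host_segment setting of an inputs.conf monitor stanza, where the Nth "/"-separated segment of the source path becomes the host. The precedence between the dynamic and static values is also an assumption. The server-wide default described above is used as the final fallback.

```python
import socket
from typing import Optional


def host_from_segment(path: str, segment: int) -> Optional[str]:
    """Return the Nth '/'-separated segment (1-based) of a path, or None.

    Hypothetical helper modelled on a host_segment-style setting.
    """
    parts = [p for p in path.split("/") if p]
    if 1 <= segment <= len(parts):
        return parts[segment - 1]
    return None


def assign_host(path: str,
                static_host: Optional[str] = None,
                segment: Optional[int] = None,
                server_default: Optional[str] = None) -> str:
    """Pick a host for data read from `path` (assumed precedence).

    1. Dynamic: a path segment, if configured and present in the path.
    2. Static: a fixed host value configured for the input.
    3. Fallback: the server-wide default (here, this machine's hostname).
    """
    if segment is not None:
        dynamic = host_from_segment(path, segment)
        if dynamic:
            return dynamic
    if static_host:
        return static_host
    return server_default or socket.gethostname()


# Archive laid out as /archive/<host>/...: the second segment is the host.
print(assign_host("/archive/server1/var/log/messages.0", segment=2))  # server1
print(assign_host("/var/log/messages", static_host="loghost01"))      # loghost01
```

The segment-based lookup suits the case described above, where each host's archive lives in its own subdirectory. The static value covers inputs that carry data from a single known host.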
Override default host values based on event data
You may have a situation that requires you to override host values based on event data. For example, if you work in a centralized log server environment, you may have several host servers that feed into that main log server. The central log server is called the reporting host. The system where the event occurred is called the originating host (or just the host). In these cases you need to define rules that override the automatic host assignments for events received from that centralized log host and replace them with distinct originating host values.
For more information, see "Override default host values based on event data" in this manual.
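Conceptually, the override reads the originating host out of each event. When nothing matches, it falls back to the host that was already assigned, which is the reporting host. In Splunk this is configured with a transforms.conf stanza that writes to the MetaData:Host key. The Python below only models the idea. Its regular expression is a hypothetical example, not a pattern Splunk ships, and it assumes classic syslog lines of the form "Mmm dd hh:mm:ss HOST ...".

```python
import re

# Hypothetical pattern for classic syslog headers: "Jun 14 10:15:32 web01 sshd[...]"
# Group 1 captures the originating host that follows the timestamp.
SYSLOG_HOST = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} (\S+) ")


def originating_host(event: str, reporting_host: str) -> str:
    """Return the host named inside the event, else the reporting host."""
    match = SYSLOG_HOST.match(event)
    return match.group(1) if match else reporting_host


events = [
    "Jun 14 10:15:32 web01 sshd[2231]: Accepted password for admin",
    "Jun  4 08:01:02 db02 CRON[977]: (root) CMD (run-parts /etc/cron.hourly)",
    "malformed line with no syslog header",
]
for e in events:
    print(originating_host(e, reporting_host="loghost"))
# web01
# db02
# loghost
```

Falling back to the reporting host means that an event the rule cannot parse still keeps a usable host value instead of losing it.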
Tag host values
Tag host values to make your searches more robust. Tags let you cluster groups of hosts into useful, searchable categories.
For more information, see "About tags and aliases" in the Knowledge Manager manual.
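The sketch below gives a rough illustration of what tagging buys you. It inverts a host-to-tags mapping so that a single tag resolves to the group of hosts carrying it. The host names and tags are hypothetical. The code models the grouping idea only, since Splunk stores and applies tags itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def hosts_by_tag(host_tags: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert host -> tags into tag -> hosts so a tag names a host group."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for host, tags in host_tags.items():
        for tag in tags:
            groups[tag].add(host)
    return dict(groups)


# Hypothetical hosts and tags.
host_tags = {
    "web01": ["webserver", "production"],
    "web02": ["webserver", "staging"],
    "db01": ["database", "production"],
}
groups = hosts_by_tag(host_tags)
print(sorted(groups["production"]))  # ['db01', 'web01']
```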
About source types
Any common data input format can be a source type. Most source types are log formats. For example, the list of common source types that Splunk automatically recognizes includes:
- access_combined, for NCSA combined format HTTP Web server logs.
- apache_error, for standard Apache Web server error logs.
- cisco_syslog, for the standard syslog produced by Cisco network devices including PIX firewalls, routers, ACS, etc., usually via remote syslog to a central log host.
- websphere_core, which is a core file export from WebSphere.
Note: For a longer list of source types that Splunk automatically recognizes, see "List of pretrained sourcetypes" in this manual.
sourcetype is the name of the source type field. You can use the sourcetype field to find similar types of data from any source. For example, you could search sourcetype=weblogic_stdout to find all of your WebLogic server events, even when WebLogic is logging from more than one domain (or "host," in Splunk terms).
Source vs source type
The source is the name of the file, stream, or other input from which a particular event originates. For data monitored from files and directories, the value of source is the full path, such as /archive/server1/var/log/messages.0 or /var/log/. The value of source for network-based data sources is the protocol and port, such as UDP:514.
Events with the same source type can come from different sources. For example, say you're monitoring source=/var/log/messages and receiving direct syslog input from udp:514. If you search sourcetype=linux_syslog, Splunk will return events from both of those sources.
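The distinction can be modelled by filtering a list of events on one field or the other. In this hypothetical Python sketch, each event is a dict carrying its source and sourcetype, using the values from the example above. The third event and the search function are illustrative assumptions. Filtering on sourcetype returns events from both sources, while filtering on source returns only one.

```python
from typing import Dict, List

Event = Dict[str, str]

# Hypothetical events mirroring the example: one file input, one network input.
events: List[Event] = [
    {"source": "/var/log/messages", "sourcetype": "linux_syslog", "_raw": "file event"},
    {"source": "udp:514", "sourcetype": "linux_syslog", "_raw": "network event"},
    {"source": "/var/log/httpd/access_log", "sourcetype": "access_combined", "_raw": "web hit"},
]


def search(events: List[Event], field: str, value: str) -> List[Event]:
    """Return events whose `field` equals `value` (a toy field=value search)."""
    return [e for e in events if e.get(field) == value]


by_type = search(events, "sourcetype", "linux_syslog")
print(sorted(e["source"] for e in by_type))      # ['/var/log/messages', 'udp:514']
print(len(search(events, "source", "udp:514")))  # 1
```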
Methods Splunk uses for source type assignation and their precedence
Splunk employs a variety of methods to assign source types to event data at index time. As it processes event data, Splunk steps through these methods in a defined order of precedence. It starts with hardcoded source type configurations in inputs.conf and props.conf, moves on to rule-based source type association, and then works through methods like automatic source type recognition and automatic source type learning. This range of methods enables you to configure how Splunk applies source type values to specific kinds of events, while letting Splunk assign source type values to the remaining events automatically.
The following list discusses these methods in the order that Splunk typically uses them to assign source types to event data at index time:
1. Explicit source type specification based on the data input, as configured in inputs.conf stanzas:
    [monitor://$PATH]
    sourcetype = $SOURCETYPE
2. Explicit source type specification based on the data source, as configured in props.conf stanzas:
    [source::$SOURCE]
    sourcetype = $SOURCETYPE
3. Rule-based source type recognition:
Enables Splunk to match incoming data to source types using classification rules specified in rule:: stanzas in props.conf.
    [rule::$RULE_NAME]
    sourcetype = $SOURCETYPE
    MORE_THAN_[0-100] = $REGEX
    LESS_THAN_[0-100] = $REGEX
For information about setting up or removing source type recognition rules, see "Configure rule-based source type recognition" in this manual.
4. Automatic source type matching:
Splunk uses automatic source type recognition to match similar-looking files and, through that, assign a source type. It calculates signatures for patterns in the first few thousand lines of any file or stream of network input. These signatures identify things like repeating word patterns, punctuation patterns, and line length. When Splunk calculates a signature, it compares it to previously seen signatures. If it finds a match, it assigns the associated source type.
Note: At this stage in the source type assignation process, Splunk just matches incoming data with source types that it has learned previously. It doesn't create new source types for unique signatures until the final stage of source typing (step 6, below).
See "List of pretrained source types" in this manual for a list of the source types that Splunk can recognize out of the box.
5. Delayed rule-based source type association:
This works like rule-based associations (see above), except you create a delayedrule:: stanza in props.conf. This is a useful "catch-all" for source types, in case Splunk missed any with intelligent matching (see above).
A good use of delayed rule associations is for generic versions of very specific source types that are defined earlier with rule:: in step 3, above. For example, you could use rule:: to catch event data with specific syslog source types, such as "sendmail syslog" or "cisco syslog" and then have delayedrule:: apply the generic "syslog" source type to the remaining syslog event data.
    [delayedrule::$RULE_NAME]
    sourcetype = $SOURCETYPE
    MORE_THAN_[0-100] = $REGEX
    LESS_THAN_[0-100] = $REGEX
For more information about setting up or removing delayed rules for source type recognition, see "Configure rule-based source type recognition" in this manual.
6. Automatic source type learning:
If Splunk is unable to assign a source type for the event using the preceding five methods, it creates a new source type for the event signature (see step 4, above). Splunk stores learned pattern information in sourcetypes.conf.
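The precedence list above can be summarized as a chain of checks in which the first method that yields a value wins. The Python below is a simplified model, not Splunk's implementation, and it rests on several assumptions:

- source:: stanzas are matched by exact source string, although real stanzas also accept wildcards.
- A MORE_THAN_<n> or LESS_THAN_<n> condition means "more than (or fewer than) n percent of lines match the regex", and every condition in a rule must hold.
- The signature function is a toy punctuation fingerprint, standing in for Splunk's own signatures.

The names SourcetypeConfig and assign_sourcetype, and the learned-<n> naming scheme, are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    """Model of a [rule::...] or [delayedrule::...] stanza in props.conf."""
    sourcetype: str
    more_than: Dict[int, str] = field(default_factory=dict)  # MORE_THAN_<n> = regex
    less_than: Dict[int, str] = field(default_factory=dict)  # LESS_THAN_<n> = regex

    def matches(self, lines: List[str]) -> bool:
        if not lines:
            return False

        def percent(regex: str) -> float:
            pattern = re.compile(regex)
            hits = sum(bool(pattern.search(line)) for line in lines)
            return 100.0 * hits / len(lines)

        return (all(percent(rx) > n for n, rx in self.more_than.items())
                and all(percent(rx) < n for n, rx in self.less_than.items()))


@dataclass
class SourcetypeConfig:
    input_sourcetypes: Dict[str, str] = field(default_factory=dict)   # 1. inputs.conf
    source_sourcetypes: Dict[str, str] = field(default_factory=dict)  # 2. [source::...]
    rules: List[Rule] = field(default_factory=list)                   # 3. [rule::...]
    signatures: Dict[str, str] = field(default_factory=dict)          # 4/6. pretrained or learned
    delayed_rules: List[Rule] = field(default_factory=list)           # 5. [delayedrule::...]


def signature(lines: List[str]) -> str:
    """Toy fingerprint: the punctuation left in the first few lines."""
    return "|".join(re.sub(r"[\w\s]", "", line) for line in lines[:5])


def assign_sourcetype(cfg: SourcetypeConfig, input_name: str,
                      source: str, lines: List[str]) -> str:
    if input_name in cfg.input_sourcetypes:          # 1. explicit, per input
        return cfg.input_sourcetypes[input_name]
    if source in cfg.source_sourcetypes:             # 2. explicit, per source
        return cfg.source_sourcetypes[source]
    for rule in cfg.rules:                           # 3. rule-based recognition
        if rule.matches(lines):
            return rule.sourcetype
    sig = signature(lines)
    if sig in cfg.signatures:                        # 4. match a known signature
        return cfg.signatures[sig]
    for rule in cfg.delayed_rules:                   # 5. delayed catch-all rules
        if rule.matches(lines):
            return rule.sourcetype
    new_type = f"learned-{len(cfg.signatures) + 1}"  # 6. learn a new source type
    cfg.signatures[sig] = new_type
    return new_type


cfg = SourcetypeConfig(
    input_sourcetypes={"monitor:///var/log/app.log": "app_log"},
    rules=[Rule("source_with_lots_of_bars", more_than={80: r"----"})],
    delayed_rules=[Rule("syslog", more_than={50: r"^\w{3}\s+\d+ \d\d:\d\d:\d\d "})],
)
print(assign_sourcetype(cfg, "monitor:///var/log/app.log", "/var/log/app.log", ["x"]))  # app_log
print(assign_sourcetype(cfg, "other", "/tmp/bars", ["----a", "----b", "---- c"]))  # source_with_lots_of_bars
print(assign_sourcetype(cfg, "other", "/tmp/sys", ["Jun 14 10:15:32 web01 sshd[1]: ok"]))  # syslog
print(assign_sourcetype(cfg, "other", "/tmp/kv", ["a=1;b=2"]))  # learned-1
```

The last call shows the fallback at work: no explicit setting, rule, known signature, or delayed rule applies, so the model records a new source type for that signature. A later input with the same punctuation fingerprint would then be matched at step 4.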
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8