About default fields (host, source, sourcetype, and more)

NOTE - Splunk version 4.x reached its End of Life on October 1, 2013.

This documentation does not apply to the most recent version of Splunk.

As the "Configure event processing" chapter in this manual explains, Splunk automatically extracts a number of default fields from each event before indexing it. These default fields include index, which identifies the index in which the event is located; linecount, which gives the number of lines the event contains; and timestamp, which gives the point in time at which the event occurred. (The "Configure event timestamping" chapter describes your options for changing how sets of events are timestamped.)

Note: For a complete list of the default fields that Splunk identifies for each event prior to indexing, see "Use default fields" in the User manual.

This chapter focuses mainly on three important default fields: host, source, and sourcetype. Splunk identifies a value for each of these fields for every event it processes, and this chapter explains how it does so. Later chapters show you how to override the automatic assignment of host and sourcetype values when necessary.

This chapter also shows you how to have Splunk extract additional, custom fields at index time. This practice is strongly discouraged, however. Adding to the list of indexed fields can slow down both indexing and searching. In addition, you may need to reindex your entire dataset before those fields show up for previously indexed events. Extract fields at search time whenever possible. For more information, see "Index time versus search time" in the Admin manual.

For more information about index-time field extraction, see "Create custom fields at index-time" in this manual.

Defining host, source, and sourcetype

The host, source, and sourcetype fields are defined as follows:

  • host - An event's host value is typically the hostname, IP address, or fully qualified domain name of the network host from which the event originated. The host value lets you easily locate data originating from a specific device. For an overview of the methods Splunk provides for overriding automatic host assignment, see "About hosts".
  • source - The source of an event is the name of the file, stream, or other input from which the event originates. For data monitored from files and directories, the value of source is the full path, such as /archive/server1/var/log/messages.0 or /var/log/. The value of source for network-based data sources is the protocol and port, such as UDP:514.
  • sourcetype - The source type of an event is the format of the data input from which it originates, such as access_combined or cisco_syslog. For more information on source types, see "Why source types matter".
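
The three definitions above can be pictured as a small data structure. The following Python sketch is only an illustration: the Event class and the file_source and network_source helpers are hypothetical names invented for this example, not part of Splunk. It assumes nothing beyond the rules stated above, namely that the source of a monitored file is its full path and the source of a network input is its protocol and port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical model of an event and three of its default fields."""
    raw: str         # the event text itself
    host: str        # hostname, IP address, or FQDN of the originating host
    source: str      # file path, or protocol and port, of the input
    sourcetype: str  # format of the data input, e.g. access_combined


def file_source(path: str) -> str:
    """For monitored files, the source is the full path of the file."""
    return path


def network_source(protocol: str, port: int) -> str:
    """For network inputs, the source is the protocol and port."""
    return f"{protocol.upper()}:{port}"


# An event read from a monitored file.
file_event = Event(
    raw="example raw event text",
    host="server1",
    source=file_source("/archive/server1/var/log/messages.0"),
    sourcetype="access_combined",
)

# An event received over the network.
network_event = Event(
    raw="example raw event text",
    host="192.0.2.10",  # placeholder address from the documentation range
    source=network_source("udp", 514),
    sourcetype="cisco_syslog",
)

print(file_event.source)
print(network_event.source)
```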

Under what conditions should you override host and sourcetype assignment?

Much of the time, Splunk automatically identifies host and sourcetype values that are both correct and useful. In some situations, however, you need to intervene and provide override values.

You may want to change your default host assignment when:

  • you are bulk-loading archive data that was originally generated from a different host and you want those events to have that host value.
  • your data is actually being forwarded from a different host (the forwarder will be the host unless you specify otherwise).
  • you are working with a centralized log server environment, which means that all of the data received from that server will have the same host even though it originated elsewhere.
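
To make the archive and centralized log server cases concrete, here is a minimal Python sketch of the idea behind a host override. It is not Splunk's implementation or configuration syntax; host_from_path_segment and assign_host are hypothetical helpers, and "logserver01" is a made-up name for the machine doing the indexing. The sketch assumes the original host name appears as one segment of the file path, as it does in the example path /archive/server1/var/log/messages.0. The mechanisms Splunk actually provides are described in "About hosts".

```python
from typing import Optional


def host_from_path_segment(path: str, segment: int) -> str:
    """Return the 1-based segment of a file path to use as the host value."""
    parts = [p for p in path.split("/") if p]
    if not 1 <= segment <= len(parts):
        raise ValueError(f"path {path!r} has no segment {segment}")
    return parts[segment - 1]


def assign_host(default_host: str, path: str,
                segment: Optional[int] = None) -> str:
    """Use the default host unless a path segment override is requested."""
    if segment is None:
        return default_host
    return host_from_path_segment(path, segment)


path = "/archive/server1/var/log/messages.0"

# Without an override, every event gets the host of the machine that read it.
print(assign_host("logserver01", path))

# With an override, the host comes from the second path segment.
print(assign_host("logserver01", path, segment=2))
```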

You may want to change your default sourcetype assignment when:

  • you want to give all event data coming through a particular input or from a specific source the same source type, for tracking purposes.
  • you want to apply source types to specific events coming through a particular input, such as events that originate from a discrete group of hosts, or even events that are associated with a particular IP address or userid.
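
The second case, applying a source type only to events from a discrete group of hosts, amounts to a rule lookup that falls back to the automatic value. The Python sketch below illustrates that logic under stated assumptions: pick_sourcetype, the RULES list, and the example.com host names are all hypothetical, and this is not how you configure Splunk itself. Later chapters cover the actual override mechanisms.

```python
import re
from typing import List, Pattern, Tuple

# Each rule pairs a host pattern with the source type to apply.
Rule = Tuple[Pattern[str], str]

# Hypothetical rule: hosts named fw<number>.example.com send Cisco syslog.
RULES: List[Rule] = [
    (re.compile(r"fw\d+\.example\.com"), "cisco_syslog"),
]


def pick_sourcetype(host: str, automatic: str, rules: List[Rule]) -> str:
    """Return the first matching override, else the automatic source type."""
    for pattern, sourcetype in rules:
        if pattern.fullmatch(host):
            return sourcetype
    return automatic


# A host in the group gets the override.
print(pick_sourcetype("fw01.example.com", "syslog", RULES))

# Any other host keeps its automatically assigned source type.
print(pick_sourcetype("web01.example.com", "access_combined", RULES))
```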

There are also steps you can take to expand the range of source types that Splunk automatically recognizes, or to simply rename source types. For more information, see "Why source types matter".

About hosts

For detailed information about hosts, see "About hosts".

About source types

For detailed information about source types, see "Why source types matter".

This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5

