Splunk® Enterprise

Getting Data In


Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.
This documentation does not apply to the most recent version of Splunk. Click here for the latest version.

Why source types matter (a lot)

The source type is one of the default fields that Splunk assigns to all incoming data. It tells Splunk what kind of data you've got, so that Splunk can format the data intelligently during indexing. And it's a way to categorize your data, so that you can search it easily.

The important thing about source types

Because Splunk uses the source type to decide how to format your data, it's extremely important that you assign the right source type to your data. That way, the indexed version of the data (the event data) will look the way you expect it to, with appropriate timestamps and event breaks. This will make it a lot easier to search your data later on.

For the most part, it's pretty easy to assign the right source type to your data. Splunk comes with a large number of predefined source types. When consuming data, Splunk will usually select the correct source type automatically. Sometimes, though, Splunk needs your help. If your data is specialized, you might need to manually select a different predefined source type. If your data is unusual, you might need to create an entirely new source type with customized event processing settings. And if your data source contains heterogeneous data, you might need to assign the source type on a per-event (rather than a per-source) basis.
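As a rough sketch of per-event assignment, the configuration below pairs a props.conf stanza with a transforms.conf stanza so that only events matching a regular expression receive a different source type. The source path, transform name, regular expression, and target source type are all hypothetical; adjust them to your own data before relying on this.

```ini
# props.conf
# Hypothetical source that mixes several kinds of events in one file.
[source::/var/log/mixed.log]
TRANSFORMS-set_sourcetype = set_sourcetype_sendmail

# transforms.conf
# Events whose text matches REGEX get the source type named in FORMAT;
# all other events from the source keep their original source type.
[set_sourcetype_sendmail]
REGEX = sendmail\[\d+\]
FORMAT = sourcetype::sendmail_syslog
DEST_KEY = MetaData:Sourcetype
```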

As with any other field, you can use the source type field to search event data once the data has been indexed. You'll probably use it a lot in your searches, since the source type is a key way to categorize your data.

Typical source types

Any common data input format can be a source type. Most source types are log formats. For example, some common source types that Splunk automatically recognizes include:

  • access_combined, for NCSA combined format HTTP Web server logs.
  • apache_error, for standard Apache Web server error logs.
  • cisco_syslog, for the standard syslog produced by Cisco network devices (including PIX firewalls, routers, and ACS), usually via remote syslog to a central log host.
  • websphere_core, for core file exports from WebSphere.

Note: For a longer list of source types that Splunk automatically recognizes, see "List of pretrained source types" in this manual.

Configure source types

There are two basic types of configuration you can do with source types:

  • Assign source types explicitly to your incoming data.
  • Create new source types, either from scratch or by modifying an existing source type.

Assign source types

In most cases, Splunk will determine the best source type for your data and automatically assign it to incoming events. In some cases, however, you might need to explicitly assign a source type to your data. You usually do this when defining the data input.

The section "How Splunk assigns source types," later in this topic, explains each assignment method and points to the topics that cover it in detail.

Create new source types

If none of the existing source types fits the needs of your data, you can create a new one.

Splunk's data preview feature provides an easy, UI-based method for adjusting source type settings to fit your data. In essence, it's a visual source type editor. For detailed information, see "Data preview and source types."

You can also create a new source type by directly editing props.conf and adding a source type stanza. To learn how to create a new source type, read "Create source types."
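For orientation, here is a minimal sketch of what such a stanza might look like. The source type name my_app_log and the assumed log layout (one event per line, each starting with a bracketed timestamp) are hypothetical; the settings shown control event breaking and timestamp extraction, and your data may need different values.

```ini
# props.conf
# Hypothetical source type for lines like: [2013-07-31 14:02:11] message text
[my_app_log]
# Treat each line as its own event instead of merging lines.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# The timestamp follows the opening bracket at the start of the event.
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
```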

Use data preview to test and modify source types

The data preview feature in Splunk Web provides an easy way to view the effect of applying a source type to an input. It lets you preview the resulting events without actually committing them to an index. You can also use data preview to edit timestamp and event breaking settings interactively and then save the modifications as a new source type. For information on how data preview functions as a source type editor, see "Data preview and source types".

Search on source types

sourcetype is the name of the source type search field. You can use the sourcetype field to find similar types of data from any source. For example, you could search sourcetype=weblogic_stdout to find all of your WebLogic server events, even when WebLogic is logging from more than one domain (or "host," in Splunk terms).

How Splunk assigns source types

Splunk employs a variety of methods to assign source types to event data at index time. As it processes event data, Splunk steps through these methods in a defined order of precedence. It starts with hardcoded source type configurations in inputs.conf and props.conf, moves on to rule-based source type association, and then works through methods like automatic source type recognition and automatic source type learning. This range of methods enables you to configure how Splunk applies source type values to specific kinds of events, while letting Splunk assign source type values to other events automatically.

The following list shows how Splunk goes about determining the source type for a data input. Splunk starts with the first method and then descends through the others as necessary, until it's able to determine the source type. The list also provides an overview of how you configure source type assignment for each level.

1. Explicit source type specification based on the data input

If Splunk finds an explicit source type for the data input, it stops here.

You configure this in inputs.conf or Splunk Web. Here's the inputs.conf syntax for assigning source types to a file input:

[monitor://<path>]
sourcetype=<sourcetype>

You can also assign a source type when defining an input in Splunk Web. For information on doing this for file inputs, see "Use Splunk Web" in this manual. The process is similar for network or other types of inputs.

For more information, see "Specify source type for an input".
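As a concrete instance of the inputs.conf syntax in this step, the following sketch assigns one of the predefined source types listed earlier to a monitored file. The file path is a hypothetical example; substitute the path of your own input.

```ini
# inputs.conf
# Hypothetical path; access_combined is one of the predefined source types.
[monitor:///var/log/httpd/access_log]
sourcetype = access_combined
```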

2. Explicit source type specification based on the data source

If Splunk finds an explicit source type for the particular source, it stops here.

You configure this in props.conf, using this syntax:

[source::<source>] 
sourcetype=<sourcetype>

For more information, see "Specify source type for a source".
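A filled-in version of this props.conf syntax might look like the sketch below. The source path is hypothetical and stands in for whatever value the source field carries for your data.

```ini
# props.conf
# Hypothetical source path; cisco_syslog is one of the predefined source types.
[source::/var/log/cisco/router.log]
sourcetype = cisco_syslog
```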

3. Rule-based source type recognition

Splunk looks next for any rules you've created for source types.

You can create source type classification rules in props.conf:

[rule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>

For information about setting up source type recognition rules, see "Configure rule-based source type recognition".
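To illustrate the rule syntax, here is a sketch that assumes the number in each MORE_THAN_ or LESS_THAN_ attribute is the percentage of lines that must (or must not) match the regular expression. The rule name, source type name, and patterns are hypothetical.

```ini
# props.conf
# Hypothetical rule: assign "source_with_lots_of_bars" when more than 80% of
# the lines contain "----" and fewer than 70% of the lines contain "####".
[rule::bar_some]
sourcetype = source_with_lots_of_bars
MORE_THAN_80 = ----
LESS_THAN_70 = ####
```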

4. Automatic source type matching

Splunk next attempts to use automatic source type recognition to match similar-looking files and assign a source type.

Splunk calculates signatures for patterns in the first few thousand lines of any file or network input stream. These signatures identify things like repeating word patterns, punctuation patterns, line length, and so on. When Splunk calculates a signature, it compares it to its set of signatures for known, "pretrained" source types. If it identifies a match, it assigns that source type to the data.

See "List of pretrained source types" in this manual for a list of the source types that Splunk can recognize out of the box.

5. Delayed rule-based source type association

If Splunk hasn't identified a source type by now, it looks for any delayed rules.

This works like rule-based associations (step 3, above). You create a delayedrule:: stanza in props.conf. This is a useful "catch-all" for source types, in case Splunk missed any with automatic source type matching (step 4, above).

A good use of delayed rule associations is for generic versions of very specific source types that were defined earlier with rule:: in step 3, above. For example, you could use rule:: to catch event data with specific syslog source types, such as sendmail_syslog or cisco_syslog, and then have delayedrule:: apply the generic syslog source type to the remaining syslog event data.

Here's the syntax:

[delayedrule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>

For more information about setting up or removing delayed rules for source type recognition, see "Configure rule-based source type recognition".
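The specific-then-generic pattern described in this step could be sketched as follows. The rule names and regular expressions are illustrative assumptions, not the definitions Splunk ships with, and the 80 percent thresholds are arbitrary choices.

```ini
# props.conf
# Step 3: a specific rule, checked before automatic matching.
# Hypothetical pattern for lines written by sendmail.
[rule::sendmail_syslog]
sourcetype = sendmail_syslog
MORE_THAN_80 = sendmail\[\d+\]

# Step 5: a generic catch-all, consulted only if nothing earlier matched.
# Hypothetical pattern for a leading syslog-style timestamp.
[delayedrule::syslog]
sourcetype = syslog
MORE_THAN_80 = ^\w{3} +\d+ \d\d:\d\d:\d\d
```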

6. Automatic source type learning

If Splunk is unable to assign a source type for the event using the preceding methods, it creates a new source type for the event signature (see step 4, above). Splunk stores learned pattern information in sourcetypes.conf.

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18


Comments

@Davehocking - yes! host, source and sourcetype, along with the data and some other info, is sent from the forwarder to the indexer

Lguinn
November 27, 2013

Once a sourcetype has been set, I presume that if this is then forwarded, this sourcetype categorisation follows the data to the indexer?

Davehocking
July 31, 2013
