Splunk Cloud Platform

Getting Data In

Why source types matter

The source type is one of the default fields that the Splunk platform assigns to all incoming data. It tells the platform what kind of data you have, so that it can format the data intelligently during indexing. Source types also let you categorize your data for easier searching.

Source types determine how incoming data is formatted

Because the source type controls how the Splunk platform formats incoming data, it is important that you assign the correct source type to your data. That way, the indexed version of the data (the event data) looks the way you want, with appropriate timestamps and event breaks. This makes the data easier to search later.

Splunk software comes with a large number of predefined source types. When consuming data, the Splunk platform usually selects the correct source type automatically. If you have specialized data, you might need to manually select a different predefined source type. If your data is unusual, you might need to create a new source type with customized event processing settings. And if your data source contains heterogeneous data, you might need to assign the source type on a per-event, rather than a per-source, basis.

As with any other field, you can use the source type field to search event data after the data has been indexed. Because the source type is a key way to categorize your data, it appears in many searches.

Common source types

Any common data input format can be a source type. Most source types are log formats. For example, some common source types that the Splunk platform automatically recognizes include the following:

Source type       Description
access_combined   For NCSA combined log format HTTP Web server logs.
apache_error      For standard Apache Web server error logs.
cisco_syslog      For the standard syslog produced by Cisco network devices (including PIX firewalls, routers, and ACS), usually using remote syslog to a central log host.
websphere_core    A core file export from WebSphere.

For a complete list of predefined source types, see List of pretrained source types in this manual.

Configuring source types

There are two basic types of configuration you can do with source types:

  • Assign source types explicitly to your incoming data.
  • Create new source types, either from scratch or by modifying an existing source type.

Assign source types

In most cases, the Splunk platform determines the best source type for your data and automatically assigns it to incoming events. In some cases, however, you might need to explicitly assign a source type to your data. You usually do this when you define the data input.

For more information about how the Splunk platform assigns source types, see How the Splunk platform assigns source types.

Create new source types

If none of the existing source types fits the needs of your data, create a new one.

Splunk Web lets you adjust source type settings to fit your data. In essence, it's a visual source type editor. See Use the Set Source Type page.

If you use Splunk Cloud Platform, use Splunk Web or Apps to define source types. If you use Splunk Enterprise, use Apps. See Create source types.

Preview data to test and modify source types

Splunk Web lets you review the effects of applying a source type to an input. It lets you preview the resulting events without actually committing them to an index. You can also edit timestamp and event breaking settings interactively and then save the modifications as a new source type. For information on how data preview functions as a source type editor, see Use the Set Source Type page.

Search on source types

sourcetype is the name of the source type search field. You can use the sourcetype field to find similar types of data from any source. For example, you can search sourcetype=weblogic_stdout to find all of your WebLogic server events, even when WebLogic is logging from more than one domain (or host, in Splunk terms).

How the Splunk platform assigns source types

The Splunk platform uses a variety of methods to assign source types to event data at index time. Both Splunk Cloud Platform and Splunk Enterprise perform these methods the same way. The difference is that, on Splunk Cloud Platform, you can only make changes to source type configurations by using Splunk Web or Apps to set inputs.conf and props.conf.

As the Splunk platform processes event data, it steps through these methods in a defined order of precedence. It starts with source type configurations that have been statically configured in the inputs.conf and props.conf configuration files, moves on to rule-based source type association, and then works through methods like automatic source type recognition and automatic source type learning. This range of methods enables you to configure how the Splunk platform applies source type values to specific kinds of events, while assigning source type values to other events automatically.

The following list shows how the Splunk platform goes about determining the source type for a data input. The Splunk platform starts with the first method and then descends through the others as necessary, until it can determine the source type.

Explicit source type specification based on the data input

If the Splunk platform finds an explicit source type for the data input, it stops here.

If you use Splunk Cloud Platform, you configure this in Splunk Web. If you use Splunk Enterprise, you configure this in either Splunk Web or in the inputs.conf configuration file. Here is the syntax for configuring the inputs.conf file to assign source types to a file input:

[monitor://<path>]
sourcetype=<sourcetype>
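
As an illustration, the following stanza assigns the access_combined source type to a single monitored file. The file path is a hypothetical example; substitute the path of your own input.

```ini
# inputs.conf (sketch; the monitored path is an assumed example)
# "monitor://" followed by an absolute path gives three slashes.
[monitor:///var/log/httpd/access_log]
sourcetype = access_combined
```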

You can also assign a source type when defining an input in Splunk Web. For information on doing this for file inputs, see Monitor files and directories in Splunk Enterprise with Splunk Web in this manual. The process is similar for network or other types of inputs.

For more information, see Specify source type for an input.

Explicit source type specification based on the data source

If the Splunk platform finds an explicit source type for the particular data source, it stops here.

If you use Splunk Enterprise, or you want to use a heavy forwarder to forward this data to Splunk Cloud Platform, you can configure this in the props.conf configuration file, using the following syntax:

[source::<source>] 
sourcetype=<sourcetype>
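
For example, a stanza like the following would assign one source type to every file that matches a source pattern. The directory and the source type name myapp_log are hypothetical and only illustrate the shape of the stanza.

```ini
# props.conf (sketch; path and source type name are assumed examples)
# In a source:: stanza, "*" matches any characters within one path segment.
[source::/opt/myapp/logs/*.log]
sourcetype = myapp_log
```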

For more information, see Specify source type for a source.

Rule-based source type recognition

The Splunk platform looks next for any rules that you have created for source types.

If you use Splunk Enterprise, you can create source type classification rules in the props.conf file. If you use Splunk Cloud Platform, you can use Apps or Splunk Web. Use the following syntax:

[rule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>
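
To make the syntax concrete, here is a sketch of a rule. The number in each setting name is a percentage of lines: MORE_THAN_80 is satisfied when more than 80% of the lines match the regular expression, and LESS_THAN_70 when fewer than 70% match. The rule name, source type name, and patterns are illustrative only.

```ini
# props.conf (sketch; rule name and source type are assumed examples)
[rule::bar_some]
sourcetype = source_with_lots_of_bars
# Matches when more than 80% of lines contain "----"
# and fewer than 70% of lines contain "####".
MORE_THAN_80 = ----
LESS_THAN_70 = ####
```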

For information about setting up source type recognition rules, see Configure rule-based source type recognition.

Automatic source type matching

The Splunk platform next attempts to use automatic source type recognition to match similar-looking files and assign a source type.

The Splunk platform calculates signatures for patterns in the first few thousand lines of any file or network input stream. These signatures identify things like repeating word patterns, punctuation patterns, line length, and so on. When the Splunk platform calculates a signature, it compares it to its set of signatures for known, "pretrained" source types. If it identifies a match, it assigns that source type to the data.

See List of pretrained source types for a list of the source types that the Splunk platform can recognize by default.

Delayed rule-based source type association

If the Splunk platform hasn't identified a source type by now, it looks for any delayed rules.

Delayed rules work like rule-based associations, except that the Splunk platform evaluates them only after automatic source type matching. If you use Splunk Enterprise, you can create a delayedrule:: stanza in the props.conf file. This is a useful catch-all for source types, in case automatic matching did not identify one.

A good use of delayed rule associations is for generic versions of very specific source types that were defined earlier with rule:: in the rule-based step. For example, you could use rule:: to catch event data with specific syslog source types, such as sendmail_syslog or cisco_syslog, and then have delayedrule:: apply the generic syslog source type to the remaining syslog event data.

Here is the syntax:

[delayedrule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>
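
As a sketch of the specific-then-generic pattern described in this step, the following pair of stanzas lets a rule:: stanza claim sendmail events first and leaves a delayedrule:: stanza to catch the remaining syslog-style data. The rule names and both regular expressions are assumptions written for illustration; they are not the patterns that ship with the product.

```ini
# props.conf (sketch; rule names and regexes are assumed examples)

# Specific rule: evaluated in the rule-based step.
[rule::sendmail_syslog_example]
sourcetype = sendmail_syslog
MORE_THAN_80 = sendmail\[\d+\]:

# Generic fallback: evaluated only if no earlier method assigned a source type.
[delayedrule::generic_syslog_example]
sourcetype = syslog
MORE_THAN_80 = ^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s
```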

For more information about setting up or removing delayed rules for source type recognition, see Configure rule-based source type recognition.

Automatic source type learning

If the Splunk platform is unable to assign a source type for the event using the preceding methods, it creates a new source type for the event signature. The Splunk platform stores learned pattern information in the sourcetypes.conf configuration file.

Last modified on 27 February, 2023

This documentation applies to the following versions of Splunk Cloud Platform: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408

