Splunk® Enterprise

Getting Data In

Configure rule-based source type recognition

You can use rule-based source type recognition to expand the range of source types that Splunk software recognizes. In props.conf, you create a rule:: stanza that associates a specific source type with a set of qualifying criteria. When consuming data, Splunk software assigns the specified source type to file inputs that meet the rule's qualifications.

You can create two kinds of rules in props.conf: rules and delayed rules. The only difference between the two is the point at which Splunk software checks them during the source typing process. As it processes each set of incoming data, Splunk software uses several methods to determine source types:

  • After checking for explicit source type definitions based on the data input or source, Splunk software looks at any rule:: stanzas defined in props.conf and tries to match source types to the data based on the classification rules specified in those stanzas.
  • If no rule:: stanza yields a matching source type, Splunk software falls back to automatic source type matching, in which it looks for patterns similar to those of source types it has learned in the past.
  • If that method fails, Splunk software then checks any delayedrule:: stanzas in props.conf and tries to match the data to source types using the rules in those stanzas.
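
The order described in the list above can be pictured as a simple fallback chain. The following Python sketch is illustrative only and does not show how Splunk software is implemented. The function name assign_source_type and the four stage callables are hypothetical stand-ins for the explicit definitions, rule:: stanzas, automatic matching, and delayedrule:: stanzas.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stage type: takes sampled lines, returns a source type or None.
Stage = Callable[[Sequence[str]], Optional[str]]


def assign_source_type(
    lines: Sequence[str],
    explicit: Stage,
    rules: Stage,
    automatic: Stage,
    delayed_rules: Stage,
) -> Optional[str]:
    """Return the result of the first stage that yields a source type."""
    for stage in (explicit, rules, automatic, delayed_rules):
        result = stage(lines)
        if result is not None:
            return result
    return None


def no_match(lines: Sequence[str]) -> Optional[str]:
    """Stand-in for a stage that finds nothing."""
    return None


# Only the delayed rules recognize this data, so they decide the source type.
picked = assign_source_type(
    ["some line"],
    explicit=no_match,
    rules=no_match,
    automatic=no_match,
    delayed_rules=lambda lines: "syslog",
)
print(picked)
```

In this sketch a delayed rule is consulted only when every earlier stage returns nothing, which is why delayedrule:: stanzas suit generic, catch-all source types.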

For details on the precedence rules that Splunk software uses to assign source types to data, read How Splunk software assigns source types.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regular expressions by using them in searches with the rex search command, or with third-party tools for writing and testing regular expressions.
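
As one example of an outside tool, a short Python script can check a candidate expression against sample lines before you put it in props.conf. This is a sketch under two assumptions: the sample log lines are invented for illustration, and Python's re module is treated as close enough to the regular expression engine Splunk software uses for this particular pattern. The two engines are not identical in general. The expression is the one from the postfix_syslog example later in this topic.

```python
import re

# Expression taken from the postfix_syslog example in this topic.
POSTFIX = re.compile(r"^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:")

# Invented sample lines: the first is postfix output, the second is not.
samples = [
    "Jan  5 10:15:32 mailhost postfix/smtpd[1234]: connect from unknown[10.0.0.1]",
    "Jan  5 10:15:33 mailhost sshd[999]: Accepted password for admin",
]

# search() looks for the pattern anywhere in the line; the leading ^ anchors it.
results = [bool(POSTFIX.search(line)) for line in samples]

for line, matched in zip(samples, results):
    print("MATCH   " if matched else "NO MATCH", line)
```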

You can configure your system so that rule:: stanzas contain classification rules for specialized source types, while delayedrule:: stanzas contain classification rules for generic source types. That way, Splunk software applies the generic source types to broad ranges of events that haven't qualified for more specialized source types. For example, you could use rule:: stanzas to catch data with specific syslog source types, such as sendmail_syslog or cisco_syslog, and then configure a delayedrule:: stanza to apply the generic syslog source type to any remaining syslog data.

Configuration

To set source typing rules, edit props.conf in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. For information on configuration files in general, see About configuration files in the Admin manual.

Create a rule by adding a rule:: or delayedrule:: stanza to props.conf. Provide a name for the rule in the stanza header, and declare the source type in the body of the stanza. After the source type declaration, list the source type assignment rules. These rules consist of one or more MORE_THAN and LESS_THAN statements, each of which tests what percentage of lines in the data match a given regular expression.

To create a rule, use this syntax:

[rule::<rule_name>] OR [delayedrule::<rule_name>]
sourcetype=<source_type>
MORE_THAN_[0-99] = <regex>
LESS_THAN_[1-100] = <regex>

You set a numerical value in the MORE_THAN and LESS_THAN attributes, corresponding to the percentage of input lines that must match the regular expression. For example, MORE_THAN_80 means at least 80% of the lines must match the associated expression. LESS_THAN_20 means that no more than 20% of the lines can match the associated expression.

Note: Despite its name, the MORE_THAN_ attribute actually means "more than or equal to". Similarly, the LESS_THAN_ attribute means "less than or equal to".
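
The inclusive boundaries described in the note can be made concrete with a small Python sketch. It is an illustration of the arithmetic only, not Splunk's implementation. The helper names match_percentage, more_than, and less_than are hypothetical, and the sketch assumes that a line counts as a match when the expression is found anywhere in it.

```python
import re
from typing import Sequence


def match_percentage(pattern: str, lines: Sequence[str]) -> float:
    """Percentage (0 to 100) of lines in which the pattern is found."""
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if re.search(pattern, line))
    return 100.0 * hits / len(lines)


def more_than(threshold: int, pattern: str, lines: Sequence[str]) -> bool:
    # Inclusive, per the note above: "more than or equal to".
    return match_percentage(pattern, lines) >= threshold


def less_than(threshold: int, pattern: str, lines: Sequence[str]) -> bool:
    # Inclusive, per the note above: "less than or equal to".
    return match_percentage(pattern, lines) <= threshold


# Invented data: exactly 80% of lines contain ERROR, exactly 20% contain INFO.
lines = ["ERROR disk full"] * 8 + ["INFO ok"] * 2

at_boundary_more = more_than(80, "ERROR", lines)   # 80% >= 80
at_boundary_less = less_than(20, "INFO", lines)    # 20% <= 20
above_boundary = more_than(81, "ERROR", lines)     # 80% >= 81 fails

print(at_boundary_more, at_boundary_less, above_boundary)
```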

A rule can contain any number of MORE_THAN and LESS_THAN conditions. The rule's source type is assigned to a data file only if the data satisfies every statement in the rule. For example, you could define a rule that assigns a specific source type to a file input only if more than 60% of the lines match one regular expression and less than 20% match another regular expression.
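
The following Python sketch models the 60% and 20% example from the preceding paragraph to show that every condition must hold at once. It is a simplified model, not Splunk's implementation. The function rule_source_type, the source type name app_log, both regular expressions, and the sample data are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

# (kind, percent, regex), where kind is "MORE_THAN" or "LESS_THAN".
Condition = Tuple[str, int, str]


def rule_source_type(
    sourcetype: str, conditions: List[Condition], lines: Sequence[str]
) -> Optional[str]:
    """Return sourcetype only if the lines satisfy every condition."""

    def pct(regex: str) -> float:
        if not lines:
            return 0.0
        hits = sum(1 for line in lines if re.search(regex, line))
        return 100.0 * hits / len(lines)

    for kind, percent, regex in conditions:
        if kind == "MORE_THAN":
            satisfied = pct(regex) >= percent
        else:
            satisfied = pct(regex) <= percent
        if not satisfied:
            return None  # one failed condition rejects the whole rule
    return sourcetype


# Hypothetical rule: mostly timestamped lines, few blank lines.
conditions = [
    ("MORE_THAN", 60, r"^\d{4}-\d\d-\d\d "),
    ("LESS_THAN", 20, r"^\s*$"),
]

stamped = ["2024-01-01 12:00:00 started"] * 7
good = stamped + [""] + ["   continuation"] * 2   # 70% stamped, 10% blank
bad = stamped + [""] * 3                          # 70% stamped, 30% blank

good_result = rule_source_type("app_log", conditions, good)
bad_result = rule_source_type("app_log", conditions, bad)
print(good_result, bad_result)
```

The second data set passes the MORE_THAN condition but fails the LESS_THAN condition, so the rule does not apply to it.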

Examples

Postfix syslog files

# postfix_syslog sourcetype rule
[rule::postfix_syslog]
sourcetype = postfix_syslog
# If at least 80% of lines match this regex, assign this source type
MORE_THAN_80=^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:

Delayed rule for breakable text

# breaks text on ascii art and blank lines if more than 10% of lines have
# ascii art or blank lines, and less than 10% have timestamps
[delayedrule::breakable_text]
sourcetype = breakable_text
MORE_THAN_10 = (^(?:---|===|\*\*\*|___|=+=))|^\s*$
LESS_THAN_10 = [: ][012]?[0-9]:[0-5][0-9]

This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.2.0, 7.2.1

