Configure rule-based source type recognition
Configure rule-based source type recognition to expand the range of source types that Splunk recognizes. Splunk automatically assigns rule-based source types based on regular expressions you specify in props.conf.
You can create two kinds of rules in props.conf: rules and delayed rules. The only difference between the two is the point at which Splunk checks them during the source typing process. As it processes each string of event data, Splunk uses several methods to determine source types:
- After checking for explicit source type definitions based on the event data input or source, Splunk looks at the rule:: stanzas defined in props.conf and tries to match source types to the event data based on the classification rules specified in those stanzas.
- If Splunk is unable to find a matching source type using the available rule:: stanzas, it tries to use automatic source type matching, where it tries to identify patterns similar to source types it has learned in the past.
- When that method fails, Splunk then checks the delayedrule:: stanzas in props.conf, and tries to match the event data to source types using the rules in those stanzas.
You can set up your system so that rule:: stanzas contain classification rules for specialized source types, while delayedrule:: stanzas contain classification rules for generic source types. This way, the generic source types are applied to broad ranges of events that haven't qualified for more specialized source types. For example, you could use rule:: stanzas to catch event data with specific syslog source types, such as sendmail_syslog or cisco_syslog, and then have a delayedrule:: stanza apply the generic syslog source type to the remaining syslog event data.
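The checking order described above, together with the split between specialized rule:: stanzas and a generic delayedrule:: stanza, can be sketched roughly as follows. This is an illustrative Python model only, not Splunk's implementation: the function assign_sourcetype, the stage list, and the "sendmail[" substring check are all hypothetical stand-ins for the real classification logic.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A classifier inspects the event lines and returns a source type name,
# or None when it has no opinion. All classifiers here are hypothetical.
Classifier = Callable[[Sequence[str]], Optional[str]]


def assign_sourcetype(
    lines: Sequence[str], stages: List[Tuple[str, Classifier]]
) -> Optional[Tuple[str, str]]:
    """Return (stage name, source type) from the first stage that matches."""
    for name, classify in stages:
        result = classify(lines)
        if result is not None:
            return name, result
    return None


# Stages in the order the text describes them.
stages: List[Tuple[str, Classifier]] = [
    # 1. Explicit definitions based on input or source (none in this sketch).
    ("explicit", lambda lines: None),
    # 2. rule:: stanzas: specialized source types (stand-in substring test).
    ("rule::", lambda lines: "sendmail_syslog"
        if any("sendmail[" in line for line in lines) else None),
    # 3. Automatic matching against learned patterns (none in this sketch).
    ("automatic", lambda lines: None),
    # 4. delayedrule:: stanzas: generic fallback source type.
    ("delayedrule::", lambda lines: "syslog"),
]

print(assign_sourcetype(["Jan  1 00:00:00 host sendmail[123]: x"], stages))
print(assign_sourcetype(["Jan  1 00:00:00 host kernel: y"], stages))
```

In this model the sendmail line is claimed by the specialized rule:: stage, while the kernel line falls through every earlier stage and receives the generic source type from the delayedrule:: stage.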
Configuration
To set source typing rules, edit props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in this manual.
Create a rule by adding a rule:: or delayedrule:: stanza to props.conf. Provide a name for the rule in the stanza header, and declare the source type name in the body of the stanza. After the source type declaration, list the source type assignment rules. These rules use one or more MORE_THAN and LESS_THAN statements, each of which pairs a percentage with a regular expression and tests what share of the event data lines match that expression.
Note: You can specify any number of MORE_THAN and LESS_THAN statements in a source typing rule stanza. All of the statements must match a percentage of event data lines before those lines can be assigned the source type in question. For example, you could define a rule that assigns a specific source type value to event data where more than 10% match one regular expression and less than 10% match another regular expression.
Add the following to props.conf:
[rule::$RULE_NAME] OR [delayedrule::$RULE_NAME]
sourcetype=$SOURCETYPE
MORE_THAN_[0-100] = $REGEX
LESS_THAN_[0-100] = $REGEX
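The MORE_THAN and LESS_THAN numerical values refer to the percentage of lines that contain the string specified by the regular expression. A MORE_THAN statement matches when more than that percentage of lines match its regular expression; a LESS_THAN statement matches when less than that percentage of lines do.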
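To make the percentage semantics concrete, here is a minimal Python sketch of how a stanza's MORE_THAN and LESS_THAN statements could be evaluated together. It rests on several assumptions that the text does not confirm: the comparisons are strict, the regular expression is searched anywhere in each line, and Python's re module stands in for Splunk's own regex engine. The names percent_matching and rule_matches are hypothetical.

```python
import re
from typing import Iterable, Sequence, Tuple

# (percentage threshold, regular expression), e.g. MORE_THAN_10 = error
Condition = Tuple[int, str]


def percent_matching(lines: Sequence[str], regex: str) -> float:
    """Percentage of lines in which the regex is found (assumed: search)."""
    pattern = re.compile(regex)
    hits = sum(1 for line in lines if pattern.search(line))
    return 100.0 * hits / len(lines)


def rule_matches(
    lines: Sequence[str],
    more_than: Iterable[Condition] = (),
    less_than: Iterable[Condition] = (),
) -> bool:
    """True only if every MORE_THAN and every LESS_THAN statement holds."""
    if not lines:
        return False
    # Assumption: both comparisons are strict ("more than", "less than").
    return (
        all(percent_matching(lines, rx) > pct for pct, rx in more_than)
        and all(percent_matching(lines, rx) < pct for pct, rx in less_than)
    )


sample = ["error a", "error b", "info c", "debug d"]

# 50% of lines contain "error" and 25% contain "debug".
print(rule_matches(sample, more_than=[(10, r"error")], less_than=[(30, r"debug")]))
print(rule_matches(sample, more_than=[(10, r"error")], less_than=[(10, r"debug")]))
```

The first call satisfies both statements. The second fails because 25% of the lines contain "debug", which is not less than 10%, so the rule as a whole does not match.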
Note: For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.
Examples
The following examples come from $SPLUNK_HOME/etc/system/default/props.conf.
Postfix syslog files
# postfix_syslog sourcetype rule
[rule::postfix_syslog]
sourcetype = postfix_syslog
# If 80% of lines match this regex, then it must be this type
MORE_THAN_80=^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:
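One way to sanity-check a regular expression like the one in this stanza before adding it to props.conf is to run it over a handful of sample lines and compute the matching percentage. The sketch below does that in Python, whose re module is used here as a stand-in for Splunk's regex engine; the helper percent_matching, the host name mail01, and all log lines are invented for illustration.

```python
import re
from typing import Sequence

# Regular expression copied from the [rule::postfix_syslog] stanza.
POSTFIX_REGEX = r"^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:"


def percent_matching(lines: Sequence[str], regex: str) -> float:
    """Percentage of lines in which the regex is found."""
    pattern = re.compile(regex)
    hits = sum(1 for line in lines if pattern.search(line))
    return 100.0 * hits / len(lines)


# Hypothetical sample data: five postfix lines and one sshd line.
mostly_postfix = [
    "Mar  3 12:01:05 mail01 postfix/smtpd[2314]: connect from unknown[10.0.0.5]",
    "Mar  3 12:01:06 mail01 postfix/cleanup[2320]: 4F2A1: message-id=<a@example.com>",
    "Mar  3 12:01:06 mail01 postfix/qmgr[1201]: 4F2A1: from=<a@example.com>",
    "Mar  3 12:01:07 mail01 postfix/smtp[2321]: 4F2A1: status=sent",
    "Mar  3 12:01:07 mail01 postfix/qmgr[1201]: 4F2A1: removed",
    "Mar  3 12:01:09 mail01 sshd[991]: Accepted publickey for admin",
]

# Hypothetical sample data: only half of the lines come from postfix.
mixed = mostly_postfix[:2] + [
    "Mar  3 12:02:00 mail01 sshd[991]: session closed",
    "Mar  3 12:02:01 mail01 cron[400]: job started",
]

print(percent_matching(mostly_postfix, POSTFIX_REGEX) > 80)
print(percent_matching(mixed, POSTFIX_REGEX) > 80)
```

Five of the six lines in the first sample match, which is above the 80% threshold, while only half of the second sample matches, so under this model only the first sample would qualify for postfix_syslog.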
Delayed rule for breakable text
# breaks text on ascii art and blank lines if more than 10% of lines have
# ascii art or blank lines, and less than 10% have timestamps
[delayedrule::breakable_text]
sourcetype = breakable_text
MORE_THAN_10 = (^(?:---|===|\*\*\*|___|=+=))|^\s*$
LESS_THAN_10 = [: ][012]?[0-9]:[0-5][0-9]
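Because this stanza combines a MORE_THAN and a LESS_THAN statement, both must hold for the same set of lines. The Python sketch below checks the two regular expressions from the stanza against invented sample text. It assumes strict comparisons and uses Python's re module in place of Splunk's regex engine; the function is_breakable_text and both samples are hypothetical.

```python
import re
from typing import Sequence

# Regular expressions copied from the [delayedrule::breakable_text] stanza.
ART_OR_BLANK = r"(^(?:---|===|\*\*\*|___|=+=))|^\s*$"
TIMESTAMP = r"[: ][012]?[0-9]:[0-5][0-9]"


def percent_matching(lines: Sequence[str], regex: str) -> float:
    """Percentage of lines in which the regex is found."""
    pattern = re.compile(regex)
    hits = sum(1 for line in lines if pattern.search(line))
    return 100.0 * hits / len(lines)


def is_breakable_text(lines: Sequence[str]) -> bool:
    """MORE_THAN_10 on ascii art or blank lines AND LESS_THAN_10 on timestamps."""
    return (
        percent_matching(lines, ART_OR_BLANK) > 10
        and percent_matching(lines, TIMESTAMP) < 10
    )


# Hypothetical free-form text with separators and no timestamps.
notes = [
    "Release notes",
    "=============",
    "",
    "First item in the notes",
    "---",
    "Second item in the notes",
]

# Hypothetical log excerpt: blank lines, but half the lines carry timestamps.
log_excerpt = [
    "",
    "Mar  3 12:01:05 host app: started",
    "",
    "Mar  3 12:01:06 host app: stopped",
]

print(is_breakable_text(notes))
print(is_breakable_text(log_excerpt))
```

The notes sample has separators or blank lines on half of its lines and no timestamps, so both statements hold. The log excerpt also has many blank lines, but half of its lines contain timestamps, so the LESS_THAN_10 statement fails and the rule does not match.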
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8