About Splunk regular expressions
This primer helps you create valid regular expressions. For a discussion of regular expression syntax and usage, see an online resource such as www.regular-expressions.info or a manual on the subject.
Regular expressions match patterns of characters in text. The Splunk platform uses them to extract default fields, recognize binary file types, and automatically assign source types. You also use regular expressions when you define custom field extractions, filter events, route data, and correlate searches. The rex search command and evaluation functions such as replace also use regular expressions. See Quick Reference for SPL2 eval functions in the SPL2 Search Reference.
Splunk regular expressions are PCRE (Perl Compatible Regular Expressions) and use the PCRE C library.
The Splunk platform includes the license for PCRE2, an improved version of PCRE. However, the Splunk platform does not currently allow access to functions specific to PCRE2, such as key substitution.
Regular expressions terminology and syntax
|literal||The exact text of characters to match using a regular expression.|
|regular expression||The metacharacters that define the pattern that Splunk software uses to match against the literal.|
|groups||Regular expressions allow groupings indicated by the type of bracket used to enclose the regular expression characters. Groups can define character classes, repetition matches, named capture groups, modular regular expressions, and more. You can apply quantifiers to and use alternation within enclosed groups.|
|character class||Characters enclosed in square brackets. Used to match a string. To set up a character class, define a range with a hyphen, such as |
|character type||Similar to a wildcard, character types represent specific literal matches. For example, a period |
|anchor||Character types that match text formatting positions, such as return (|
|alternation||Refers to supplying alternate match patterns in the regular expression. Use a vertical bar or pipe character ( | ) to separate the alternate patterns, which can include full regular expressions. For example, |
|quantifiers, or repetitions||Use (|
|back references||Literal groups that you can recall for later use. To indicate a back reference to the value, specify a dollar symbol (|
|lookarounds||A way to define a group to determine the position in a string. This definition matches the regular expression in the group but gives up the match to keep the result. For example, use a lookaround to match |
Character types are shorthand for sets of literal characters.
|\w||Match a word character (a letter, number, or underscore character).||\w\w\w||Matches any three word characters.|
|\W||Match a non-word character.||\W\W\W||Matches any three non-word characters.|
|\d||Match a digit character.||\d\d\d-\d\d-\d\d\d\d||Matches a Social Security number, or a similar 3-2-4 number string.|
|\D||Match a non-digit character.||\D\D\D||Matches any three non-digit characters.|
|\s||Match a whitespace character.||\d\s\d||Matches a sequence of a digit, a whitespace character, and then another digit.|
|\S||Match a non-whitespace character.||\d\S\d||Matches a sequence of a digit, a non-whitespace character, and another digit.|
|.||Match any character. Use sparingly.||\d\d.\d\d.\d\d||Matches a date string such as 12/31/14 or 01.01.15, but can also match 99A99B99.|
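The character types described above can be tried out in any regular expression engine. The following is a minimal sketch in Python, whose re module is not the PCRE library that Splunk software uses but treats these particular character types the same way. The sample strings and the helper name matches_fully are hypothetical, and strict_date is an illustrative alternative rather than a pattern from this documentation.

```python
import re

# Patterns built only from the character types described above.
ssn_like = re.compile(r"\d\d\d-\d\d-\d\d\d\d")    # 3-2-4 digit string
loose_date = re.compile(r"\d\d.\d\d.\d\d")         # "." accepts any separator
strict_date = re.compile(r"\d\d[/.]\d\d[/.]\d\d")  # only "/" or "." as separator


def matches_fully(pattern, text):
    """Return True when the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None


# Hypothetical sample values, chosen only for illustration.
for sample in ["12/31/14", "01.01.15", "99A99B99"]:
    print(sample, matches_fully(loose_date, sample), matches_fully(strict_date, sample))
```

The period accepts letters as separators, which is why it is best used sparingly. Listing the allowed separators in a character class rejects the unintended 99A99B99 match.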
Groups, quantifiers, and alternation
Regular expressions allow groupings indicated by the type of bracket used to enclose the regular expression characters. You can apply quantifiers ( *, +, ? ) to the enclosed group and use alternation within the group.
|*||Match zero or more times.||\w*||Matches zero or more word characters.|
|+||Match one or more times.||\d+||Matches at least one digit.|
|?||Match zero or one time.||\d\d\d-?\d\d-?\d\d\d\d||Matches a Social Security number with or without dashes.|
||Parentheses define match or capture groups, atomic groups, and lookarounds.||
||When given the string |
||Square brackets define character classes.||
||Matches any character that is |
|{ }||Curly brackets define repetitions.||\d{3,5}||Matches a string of 3 to 5 digits in length.|
||Angle brackets define named capture groups. Use the syntax
||Pulls out a Social Security Number and assigns it to the |
||Double brackets define Splunk-specific modular regular expressions.||
||A validated 0-255 range integer.|
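To see how the quantifiers, repetitions, and named capture groups in this table behave, here is a small sketch in Python. It rests on a few assumptions: Python's re module stands in for PCRE, the named group uses the (?P<name>...) spelling that Python requires, and the sample values and the helper name is_full_match are hypothetical. Splunk-specific modular expressions in double brackets have no Python equivalent and are not shown.

```python
import re

# "?" makes each dash optional; "{3,5}" bounds the number of repetitions.
ssn_optional_dashes = re.compile(r"\d\d\d-?\d\d-?\d\d\d\d")
three_to_five_digits = re.compile(r"\d{3,5}")

# Named capture group: the text matched inside the group is stored as "ssn".
named_ssn = re.compile(r"(?P<ssn>\d\d\d-\d\d-\d\d\d\d)")


def is_full_match(pattern, text):
    """Return True when the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None


# Hypothetical input line, for illustration only.
found = named_ssn.search("ssn=123-45-6789 status=ok")
extracted = found.group("ssn") if found else None
print(extracted)
```

Using fullmatch in the helper keeps the length bound meaningful: an unanchored search for \d{3,5} would still find a match inside a longer run of digits.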
A simple example of groups, quantifiers, and alternation
This example shows two ways to match either
The first regular expression uses the ? quantifier to match up to one more "o" after the first.
The second regular expression uses alternation to specify the pattern.
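The two approaches can be compared side by side. The sketch below is in Python and assumes, purely for illustration, that the strings to match are "to" and "too". The patterns to(o)? and (to|too) and the helper name agree_on are hypothetical choices for that assumption, not necessarily the expressions intended here.

```python
import re

optional_form = re.compile(r"to(o)?")       # "to", then at most one more "o"
alternation_form = re.compile(r"(to|too)")  # either alternative, spelled out


def agree_on(text):
    """Return the (optional, alternation) full-match results for one string."""
    return (
        optional_form.fullmatch(text) is not None,
        alternation_form.fullmatch(text) is not None,
    )


for candidate in ["to", "too", "tooo", "t"]:
    print(candidate, agree_on(candidate))
```

When the whole string must match, both forms accept the same inputs. In an unanchored search they can differ: alternation takes the first alternative that succeeds, so searching "too" with (to|too) returns "to", while to(o)? is greedy and returns "too".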
Capture groups in regular expressions
A named capture group is a regular expression grouping that extracts a field value when the regular expression matches an event. Capture groups include the name of the field. They are notated with angle brackets as follows:
matching text (?<field_name>capture pattern) more matching text.
For example, you have this event text:
126.96.36.199 fail admin_user
Here are two regular expressions that use different syntax in their capturing groups to pull the same set of fields from that event.
- Expression A:
(?<ip>\d+\.\d+\.\d+\.\d+) (?<result>\w+) (?<user>.*)
- Expression B:
(?<ip>\S+) (?<result>\S+) (?<user>\S+)
In Expression A, the pattern-matching characters used for the first capture group (ip) are specific. \d means "digit" and + means "one or more." So \d+ means "one or more digits." \. refers to a literal period.
The capture group for ip matches one or more digits, followed by a period, followed by one or more digits, followed by a period, followed by one or more digits, followed by a period, followed by one or more digits. This describes the dotted format of an IPv4 address, although it does not check that each number falls in the 0-255 range.
The second capture group in Expression A, for the result field, has the pattern \w+, which means "one or more word characters" (letters, digits, or underscores). The third capture group in Expression A, for the user field, has the pattern .*, which means "match everything that's left."
Expression B uses a common technique called negative matching. With negative matching, the regular expression does not try to define which text to match. Instead, it defines what the text is not. In Expression B, the values to extract from the sample event are "not space" characters (\S). It uses + to specify "one or more" of the "not space" characters.
So Expression B says:
- Pull out the first string of not-space characters for the ip field.
- Ignore the following space.
- Then pull out the second string of not-space characters for the result field.
- Ignore the second space.
- Pull out the third string of not-space characters for the user field.
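As a quick check that Expression A and Expression B extract the same fields from the sample event, here is a minimal sketch in Python. One assumption to note: Python's re module requires the (?P<name>...) spelling for named groups, so the angle-bracket groups are written that way here. The patterns are otherwise unchanged, and the variable names are illustrative.

```python
import re

event = "126.96.36.199 fail admin_user"

# Expression A: specific patterns for each field.
expression_a = re.compile(
    r"(?P<ip>\d+\.\d+\.\d+\.\d+) (?P<result>\w+) (?P<user>.*)"
)

# Expression B: negative matching with runs of non-space characters.
expression_b = re.compile(r"(?P<ip>\S+) (?P<result>\S+) (?P<user>\S+)")

# groupdict() returns the named groups as a field-name-to-value mapping.
fields_a = expression_a.search(event).groupdict()
fields_b = expression_b.search(event).groupdict()

print(fields_a)
print(fields_b)
```

Both expressions produce the same three fields for this event. They diverge on other input: Expression B accepts any non-space text as the ip value, while Expression A requires four dot-separated runs of digits.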
Non-capturing group matching
Use the syntax (?: ... ) to create groups that are matched but not captured. You do not need to include a field name in angle brackets. The colon character after the ? character identifies it as a non-capturing group.
For example, (?:Buttercup|Ponies) matches either Buttercup or Ponies, but neither string is captured.
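The difference between a non-capturing group and a capturing group shows up in what the match object exposes. The following Python sketch illustrates this with the Buttercup and Ponies alternation. The sample sentence and variable names are hypothetical, and Python's re module is assumed to treat (?: ... ) the same way PCRE does.

```python
import re

non_capturing = re.compile(r"(?:Buttercup|Ponies)")
capturing = re.compile(r"(Buttercup|Ponies)")

text = "I like Ponies"  # hypothetical sample text

nc_match = non_capturing.search(text)
c_match = capturing.search(text)

# group(0) is the overall match; groups() lists only the captured groups.
print(nc_match.group(0), nc_match.groups())
print(c_match.group(0), c_match.groups())
```

Both patterns match the same text, but only the capturing form stores the matched alternative as a group.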
- Related information
- SPL2 and regular expressions
This documentation applies to the following versions of Splunk® Cloud Services: current