Splunk® Enterprise

Knowledge Manager Manual

This documentation does not apply to the most recent version of Splunk.

About Splunk Enterprise regular expressions

This topic is a brief primer to help you create valid regular expressions in Splunk Enterprise.

Regular expressions, or regexes, match patterns of characters in text. Splunk Enterprise uses regexes for extracting default fields, recognizing binary file types, and assigning source types automatically. You can also use regexes when defining custom field extractions, filtering events, routing data, and correlating searches. Search commands that use regexes include rex and regex, as well as eval functions such as match and replace.

Splunk Enterprise regexes are PCRE, or Perl-compatible regular expressions, and specifically use the PCRE C library.

Terminology and syntax

Term Description
literal The exact text of characters that you want to match using a regex.
regular expression, or regex The metacharacters that define the pattern used to match against the literal.
groups Regular expressions allow a variety of groupings indicated by the type of bracket used to enclose the regex characters. Generally, you use parentheses for match or capture groups, atomic groups, and lookarounds; square brackets to define character classes; curly brackets to define repetitions; angle brackets to define named capture groups; and double brackets for Splunk-Enterprise-specific modular regex expressions. Quantifiers can then be applied to the enclosed group and alternation can be used within the group.
character class Regex characters enclosed in square brackets and used to match a single character from a set. Define a range with a hyphen, such as [A-Z] to match any uppercase letter. Begin the character class with a caret (^) to define a negative match, such as [^A-Z] to match any character that is not an uppercase letter.
character type Similar to a wildcard, character types are shorthand for specific literal matches. For example, a period . matches any character, and \w matches a word character (an alphanumeric character or an underscore).
anchor Anchors match positions in the text rather than printable characters, such as the start of a line (^), the end of a line ($), return \r, and newline \n.
alternation Alternation refers to supplying alternate match patterns in the regex. A vertical bar or pipe character ( | ) is used to separate the alternate patterns, which can include full regular expressions. For example, grey|gray matches either "grey" or "gray".
quantifiers, or repetitions Quantifiers ( *, +, ? ) define how many times the preceding character or group can repeat when matched against the literal. For example, * matches 0 or more times, + matches 1 or more times, and ? matches 0 or 1 time.
back references Back references recall the text matched by an earlier capture group for later use. In Splunk Enterprise, indicate a back reference to the value with a dollar symbol ($) and a number (not zero), such as $1.
lookarounds Lookarounds are another way to define a group, used to test a position in a string. A lookaround checks whether the regex in the group matches at that position but does not include the matched text in the result; for example, you could use a lookaround to match x that is followed by y without matching y.
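
The terms above can be tried out in a short sketch. It uses Python's re module purely for illustration; Python is not the PCRE library that Splunk Enterprise uses, so treat the syntax as an approximation. In particular, the back reference here is written \1 inside the pattern, which is Python's form and not the $ form described in the table. The sample strings are made up.

```python
import re

# Character class: one uppercase letter; negated class: one character that is not uppercase.
upper = re.findall(r"[A-Z]", "Splunk Enterprise")   # ['S', 'E']
not_upper = re.findall(r"[^A-Z]", "AbC")            # ['b']

# Alternation: either spelling matches.
spellings = re.findall(r"grey|gray", "grey skies, gray walls")

# Lookahead: match x only when y follows, without including y in the match.
lookahead = re.search(r"x(?=y)", "axy")

# Back reference (Python form): \1 recalls what group 1 captured,
# so this finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

print(upper, not_upper, spellings)
print(lookahead.group(0), lookahead.span())
print(doubled.group(1))
```

The lookahead result contains only "x", even though the pattern also had to see "y" to succeed.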

Character types

Character types are shorthand for literal matches.

Term Description
. Match any character.
\w Match "word" character (alphanumeric characters plus underscore, "_").
\W Match non-word character.
\s Match whitespace character.
\S Match non-whitespace character.
\d Match digit character.
\D Match non-digit character.
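
As a quick illustration of these character types, the following sketch applies them to a made-up sample string. It assumes Python's re module, where \w, \d, and \s follow Unicode rules by default; with the ASCII-only sample below that difference from PCRE does not show up.

```python
import re

# Hypothetical sample event text: spaces and one tab separate the tokens.
sample = "user_42 logged in\tat 09:15"

words = re.findall(r"\w+", sample)       # runs of word characters
digits = re.findall(r"\d+", sample)      # runs of digit characters
non_space = re.findall(r"\S+", sample)   # runs of non-whitespace characters
separators = re.findall(r"\s", sample)   # individual whitespace characters
digits_only = re.sub(r"\D", "", sample)  # delete every non-digit character

print(words)
print(digits)
print(non_space)
print(separators)
print(digits_only)
```

Note that \w+ splits "09:15" into two matches because the colon is not a word character, while \S+ keeps it as one token.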

Groups and quantifiers

Regular expressions allow a variety of groupings indicated by the type of bracket used to enclose the regex characters. Quantifiers ( *, +, ? ) can then be applied to the enclosed group and alternation can be used within the group.

Term Description
( ) Parentheses are used to define match or capture groups, atomic groups, and lookarounds.
[ ] Square brackets are used to define character classes.
{ } Curly brackets are used to define repetitions.
< > Angle brackets are used to define named capture groups.
[[ ]] Double brackets are used to define Splunk-Enterprise-specific modular regexes.
* Matches the group 0 or more times.
+ Matches the group 1 or more times.
? Matches the group 0 or 1 time.
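
The bracket types in this table can be combined in one pattern. The sketch below is illustrative only: the ticket format is invented, and it uses Python's re module, which spells a named capture group (?P<name>...) where the Splunk examples in this topic use the (?<name>...) form.

```python
import re

# (?P<code>...)  named capture group (angle brackets)
# [A-Z]          character class (square brackets)
# {2,3}          repetition: two or three times (curly brackets)
# (\d+)          plain capture group (parentheses) with the + quantifier
# (?:\.(\d+))?   optional non-capturing group holding one more capture group
pattern = re.compile(r"(?P<code>[A-Z]{2,3})-(\d+)(?:\.(\d+))?")

full = pattern.search("ticket ABC-123.4 opened")
short = pattern.search("ticket XY-7 opened")
too_short = pattern.search("ticket A-1 opened")  # only one letter, so no match

print(full.group("code"), full.group(2), full.group(3))
print(short.group("code"), short.group(2), short.group(3))
print(too_short)
```

When the optional group does not take part in the match, its inner capture group is reported as None.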

Regex examples

Example 1: This example shows two ways to match either "to" or "too". The first regex uses the ? quantifier to match up to 1 more "o" after the first. The second regex uses alternation to specify the pattern.

to(o)?
(to|too)
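
The two regexes accept the same strings when the whole string must match, but they can return different matches when searching inside longer text. A minimal sketch, assuming Python's re module, which shares ordered alternation and greedy quantifiers with PCRE:

```python
import re

optional_o = re.compile(r"to(o)?")
alternation = re.compile(r"(to|too)")

# Both patterns accept exactly "to" and "too" as whole strings.
whole_string = {
    text: (optional_o.fullmatch(text) is not None,
           alternation.fullmatch(text) is not None)
    for text in ("to", "too", "tooo")
}

# When searching, the greedy ? takes the extra "o" if it can, while
# alternation stops at the first alternative that succeeds ("to").
greedy_hit = optional_o.search("too").group(0)
first_alternative_hit = alternation.search("too").group(0)

print(whole_string)
print(greedy_hit, first_alternative_hit)
```

If the longer spelling should win in a search, listing it first, as in (too|to), is one way to get that behavior.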

Modular regexes

Modular regexes are small, named regular expressions that you define once and then reuse inside longer regular expression definitions. Modular regexes are defined in transforms.conf.

For example, you can define an integer and then use that regular expression definition to define a float:

[int]
# matches an integer or a hex number
REGEX = 0x[a-fA-F0-9]+|\d+

[float]
# matches a float (or an int)
REGEX = \d*\.\d+|[[int]]

Notice that the modular regex is invoked with double square brackets, [[int]].
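
To see what the composed float pattern accepts, the following sketch inlines the int regex by hand. It assumes that [[int]] behaves as if the referenced regex were substituted in place as a group; that is an assumption about the expansion for illustration, not a description of how Splunk Enterprise implements it. The helper name matches_float is hypothetical, and Python's re module stands in for PCRE.

```python
import re

# Patterns copied from the [int] and [float] stanzas above.
INT = r"0x[a-fA-F0-9]+|\d+"

# Assumption: [[int]] is replaced by the int regex wrapped in a group.
FLOAT = r"\d*\.\d+|(?:" + INT + r")"

float_re = re.compile(FLOAT)

def matches_float(text):
    """Return True if the whole string matches the composed float pattern."""
    return float_re.fullmatch(text) is not None

for candidate in ("3.14", ".5", "42", "0x1F", "1.", "abc"):
    print(candidate, matches_float(candidate))
```

The strings "42" and "0x1F" are accepted only through the inlined int alternative, which is why the stanza comment says the pattern matches "a float (or an int)".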

You can also use the modular regex in field extractions:

[octet]
# matches only numbers from 0 to 255 (one octet in an IP address)
REGEX = (?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)

[ipv4]
# matches a valid IPv4 address, optionally followed by :port_num
# each octet in the IP address is also validated against the 0-255 range
# Extracts: ip, port
REGEX = (?<ip>[[octet]](?:\.[[octet]]){3})(?::[[int:port]])?
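
One way to picture how these references resolve is a small expander that inlines each [[name]] recursively. This is a sketch under stated assumptions, not Splunk Enterprise's implementation: it assumes [[name]] inlines the referenced regex as a group and that [[name:field]] captures the match into a field called field (consistent with the "Extracts: ip, port" comment above). The MODULES dictionary and the expand helper are hypothetical names, and because Python's re module writes named groups as (?P<name>...), the PCRE (?<name>...) form is translated before compiling.

```python
import re

# Stanza bodies copied from the examples above.
MODULES = {
    "int": r"0x[a-fA-F0-9]+|\d+",
    "octet": r"(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)",
    "ipv4": r"(?<ip>[[octet]](?:\.[[octet]]){3})(?::[[int:port]])?",
}

# Matches [[name]] or [[name:field]].
REFERENCE = re.compile(r"\[\[(\w+)(?::(\w+))?\]\]")

def expand(name):
    """Recursively inline modular references (hypothetical helper)."""
    def inline(ref):
        body = expand(ref.group(1))
        field = ref.group(2)
        # Assumption: [[name:field]] becomes a capture group named `field`.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(inline, MODULES[name])

# Translate PCRE-style (?<name>...) groups to Python's (?P<name>...).
python_pattern = re.sub(r"\(\?<(\w+)>", r"(?P<\1>", expand("ipv4"))
ipv4 = re.compile(python_pattern)

with_port = ipv4.fullmatch("192.168.0.1:8080")
without_port = ipv4.fullmatch("10.0.0.255")
out_of_range = ipv4.fullmatch("256.1.1.1")

print(with_port.group("ip"), with_port.group("port"))
print(without_port.group("ip"), without_port.group("port"))
print(out_of_range)
```

The sketch uses fullmatch so that an out-of-range address is rejected outright; an unanchored search could still find a shorter valid address inside "256.1.1.1".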

This documentation applies to the following versions of Splunk® Enterprise: 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18, 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14
