Splunk® Data Stream Processor

Use the Data Stream Processor

DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

About regular expressions

Unlike Splunk Enterprise, which uses Perl Compatible Regular Expressions (PCRE), the Splunk Data Stream Processor uses Java 8 regular expressions for its functions.

To learn more about Java 8 regular expressions and the differences between Java 8 regular expressions and PCRE regular expressions, see the Java 8 regular expressions page in the Oracle documentation.

Regular expressions terminology and syntax

The following table describes common regular expression terminology and syntax. It is not an exhaustive list. For the full list of terminology and syntax, see Java 8 regular expressions in the Oracle documentation.

Term Description
literal The exact text of characters to match using a regular expression.
regular expression The metacharacters that define the pattern that Splunk software uses to match against the literal.
groups Regular expressions allow groupings indicated by the type of bracket used to enclose the regular expression characters. Groups can define character classes, repetition matches, named capture groups, modular regular expressions, and more. You can apply quantifiers to enclosed groups and use alternation within them. For examples, see the "Groups, quantifiers, and alternation" and "Capture groups in regular expressions" sections.
character class A character class is a set of characters enclosed within square brackets. It matches any single character from that set in the input string. To define a range, use a hyphen, such as [A-Z] to match any uppercase letter. Begin the character class with a caret (^) to define a negative match, such as [^A-Z] to match any character that is not an uppercase letter, including lowercase letters, digits, whitespace, and punctuation.
character type Similar to a wildcard, a character type is shorthand for a set of characters. For example, a period (.) matches any character except a line terminator, \w matches any word character (a letter, digit, or underscore), and so on.
anchor Anchors assert that the engine's current position in the string matches a well-determined location, such as the beginning of the input (^), the end of the input ($), or a word boundary (\b). Anchors are zero-width: they match a position, not a character such as a carriage return (\r) or newline (\n). To make ^ and $ match at the beginning and end of each line, use the multiline (m) modifier flag, described in the "Regular expression modifier flags" section.
alternation Refers to supplying alternate match patterns in the regular expression. Use a vertical bar or pipe character ( | ) to separate the alternate patterns, which can include full regular expressions. For example, grey|gray matches either grey or gray.
quantifiers, or repetitions Use *, +, and ? to define how many times the preceding character, character class, or group must match. For example, * matches 0 or more times, + matches 1 or more times, and ? matches 0 or 1 time.
backreferences A backreference matches the same text that an earlier capture group matched. To write a backreference, provide a backslash (\) followed by the group number. For example, in (\d\d)\1 the \1 refers to the text matched by the first group, so the expression matches a pair of digits followed by the same pair, such as 1212. In other words, it matches strings that look like "abab", where a and b are both digits.
lookarounds Match characters, but then give up the match without consuming them. Lookarounds are zero-length assertions, similar to the start-of-string and end-of-string anchors, except that they test for the given characters. Use (?...) with the appropriate indicator to specify a lookaround. Examples:
  • abc(?=d) matches "abc" only if followed by a "d".
  • abc(?!d) matches "abc" only if not followed by a "d".
  • (?<=d)abc matches "abc" only if preceded by a "d".
  • (?<!d)abc matches "abc" only if not preceded by a "d".
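As a rough illustration outside of DSP, the following Java 8 sketch runs several examples from the preceding table through the standard java.util.regex.Pattern class. It assumes that DSP functions apply standard Java 8 pattern semantics; the class and method names (RegexTermsDemo, found) are hypothetical. Because the patterns are written as Java string literals, every regex backslash is doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: exercises examples from the terminology table.
public class RegexTermsDemo {

    // Returns true if the regex matches anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Negated character class: [^A-Z] matches any character that is not an uppercase letter.
        System.out.println(found("[^A-Z]", "7"));          // true
        // Alternation: either spelling matches.
        System.out.println(found("grey|gray", "gray"));    // true
        // Backreference: \1 must repeat the text captured by group 1.
        System.out.println(found("(\\d\\d)\\1", "1212"));  // true
        System.out.println(found("(\\d\\d)\\1", "1234"));  // false
        // Lookahead and negative lookahead.
        System.out.println(found("abc(?=d)", "abcd"));     // true
        System.out.println(found("abc(?!d)", "abcd"));     // false
        // Lookbehind and negative lookbehind.
        System.out.println(found("(?<=d)abc", "dabc"));    // true
        System.out.println(found("(?<!d)abc", "dabc"));    // false

        // Lookarounds are zero-length: the "d" is checked but is not part of the match.
        Matcher m = Pattern.compile("abc(?=d)").matcher("abcd");
        if (m.find()) {
            System.out.println(m.group());                 // abc
        }
    }
}
```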

Character types

Character types are shorthand for common sets of characters. For more information about character types, see Java 8 regular expressions.

Term Description Example Explanation
\w Match a word character (a letter, number, or underscore character). \w\w\w Matches any three word characters.
\W Match a non-word character. \W\W\W Matches any three non-word characters.
\d Match a digit character. \d\d\d-\d\d-\d\d\d\d Matches a Social Security number, or a similar 3-2-4 number string.
\D Match a non-digit character. \D\D\D Matches any three non-digit characters.
\s Match a whitespace character. \d\s\d Matches a sequence of a digit, a whitespace, and then another digit.
\S Match a non-whitespace character. \d\S\d Matches a sequence of a digit, a non-whitespace character, and another digit.
. Match any character except a line terminator. Use sparingly. \d\d.\d\d.\d\d Matches a date string such as 12/31/14 or 01.01.15, but can also match 99A99B99.
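The following sketch checks the character type examples from the table with Pattern.matches, which succeeds only when the entire input matches. It assumes standard Java 8 behavior, and the class name CharacterTypesDemo is hypothetical. The final check, which escapes the period as \. so that only a literal dot is accepted, is an added illustration rather than an example from the table.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the character types table.
public class CharacterTypesDemo {

    // Pattern.matches succeeds only if the whole input matches the regex.
    public static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String ssn = "\\d\\d\\d-\\d\\d-\\d\\d\\d\\d";
        System.out.println(matchesAll(ssn, "123-45-6789"));      // true
        System.out.println(matchesAll("\\w\\w\\w", "a_1"));     // true
        System.out.println(matchesAll("\\W\\W\\W", "!? "));     // true
        System.out.println(matchesAll("\\d\\s\\d", "4 2"));     // true
        System.out.println(matchesAll("\\d\\S\\d", "4 2"));     // false

        // The unescaped period accepts any separator, including letters.
        String loose = "\\d\\d.\\d\\d.\\d\\d";
        System.out.println(matchesAll(loose, "12/31/14"));       // true
        System.out.println(matchesAll(loose, "99A99B99"));       // true

        // Escaping the period (\.) accepts only a literal dot.
        String strict = "\\d\\d\\.\\d\\d\\.\\d\\d";
        System.out.println(matchesAll(strict, "01.01.15"));      // true
        System.out.println(matchesAll(strict, "99A99B99"));      // false
    }
}
```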

Groups, quantifiers, and alternation

Regular expressions allow groupings indicated by the type of bracket used to enclose the regular expression characters. You can apply quantifiers ( *, +, ?) to the enclosed group and use alternation within the group. For more information about groups and quantifiers, see Java 8 regular expressions.

Term Description Example Explanation
* Match zero or more times. \w* Matches zero or more word characters.
+ Match one or more times. \d+ Matches at least one digit.
? Match zero or one time. \d\d\d-?\d\d-?\d\d\d\d Matches a Social Security Number with or without dashes.
( ) Parentheses define match or capture groups, atomic groups, and lookarounds. (H..).(o..) When given the string Hello World, this matches Hel and o W.
[ ] Square brackets define character classes. [a-z0-9#] Matches any character that is a through z, 0 through 9, or #.
{ } Curly brackets define repetitions. \d{3,5} Matches a string of 3 to 5 digits in length.
< > Angle brackets define named capture groups. Use the syntax (?<var> ...) to set up a named field extraction. See the "Capture groups in regular expressions" section, on this page, for more information. (?<ssn>\d\d\d-\d\d-\d\d\d\d) Pulls out a Social Security Number and assigns it to the ssn field.
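The following hypothetical GroupsDemo class is a minimal sketch of the examples in the preceding table using plain Java 8, assuming standard java.util.regex behavior. The groups helper is illustrative only: it collects the numbered groups of the first match so you can see what each set of parentheses captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupsDemo {

    // Returns the text of each numbered group from the first match, or an empty list.
    public static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    // Returns true if the whole input matches the regex.
    public static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // ( ): two capture groups separated by any single character.
        System.out.println(groups("(H..).(o..)", "Hello World"));   // [Hel, o W]
        // ?: each dash is optional.
        String ssn = "\\d\\d\\d-?\\d\\d-?\\d\\d\\d\\d";
        System.out.println(matchesAll(ssn, "123-45-6789"));         // true
        System.out.println(matchesAll(ssn, "123456789"));           // true
        // [ ]: exactly one character from the class.
        System.out.println(matchesAll("[a-z0-9#]", "#"));           // true
        // { }: 3 to 5 digits, so 6 digits do not match.
        System.out.println(matchesAll("\\d{3,5}", "123456"));       // false
        // < >: a named capture group, read back by name.
        Matcher named = Pattern.compile("(?<ssn>\\d\\d\\d-\\d\\d-\\d\\d\\d\\d)").matcher("id=123-45-6789");
        if (named.find()) {
            System.out.println(named.group("ssn"));                 // 123-45-6789
        }
    }
}
```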

Using regular expressions in the Canvas Builder vs the SPL2 Pipeline Builder

Regular expressions make liberal use of the backslash character. In the SPL2 Pipeline Builder, you write the regular expression directly as a string literal, so each literal backslash must be escaped as \\. In the Canvas Builder, string fields are escaped automatically, so enter the backslash character without escaping it. For example, enter the regular expression \d as \d in the Canvas Builder, but write it as \\d in the SPL2 Pipeline Builder.
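Java source code has the same double-escaping requirement as the SPL2 Pipeline Builder, because a regular expression written in a Java string literal also needs each backslash escaped. The following sketch shows the idea in plain Java, not in DSP or SPL2; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class EscapingDemo {

    // The Java literal "\\d+" holds the three characters \d+,
    // which the regex engine reads as "one or more digits".
    private static final String DIGITS = "\\d+";

    public static boolean isDigits(String input) {
        return Pattern.matches(DIGITS, input);
    }

    // To match one literal backslash, the regex needs \\,
    // which becomes "\\\\" inside a Java string literal.
    public static boolean matchesTempPath(String input) {
        return Pattern.matches("C:\\\\temp", input);
    }

    public static void main(String[] args) {
        System.out.println(DIGITS);                          // \d+
        System.out.println(isDigits("2022"));                // true
        System.out.println(matchesTempPath("C:\\temp"));     // true
    }
}
```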

Capture groups in regular expressions

A named capture group is a regular expression grouping that extracts a field value when a regular expression matches an event. Capture groups include the name of the field and are notated with angle brackets as follows:

some text (?<fieldName>regular expression capture pattern) more text

When a function matches a regular expression that contains capture groups, it returns a map of all extracted, matched fields in the format {"capture_group_1": "matching_expression_1", "capture_group_N": "matching_expression_N"}. If you do not name the capture groups, the groups are keyed by number: "1", "2", "3", and so on up to "N".

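Underscores are not supported in capture group names. For example, <ip_address> is an invalid capture group name. In Java 8, a group name must begin with an ASCII letter and can contain only ASCII letters and digits.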
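The following sketch, assuming a standard Java 8 runtime, shows that Pattern.compile rejects a group name containing an underscore by throwing a PatternSyntaxException, while names made of letters and digits compile. The GroupNameDemo class and compiles helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GroupNameDemo {

    // Returns true if Java accepts the regex, false if compilation fails.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<ip>\\d+\\.\\d+\\.\\d+\\.\\d+)"));         // true
        System.out.println(compiles("(?<ipAddress2>\\d+\\.\\d+\\.\\d+\\.\\d+)")); // true
        System.out.println(compiles("(?<ip_address>\\d+\\.\\d+\\.\\d+\\.\\d+)")); // false
    }
}
```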

For example, suppose you want to extract the IP address from the following event text:

131.253.24.135 fail admin_user

You can use the following regular expression, which contains a named capture group, to extract the IP address from the event:
(?<ip>\d+\.\d+\.\d+\.\d+)

This returns a map with the key ip whose value is the value of the extracted capture group.
This screen image shows the ip address being extracted using a named capturing group.

For an unnamed capture group, a function with the regular expression (\d+\.\d+\.\d+\.\d+) returns a map with the key 1 whose value is the value of the extracted capture group.
This screen image shows the ip address being extracted using an unnamed capturing group.
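To approximate the returned map in plain Java, the following sketch builds one from a java.util.regex.Matcher. The numberedGroups and namedGroups helpers are hypothetical and only mimic the shape of the map described earlier on this page; they are not the DSP implementation. Java 8's Pattern class has no public method for listing named groups, so namedGroups takes the names explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureMapDemo {

    // Hypothetical helper: keys each group from the first match by its number ("1", "2", ...).
    public static Map<String, String> numberedGroups(String regex, String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.put(String.valueOf(i), m.group(i));
            }
        }
        return result;
    }

    // Hypothetical helper: keys the listed named groups from the first match by name.
    public static Map<String, String> namedGroups(String regex, String input, String... names) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (String name : names) {
                result.put(name, m.group(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String event = "131.253.24.135 fail admin_user";
        System.out.println(namedGroups("(?<ip>\\d+\\.\\d+\\.\\d+\\.\\d+)", event, "ip")); // {ip=131.253.24.135}
        System.out.println(numberedGroups("(\\d+\\.\\d+\\.\\d+\\.\\d+)", event));         // {1=131.253.24.135}
    }
}
```

Java's Map.toString prints key=value pairs, so the printed output differs in punctuation from the JSON-style map format, but the keys and values are the same.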

Regular expression modifier flags

The following regular expression modifier flags are available. Use a modifier flag to change the default regular expression behavior. To use a modifier, place it either at the beginning of the regular expression pattern in the format /(?MODIFIER)my regular expression/ or at the end of the pattern in the format /my regular expression/MODIFIER.

Modifier Description
c CANON_EQ. Enables canonical equivalence.
d UNIX_LINES. Enables Unix lines mode.
i CASE_INSENSITIVE. Enables case-insensitive matching.
l LITERAL. Enables literal parsing of the pattern.
m MULTILINE. Enables multiline mode.
s DOTALL. Enables dotall mode.
u UNICODE_CASE. Enables Unicode-aware case folding.
U UNICODE_CHARACTER_CLASS. Enables the Unicode version of predefined character classes and POSIX character classes.
x COMMENTS. Permits whitespace and comments in pattern.
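In plain Java 8, these modifiers correspond to constants on java.util.regex.Pattern, and several of them, such as i, m, s, and x, can also be embedded at the start of a pattern, as in (?i). The /pattern/MODIFIER form is DSP syntax and is not part of Java itself. The following sketch, using a hypothetical ModifierFlagsDemo class, shows the i, m, and s flags in both forms.

```java
import java.util.regex.Pattern;

public class ModifierFlagsDemo {

    static final String TEXT = "line one\nLINE TWO";

    // Compiles the regex with the given Pattern flags (0 for none) and searches TEXT.
    public static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        // i: an embedded flag at the start of the pattern...
        System.out.println(found("(?i)line two", 0));                          // true
        // ...is equivalent to passing the CASE_INSENSITIVE constant.
        System.out.println(found("line two", Pattern.CASE_INSENSITIVE));        // true

        // m: ^ matches after each line terminator, not only at the start of the input.
        System.out.println(found("^LINE", 0));                                 // false
        System.out.println(found("(?m)^LINE", 0));                             // true

        // s: the period also matches line terminators.
        System.out.println(found("one.LINE", 0));                              // false
        System.out.println(found("one.LINE", Pattern.DOTALL));                 // true

        // Constants combine with a bitwise OR.
        System.out.println(found("^line two$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE)); // true
    }
}
```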

For example, suppose you have the following body text:

Jul 20 17:07:55 93.227.214.209 %ASA-6-302014: Teardown TCP connection 304488019 for Outside:151.185.159.199/50867(LOCALxNora) to Inside:10.179.121.51/88 duration 0:00:00 bytes 4514 TCP FINs

You can use either of the following regular expressions, which combine a named capture group with a modifier flag, to extract the ASA field from that text.

  • /(?i)(?<ASA>ASA-\d-\d{6})/
  • /(?<ASA>ASA-\d-\d{6})/i

The first regular expression places the i modifier flag at the beginning of the pattern to enable case-insensitive matching. Inside the named capture group ASA, \d means "digit" and {6} means "exactly 6 times", so \d{6} matches a string of 6 digits.

The ASA capture group matches the characters "ASA", followed by a dash, one digit, another dash, and then six digits. This is the format of the message identifier in a Cisco ASA syslog message, such as ASA-6-302014 in the example text.

The second regular expression does the same as the first, except the modifier flag is placed at the end instead of the beginning.
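A minimal Java 8 sketch of the same extraction follows, assuming DSP applies standard Pattern semantics. The first pattern embeds (?i) like the first regular expression; the second passes Pattern.CASE_INSENSITIVE, which is the plain-Java counterpart of placing i at the end. The class, field, and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsaExtractDemo {

    // Java counterpart of /(?i)(?<ASA>ASA-\d-\d{6})/: the embedded (?i) enables case-insensitive matching.
    public static final Pattern ASA_INLINE = Pattern.compile("(?i)(?<ASA>ASA-\\d-\\d{6})");

    // Java counterpart of /(?<ASA>ASA-\d-\d{6})/i: the flag is passed as a constant instead.
    public static final Pattern ASA_FLAG =
            Pattern.compile("(?<ASA>ASA-\\d-\\d{6})", Pattern.CASE_INSENSITIVE);

    // Returns the ASA capture group from the first match, or null if nothing matches.
    public static String extractAsa(Pattern pattern, String body) {
        Matcher m = pattern.matcher(body);
        return m.find() ? m.group("ASA") : null;
    }

    public static void main(String[] args) {
        String body = "Jul 20 17:07:55 93.227.214.209 %ASA-6-302014: Teardown TCP connection "
                + "304488019 for Outside:151.185.159.199/50867(LOCALxNora) to "
                + "Inside:10.179.121.51/88 duration 0:00:00 bytes 4514 TCP FINs";
        System.out.println(extractAsa(ASA_INLINE, body));              // ASA-6-302014
        System.out.println(extractAsa(ASA_FLAG, body));                // ASA-6-302014
        // Case-insensitive matching also accepts a lowercase identifier.
        System.out.println(extractAsa(ASA_FLAG, "%asa-6-302014: x"));  // asa-6-302014
    }
}
```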

See more

Visit the following pages for resources on how to write Java 8 regular expressions.

Last modified on 09 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

