Splunk® Data Stream Processor

Function Reference



On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see the Upgrade the Splunk Data Stream Processor topic.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Rex

This topic describes how to use the rex function in the Splunk Data Stream Processor.

Description

Extract or rename fields using regular expression named capture groups, or edit fields using a sed expression.

The rex function matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.

When mode=sed, the rex function applies the given sed expression, which replaces or substitutes characters, to the value of the chosen field. You can also use sed syntax to mask sensitive data.

For more information about regular expressions in the Splunk Data Stream Processor, see About regular expressions.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs the same collection of records but with a different schema S.

Syntax

The required arguments are field and either pattern or, in sed mode, mode=sed with a sed expression.

If using regex:

rex
field=<field>
pattern=<regex-expression> [max_match=<int>] [offset_field=<string>]

If using sed:

rex
field=<field>
mode=sed <sed-expression>

You must specify either <pattern> or mode=sed <sed-expression> when you use the rex function.

Required arguments

field
Syntax: field=<field>
Description: The field that you want to extract information from.
Example in Canvas View: body
pattern
Syntax: regex string
Description: The Java regular expression (regex) or sed expression that defines the information to match and extract from the specified field. You must include a named capturing group in a regular expression pattern surrounded by forward slashes ( / ). If no match is found, the new field is still added to your records, but its value is set to null. Capturing group names can only contain alphanumeric characters. If you are using a sed expression, you must set mode=sed.
Example in Canvas View: /[(?<timestamp>\d+)].*/
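The matching behavior described for the pattern argument can be approximated outside the Data Stream Processor. The following Python sketch is an illustration only, not the product's implementation. The helper name rex_extract is hypothetical, Python's re module stands in for Java regular expressions, and Python writes named groups as (?P<name>...) rather than (?<name>...).

```python
import re

def rex_extract(record, field, pattern):
    """Hypothetical helper: copy the record and add one field per named group."""
    regex = re.compile(pattern)
    out = dict(record)
    # search() is unanchored: the match may start anywhere in the value.
    match = regex.search(record[field])
    for name in regex.groupindex:
        # Each named group becomes a field; it is None (null) when nothing matched.
        out[name] = match.group(name) if match else None
    return out

record = {"body": "id=1,100", "time": 123456789}
print(rex_extract(record, "body", r"(?P<field0>\d+),(?P<field1>\d+)"))
print(rex_extract(record, "body", r"(?P<letters>[a-z]+)=(?P<xs>x+)"))
```

The first call finds the match in the middle of the value even though the pattern is not anchored. The second call finds no match, so both new fields are present with a null value.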

Optional arguments

mode
Syntax: string
Description: Only required when you want to use a sed (UNIX stream editor) expression. Specify sed to indicate that you are using a sed expression.
Example in Canvas View: sed
max_match
Syntax: int
Description: Controls the number of times the regular expression is matched. If greater than 1, the resulting fields are multivalued fields.
Default: 1. Use 0 for unlimited matches.
Example in Canvas View: 10
offset_field
Syntax: string
Description: The desired output field name. You can specify it either without quotes or with single quotes, such as offset_field=myfield or offset_field='myfield'. It is safer to use single quotes to avoid conflicts with reserved keywords in SPL such as offset. If you want to use a reserved keyword as a field name, you must enclose that field name in single quotes, for example: offset_field='offset'. The value of the field contains the endpoints of each match, expressed as zero-based character offsets into the matched field. For example, if the rex expression is (?<tenchars>.{10}), it matches the first ten characters of the field, and the offset_field records the range 0-9.
Example in Canvas View: position

Usage

This section contains additional usage information about the Rex function.

Regular expressions

Unlike Splunk Enterprise, regular expressions used in the Splunk Data Stream Processor are Java regular expressions.

Using a sed expression

When using the rex function in sed mode, you have two options: replace (s) or character substitution (y).

The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"

  • <regex> is a Java regular expression, which can include capturing groups.
  • <replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
  • <flags> can be either: g to replace all matches, or a number to replace a specified match.

The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"

  • This substitutes the characters that match <string1> with the characters in <string2>.
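As a rough illustration of the two sed modes described above, the following Python sketch emulates s/.../.../g and y/.../.../. It is not the product's implementation. The helper names are hypothetical, Python's re module stands in for Java regular expressions, and the equal-length check for y follows UNIX sed, which is an assumption about the rex function.

```python
import re

def sed_replace_all(value, regex, replacement):
    """Sketch of s/<regex>/<replacement>/g: replace every match."""
    # re.sub accepts \1-style backreferences in the replacement string.
    return re.sub(regex, replacement, value)

def sed_translate(value, string1, string2):
    """Sketch of y/<string1>/<string2>/: map characters one-to-one."""
    if len(string1) != len(string2):
        # Assumption borrowed from UNIX sed: both strings have equal length.
        raise ValueError("string1 and string2 must have the same length")
    return value.translate(str.maketrans(string1, string2))

print(sed_replace_all("1,4222222222222,credit", r",\d+,", ",XXX,"))
print(sed_replace_all("2022-03-25", r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1"))
print(sed_translate("a-b-c", "abc", "ABC"))
```

The second call shows backreferences reordering captured groups. The third call shows that y operates on individual characters rather than on a regular expression.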

SPL2 examples

The following examples show common use cases and assume that you are in the SPL View.

When working in the SPL View you can write the function by providing the arguments in the exact order shown in each use case.

If you are using the SPL2 Pipeline Builder, you must escape any backslash ( \ ) characters. If you are using the Canvas Builder, backslash characters are automatically escaped. See Using regular expressions in the Canvas Builder vs the SPL2 Pipeline Builder.
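The doubled backslashes in the examples that follow are string-literal escaping, not part of the regular expression itself. The Python sketch below shows the same idea with Python string literals. Treating SPL2 quoted strings as behaving like Python's non-raw strings is an assumption made for illustration, and the (?P<name>...) group syntax is Python's.

```python
import re

# In an ordinary (non-raw) string literal, "\\d" is the two characters \ and d,
# which the regex engine then reads as the digit class \d.
escaped_literal = "(?P<field0>\\d+),(?P<field1>\\d+)"
raw_literal = r"(?P<field0>\d+),(?P<field1>\d+)"

print(escaped_literal == raw_literal)
print(re.search(escaped_literal, "1,100").groupdict())
```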

1. Extract values from a field using a <regex-expression>

The following example extracts the first and second comma-separated numbers from body into field0 and field1 respectively.

... | rex field=body "(?<field0>\\d+),(?<field1>\\d+)" | ...;

Incoming record

Record {
 body = "1,100"
 time = 123456789
}

Outgoing record:

Record {
 body = "1,100"
 time = 123456789
 field0 = 1
 field1 = 100
}

2. Use a <sed-expression>

The following example uses a <sed-expression> that matches a series of digits between two commas and replaces it with the string "XXX". In this example, the replacement anonymizes the credit card number.

... | rex field=body mode=sed "s/,\\d+,/,XXX,/g" | ...;

Incoming record

Record {
 body = "1,4222222222222,credit"
 timestamp = 123456789
}

Outgoing record

Record {
 body = "1,XXX,credit"
 timestamp = 123456789
}

3. Use sed to replace a particular position in the string field

The following example uses sed to replace a particular string at a fixed position using 0-based indexing. The 5th character in the body field, or position 4 using 0-based indexing, should be replaced with "X" if it matches the \d regular expression.

... | rex field=body mode=sed "s/\\d/X/4" |...;

Incoming record:

Record{
body="2,102"
}

Outgoing record:

Record{
body="2,10X"
}
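A numeric flag can be emulated in Python as shown below. This sketch assumes that the number selects the n-th match, counting from 1, as it does in UNIX sed. That reading is an assumption, and the helper name sed_replace_nth is hypothetical. For the record above, the fourth digit match is also the character at 0-based position 4, so both readings give the same result.

```python
import re

def sed_replace_nth(value, regex, replacement, n):
    """Replace only the n-th match (1-based) and leave other matches untouched."""
    seen = 0

    def substitute(match):
        nonlocal seen
        seen += 1
        # expand() resolves any backreferences in the replacement template.
        return match.expand(replacement) if seen == n else match.group(0)

    return re.sub(regex, substitute, value)

print(sed_replace_nth("2,102", r"\d", "X", 4))
```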

4. Limit the number of matches with max_match

The following examples use the optional max_match argument to limit the number of matches. If max_match is greater than 1 or equal to 0 (unlimited), the function creates multivalued (list) fields in the outgoing record. The first example sets max_match to 0, the second sets it to 1, and the third sets it to 2.

... | rex max_match=0 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;
... | rex max_match=1 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;
... | rex max_match=2 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;

Incoming record

Record{
body="0,100,1,101,2,102"
}

Outgoing record with max_match=0

Record{
body="0,100,1,101,2,102" 
field0=["0", "1", "2"]
field1=["100", "101", "102"]
}

Outgoing record with max_match=1

Record{
body="0,100,1,101,2,102" 
field0="0"
field1="100"
}

Outgoing record with max_match=2

Record{
body="0,100,1,101,2,102", 
field0=["0", "1"], 
field1=["100", "101"]
}
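The three outcomes above can be reproduced with a short Python sketch. It is an approximation under stated assumptions, not the product's implementation: the helper name rex_max_match is hypothetical, Python's re module stands in for Java regular expressions, and returning an empty list when a multivalued extraction finds no match is a guess that the documentation does not confirm.

```python
import re
from itertools import islice

def rex_max_match(value, pattern, max_match=1):
    """Hypothetical helper: collect named groups from up to max_match matches."""
    regex = re.compile(pattern)
    limit = None if max_match == 0 else max_match  # 0 means unlimited
    matches = list(islice(regex.finditer(value), limit))
    fields = {}
    for name in regex.groupindex:
        values = [m.group(name) for m in matches]
        if max_match == 1:
            # Single-match mode yields a scalar, or None when nothing matched.
            fields[name] = values[0] if values else None
        else:
            fields[name] = values  # multivalued (list) field
    return fields

body = "0,100,1,101,2,102"
pattern = r"(?P<field0>\d+),(?P<field1>\d+)"
for limit in (0, 1, 2):
    print(limit, rex_max_match(body, pattern, limit))
```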

5. Using offset_field to track which parts of the body field were extracted

The following example uses the offset_field argument to track which parts of the body field were extracted. In this example, the positions of the extracted values are returned in a top-level field called position. The first character is in position 0.

... | rex offset_field=position field=body "(?<field0>\\d+)(,)(?<field1>\\d+)" |...;

Incoming records:

Record{
body="0,100"
}  

Outgoing records:

Record{
body="0,100"
position="field0=0-0&field1=2-4"
field0="0"
field1="100"
}

6. Using offset_field to track which parts of the body field were extracted with unlimited matches

The following example uses the offset_field argument to track which parts of the body field were extracted, and sets max_match to 0 to allow unlimited matches.

... | rex max_match=0 offset_field=position field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;

Incoming record:

Record{
body="8,108,9,109"
}

Outgoing record:

Record{
body="8,108,9,109",
position="field0=0-0&field1=2-4&field0=6-6&field1=8-10",
field0=["8", "9"], 
field1=["108", "109"]
}
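The offset string in the examples above can be derived as follows. This Python sketch is an illustration under stated assumptions: the helper name rex_offsets is hypothetical, Python's re module stands in for Java regular expressions, and the name=start-end format joined by & is inferred from the outgoing records shown in this topic.

```python
import re
from itertools import islice

def rex_offsets(value, pattern, max_match=1):
    """Hypothetical helper: build the offset string for the named groups."""
    regex = re.compile(pattern)
    limit = None if max_match == 0 else max_match  # 0 means unlimited
    # Visit named groups in the order they appear in the pattern.
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    parts = []
    for match in islice(regex.finditer(value), limit):
        for name in names:
            start, end = match.span(name)
            if start < 0:
                continue  # the group did not take part in this match
            # span() has an exclusive end; the documented offsets are inclusive.
            parts.append(f"{name}={start}-{end - 1}")
    return "&".join(parts)

print(rex_offsets("8,108,9,109", r"(?P<field0>\d+),(?P<field1>\d+)", max_match=0))
print(rex_offsets("0,100", r"(?P<field0>\d+)(,)(?P<field1>\d+)"))
```

Both endpoints are inclusive and zero-based, which is why a single-character match such as "8" is reported as 0-0.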
Last modified on 25 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0, 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

