Rex

This topic describes how to use the rex function.

Description

Extract or rename fields using regular expression named capture groups, or edit fields using a sed expression.

The rex function matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.

When mode=sed, the sed expression is applied to the value of the specified field to replace text or substitute characters. You can also use sed syntax to mask sensitive data.

For more information, see the topic about regular expressions.

Function Input/Output Schema

Function Input: collection<record<R>>; This function takes in collections of records with schema R.
Function Output: collection<record<S>>; This function outputs the same collection of records but with a different schema S.

Syntax

Optional arguments are shown in square brackets.

If using regex:

rex: field=<field>; pattern=<regex-expression> [max_match=<int>] [offset_field=<string>]

If using sed:

rex: field=<field>; mode=sed <sed-expression>

You must specify either <pattern> or mode=sed <sed-expression> when you use the rex function.

Required arguments

field: Syntax: field=<field>; Description: The field that you want to extract information from.; Example in Canvas View: body

pattern: Syntax: regex string; Description: The Java regular expression (regex) or sed expression that defines the information to match and extract from the specified field. You must include a named capturing group in a regular expression pattern surrounded by forward slashes ( / ). If no match is found, the new field is still added to your records, but its value is set to null. Capturing group names can contain only alphanumeric characters. If you are using a sed expression, you must set mode=sed.; Example in Canvas View: /[(?<timestamp>\d+)].*/

Optional arguments

mode: Syntax: string; Description: Only required when you want to use a sed (UNIX stream editor) expression. Specify sed to indicate that you are using a sed expression.; Example in Canvas View: sed

max_match: Syntax: int; Description: Controls the number of times the regular expression is matched. If set to a value greater than 1, or to 0, the resulting fields are multivalued fields.; Default: 1. Use 0 for unlimited matches.; Example in Canvas View: 10

offset_field: Syntax: string; Description: The name of the output field that records match offsets. You can specify it either without quotes or with single quotes, such as offset_field=myfield or offset_field='myfield'. Single quotes are safer because they avoid conflicts with reserved keywords in SPL such as offset. To use a reserved keyword as a field name, enclose it in single quotes, for example: offset_field='offset'. The value of the field holds the endpoints of each match as zero-based character offsets into the matched field. For example, if the rex expression is (?<tenchars>.{10}), this matches the first ten characters of the field, and the offset range recorded is 0-9.; Example in Canvas View: position

Usage

This section contains additional usage information about the Rex function.

Regular expressions

Unlike in Splunk Enterprise, the regular expressions used by this function are Java regular expressions.

Using a sed expression

When using the rex function in sed mode, you have two options: replace (s) or character substitution (y).

The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"

<regex> is a Java regular expression, which can include capturing groups.
<replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
<flags> can be either: g to replace all matches, or a number to replace a specified match.
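
The replace (s) behavior described above can be approximated in plain Java, which may help when testing a pattern outside of a pipeline. This is only a sketch: the class and method names are hypothetical, counting matches from 1 for the numeric flag is an assumption, and the replacement is treated as literal text in replaceNth. Note that Java's own replaceAll writes backreferences as $n rather than the \n form used in a sed expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any product API.
class SedSubstituteSketch {

    // Rough equivalent of "s/<regex>/<replacement>/g": replace every match.
    static String replaceAll(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // Rough equivalent of "s/<regex>/<replacement>/<n>": replace only the nth match.
    // Assumption: n counts matches starting at 1. The replacement is inserted literally.
    static String replaceNth(String input, String regex, String replacement, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;   // index just past the text already copied to out
        int count = 0;  // number of matches seen so far
        while (m.find()) {
            count++;
            if (count == n) {
                out.append(input, last, m.start()).append(replacement);
                last = m.end();
                break;
            }
        }
        out.append(input.substring(last)); // copy the untouched remainder
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 1,XXX,credit
        System.out.println(replaceAll("1,4222222222222,credit", ",\\d+,", ",XXX,"));
        // prints 2,10X
        System.out.println(replaceNth("2,102", "\\d", "X", 4));
    }
}
```

If the input has fewer than n matches, replaceNth returns the input unchanged; whether rex does the same is not stated here.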

The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"

This substitutes the characters that match <string1> with the characters in <string2>.
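
As a rough illustration of character substitution (y), the following Java sketch maps each character of <string1> to the character at the same index in <string2>, which is how standard sed behaves. That rex follows the same positional mapping, and that both strings must have equal length, are assumptions here; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of any product API.
class SedTransliterateSketch {

    // Replace each character found in `from` with the character at the same index in `to`.
    // Characters that do not appear in `from` are copied unchanged.
    static String transliterate(String input, String from, String to) {
        if (from.length() != to.length()) {
            // Assumption borrowed from standard sed: both strings have equal length.
            throw new IllegalArgumentException("string1 and string2 must have the same length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            int i = from.indexOf(c);
            out.append(i >= 0 ? to.charAt(i) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints xyz-123
        System.out.println(transliterate("abc-123", "abc", "xyz"));
    }
}
```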

SPL2 examples

Examples of common use cases follow. The examples in this section assume that you are in the SPL View.

When working in the SPL View, you can write the function by providing the arguments in the exact order shown in each use case.

If you are using the SPL2 Pipeline Builder, you must escape any backslash ( \ ) characters. If you are using the Canvas Builder, backslash characters are automatically escaped. See Using regular expressions in the Canvas Builder vs the SPL2 Pipeline Builder.

1. Extract values from a field using a <regex-expression>

The following example extracts the first number and the second number from body into field0 and field1 respectively.

... | rex field=body "(?<field0>\\d+),(?<field1>\\d+)" | ...;

Incoming record

Record {
 body = "1,100"
 time = 123456789
}

Outgoing record:

Record {
 body = "1,100"
 time = 123456789
 field0 = 1
 field1 = 100
}
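
Because rex patterns are Java regular expressions, the extraction in this example can be reproduced with java.util.regex to check a pattern before using it. The sketch below is not the function's implementation: the class and method names are hypothetical, and the group names are passed in explicitly to keep the code simple. It mirrors two documented behaviors: the pattern is unanchored (find() rather than matches()), and a field is still added with a null value when nothing matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any product API.
class RexExtractSketch {

    // Apply an unanchored regex to `value` and return the named groups as fields.
    static Map<String, String> extract(String value, String regex, String... groupNames) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        boolean found = m.find(); // find() searches anywhere in the value
        for (String name : groupNames) {
            // When there is no match, the field is still added, with a null value.
            fields.put(name, found ? m.group(name) : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {field0=1, field1=100}
        System.out.println(
            extract("1,100", "(?<field0>\\d+),(?<field1>\\d+)", "field0", "field1"));
    }
}
```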

2. Use a <sed-expression>

The following example uses a <sed-expression> to match a comma-delimited series of digits and replace the digits with the string "XXX". In this example, the credit card number is masked.

... | rex field=body mode=sed "s/,\\d+,/,XXX,/g" | ...;

Incoming record

Record {
 body = "1,4222222222222,credit"
 timestamp = 123456789
}

Outgoing record

Record {
 body = "1,XXX,credit"
 timestamp = 123456789
}

3. Use sed to replace a particular position in the string field

The following example uses sed to replace a particular string at a fixed position using 0-based indexing. The 5th character in the body field, or position 4 using 0-based indexing, should be replaced with "X" if it matches the \d regular expression.

... | rex field=body mode=sed "s/\\d/X/4" |...;

Incoming record:

Record{
body="2,102"
}

Outgoing record:

Record{
body="2,10X"
}

4. Limit the number of matches with max_match

In the following examples, the optional max_match argument limits the number of matches. If max_match is greater than 1 or equal to 0 (unlimited), the function creates multivalued (list) fields in the outgoing record. The three examples set max_match to 0, 1, and 2 respectively.

... | rex max_match=0 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;

... | rex max_match=1 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;

... | rex max_match=2 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;

Incoming record

Record{
body="0,100,1,101,2,102"
}

Outgoing record with max_match=0

Record{
body="0,100,1,101,2,102" 
field0=["0", "1", "2"]
field1=["100", "101", "102"]
}

Outgoing record with max_match=1

Record{
body="0,100,1,101,2,102" 
field0="0"
field1="100"
}

Outgoing record with max_match=2

Record{
body="0,100,1,101,2,102"
field0=["0", "1"]
field1=["100", "101"]
}
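
The effect of max_match can be pictured as calling the matcher repeatedly and stopping after a fixed number of hits. The Java sketch below illustrates that idea under stated assumptions: the class and method names are hypothetical, and it always returns a list, whereas the outgoing record for max_match=1 above holds a single value rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any product API.
class MaxMatchSketch {

    // Collect the values of one named group across successive matches.
    // maxMatch == 0 means no limit, mirroring max_match=0.
    static List<String> collect(String value, String regex, String group, int maxMatch) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        while ((maxMatch == 0 || values.size() < maxMatch) && m.find()) {
            values.add(m.group(group));
        }
        return values;
    }

    public static void main(String[] args) {
        String body = "0,100,1,101,2,102";
        String regex = "(?<field0>\\d+),(?<field1>\\d+)";
        System.out.println(collect(body, regex, "field0", 0)); // prints [0, 1, 2]
        System.out.println(collect(body, regex, "field1", 2)); // prints [100, 101]
    }
}
```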

5. Using offset_field to track which parts of the body field were extracted

The following example uses the offset_field argument to track which parts of the body field were extracted. In this example, the positions of the extracted values are returned in a top-level field called position. The first character is in position 0.

... | rex offset_field=position field=body "(?<field0>\\d+)(,)(?<field1>\\d+)" |...;

Incoming records:

Record{
body="0,100"
}

Outgoing records:

Record{
body="0,100"
position="field0=0-0&field1=2-4"
field0="0"
field1="100"
}

6. Using offset_field to track which parts of the body field were extracted with unlimited matches

The following example uses the offset_field argument to track which parts of the body field were extracted and also sets max_match to 0 to allow unlimited matches.

... | rex max_match=0 offset_field=position field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;

Incoming record:

Record{
body="8,108,9,109"
}

Outgoing record:

Record{
body="8,108,9,109"
position="field0=0-0&field1=2-4&field0=6-6&field1=8-10"
field0=["8", "9"]
field1=["108", "109"]
}
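
The position values in the last two examples can be reproduced from Java's Matcher offsets, which may clarify how the ranges are counted. In this sketch (hypothetical class and method names, with the name=start-end format joined by & inferred from the outgoing records above), the end offset is made inclusive by subtracting 1 from Matcher.end(), which is exclusive. Every named group is assumed to take part in each match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any product API.
class OffsetFieldSketch {

    // Build a string such as "field0=0-0&field1=2-4" from the group offsets.
    // maxMatch == 0 means no limit, mirroring max_match=0.
    static String offsets(String value, String regex, int maxMatch, String... groupNames) {
        StringBuilder out = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(value);
        int matches = 0;
        while ((maxMatch == 0 || matches < maxMatch) && m.find()) {
            matches++;
            for (String name : groupNames) {
                if (out.length() > 0) {
                    out.append('&');
                }
                // start(name) is zero-based; end(name) is exclusive, so subtract 1.
                out.append(name).append('=')
                   .append(m.start(name)).append('-').append(m.end(name) - 1);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String regex = "(?<field0>\\d+),(?<field1>\\d+)";
        // prints field0=0-0&field1=2-4
        System.out.println(offsets("0,100", regex, 1, "field0", "field1"));
        // prints field0=0-0&field1=2-4&field0=6-6&field1=8-10
        System.out.println(offsets("8,108,9,109", regex, 0, "field0", "field1"));
    }
}
```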
