On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
Rex
This topic describes how to use the rex function in the Splunk Data Stream Processor.
Description
Extract or rename fields using regular expression named capture groups, or edit fields using a sed expression.
The rex function matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.
When mode=sed, the given sed expression, which replaces or substitutes characters, is applied to the value of the chosen field. This sed syntax can also be used to mask sensitive data.
For more information about regular expressions in the Splunk Data Stream Processor, see About regular expressions.
Function Input/Output Schema
- Function Input
- collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
- collection<record<S>>
- This function outputs the same collection of records but with a different schema S.
Syntax
The required fields are in bold.
If using regex:
- rex
- field=<field>
- pattern=<regex-expression> [max_match=<int>] [offset_field=<string>]
If using sed:
- rex
- field=<field>
- mode=sed <sed-expression>
You must specify either <pattern> or mode=sed <sed-expression> when you use the rex function.
Required arguments
- field
- Syntax: field=<field>
- Description: The field that you want to extract information from.
- Example in Canvas View: body
- pattern
- Syntax: regex string
- Description: The Java regular expression (regex) or sed expression that defines the information to match and extract from the specified field. You must include a named capturing group in a regular expression pattern surrounded by forward slashes ( / ). If a match cannot be found, the new field is still added to your records but the value is set to null. Capturing group names can only contain alphanumeric characters. If you are using a sed expression, you must set mode=sed.
- Example in Canvas View: /[(?<timestamp>\d+)].*/
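As a rough illustration of the behavior described for pattern, the following Java sketch performs an unanchored match and returns a null value for each requested field when nothing matches. The class RexExtractSketch and its extract method are hypothetical names used only for this illustration. They are not part of the product, and the sketch assumes the caller lists the group names explicitly.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the product's implementation.
final class RexExtractSketch {

    // Returns one entry per requested group name.
    // find() is used instead of matches() so the pattern is unanchored.
    static Map<String, String> extract(String value, String regex, List<String> groupNames) {
        Matcher m = Pattern.compile(regex).matcher(value);
        boolean found = m.find();
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : groupNames) {
            // When nothing matches, the field is still added, with a null value.
            fields.put(name, found ? m.group(name) : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("field0", "field1");
        String regex = "(?<field0>\\d+),(?<field1>\\d+)";
        System.out.println(extract("1,100", regex, names));
        System.out.println(extract("no digits here", regex, names));
    }
}
```

Note that the backslashes are doubled only because the pattern is written as a Java string literal.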
Optional arguments
- mode
- Syntax: string
- Description: Only required when you want to use a sed (UNIX stream editor) expression. Specify sed to indicate that you are using a sed expression.
- Example in Canvas View: sed
- max_match
- Syntax: int
- Description: Controls the number of times the regular expression is matched. If greater than 1, the resulting fields are multivalued fields.
- Default: 1. Use 0 for unlimited matches.
- Example in Canvas View: 10
- offset_field
- Syntax: string
- Description: The desired output field name. You can specify it either without quotes or with single quotes, such as offset_field=myfield or offset_field='myfield'. It is safer to use single quotes to avoid conflicts with reserved keywords in SPL such as offset. If you want to use a reserved keyword as a field name, you must enclose that field name in single quotes, for example: offset_field='offset'. The value of the field has the endpoints of the match in terms of zero-offset characters into the matched field. For example, if the rex expression is (?<tenchars>.{10}), this matches the first ten characters of the field, and the offset_field content is 0-9.
- Example in Canvas View: position
Usage
This section contains additional usage information about the Rex function.
Regular expressions
Unlike Splunk Enterprise, regular expressions used in the Splunk Data Stream Processor are Java regular expressions.
Using a sed expression
When using the rex function in sed mode, you have two options: replace (s) or character substitution (y).
The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"
- <regex> is a Java regular expression, which can include capturing groups.
- <replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
- <flags> can be either: g to replace all matches, or a number to replace a specified match.
The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"
- This substitutes the characters that match <string1> with the characters in <string2>.
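The two sed forms above can be approximated in plain Java as follows. This is a minimal sketch under stated assumptions, not the product's implementation: RexSedSketch, substitute, and transliterate are hypothetical names, the sketch assumes that omitting the g flag replaces only the first match (as in standard sed), and it assumes <string1> and <string2> have equal length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the product's implementation.
final class RexSedSketch {

    // Approximates s/<regex>/<replacement>/ with an optional g flag.
    static String substitute(String input, String regex, String sedReplacement, boolean global) {
        // sed writes backreferences as \1..\9; Java replacement strings use $1..$9.
        // Limitation: a literal '$' in sedReplacement would be read by Java as a group reference.
        String javaReplacement = sedReplacement.replaceAll("\\\\(\\d)", "\\$$1");
        Matcher m = Pattern.compile(regex).matcher(input);
        // Assumption: without g, only the first match is replaced.
        return global ? m.replaceAll(javaReplacement) : m.replaceFirst(javaReplacement);
    }

    // Approximates y/<string1>/<string2>/: each character found in 'from'
    // is replaced by the character at the same index in 'to'.
    static String transliterate(String input, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            int i = from.indexOf(c);
            out.append(i >= 0 ? to.charAt(i) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("1,4222222222222,credit", ",\\d+,", ",XXX,", true));
        System.out.println(substitute("john smith", "(\\w+) (\\w+)", "\\2 \\1", false));
        System.out.println(transliterate("hello", "el", "ip"));
    }
}
```

The first call masks a number between commas, the second swaps two words using backreferences, and the third maps e to i and l to p.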
SPL2 examples
Examples of common use cases follow. The following examples in this section assume that you are in the SPL View.
When working in the SPL View you can write the function by providing the arguments in the exact order shown in each use case.
If you are using the SPL2 Pipeline Builder, you must escape any backslash ( \ ) characters. If you are using the Canvas Builder, backslash characters are automatically escaped. See Using regular expressions in the Canvas Builder vs the SPL2 Pipeline Builder.
1. Extract values from a field using a <regex-expression>
The following example extracts the first number and the second number from body into field0 and field1 respectively.
... | rex field=body "(?<field0>\\d+),(?<field1>\\d+)" | ...;
Incoming record
Record { body = "1,100" time = 123456789 }
Outgoing record:
Record { body = "1,100" time = 123456789, field0 = 1 field1 = 100 }
2. Use a <sed-expression>
The following example uses a <sed-expression> to match the regex to a series of numbers and replace the numbers with the string "XXX". In this example the credit card number is anonymized.
... | rex field=body mode=sed "s/,\\d+,/,XXX,/g" | ...;
Incoming record
Record { body = "1,4222222222222,credit" timestamp = 123456789 }
Outgoing record
Record { body = "1,XXX,credit" timestamp = 123456789 }
3. Use sed to replace a particular position in the string field
The following example uses sed to replace a particular string at a fixed position using 0-based indexing. The 5th character in the body field, or position 4 using 0-based indexing, should be replaced with "X" if it matches the \d regular expression.
... | rex field=body mode=sed "s/\\d/X/4" |...;
Incoming record:
Record{ body="2,102" }
Outgoing record:
Record{ body="2,10X" }
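One way to reason about the numeric flag is to treat it as "replace the nth match", which is how the flag is described in the sed syntax section. The Java sketch below takes that reading as an assumption, counting matches from 1. The names RexSedNthSketch and replaceNth are hypothetical, and the replacement is treated as a literal string. With the record above, the fourth \d match is the final "2", which sits at 0-based position 4.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the product's implementation.
final class RexSedNthSketch {

    // Replaces only the nth match (1-based) of regex with a literal replacement.
    // Assumption: if there are fewer than n matches, the input is returned unchanged.
    static String replaceNth(String input, String regex, String literalReplacement, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int seen = 0;
        while (m.find()) {
            seen++;
            if (seen == n) {
                // Splice the replacement in place of the nth match.
                return input.substring(0, m.start()) + literalReplacement + input.substring(m.end());
            }
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(replaceNth("2,102", "\\d", "X", 4));
    }
}
```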
4. Limit the number of matches with max_match
In the following examples, the max_match optional argument is invoked to limit the number of matches. If max_match is greater than 1 or equal to 0 (unlimited), then it creates a multivalued (list) field in the outgoing record. The first example shows what happens if max_match is set to 0. The second example shows what happens if max_match is set to 1. The third example shows what happens if max_match is set to 2.
... | rex max_match=0 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;
... | rex max_match=1 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;
... | rex max_match=2 field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;
Incoming record
Record{ body="0,100,1,101,2,102" }
Outgoing record with max_match=0
Record{ body="0,100,1,101,2,102" field0=["0", "1", "2"] field1=["100", "101", "102"] }
Outgoing record with max_match=1
Record{ body="0,100,1,101,2,102" field0="0" field1="100" }
Outgoing record with max_match=2
Record{ body="0,100,1,101,2,102", field0=["0", "1"], field1=["100", "101"] }
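The effect of max_match can be reproduced with a loop over successive Java regex matches. The sketch below is an approximation under stated assumptions: RexMaxMatchSketch and extractAll are hypothetical names, and the method always returns a list, whereas the records above show a single value rather than a list when max_match is 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the product's implementation.
final class RexMaxMatchSketch {

    // Collects the value of one named group from successive matches.
    // maxMatch == 0 means unlimited; otherwise stop after maxMatch matches.
    static List<String> extractAll(String value, String regex, String groupName, int maxMatch) {
        Matcher m = Pattern.compile(regex).matcher(value);
        List<String> values = new ArrayList<>();
        while ((maxMatch == 0 || values.size() < maxMatch) && m.find()) {
            values.add(m.group(groupName));
        }
        return values;
    }

    public static void main(String[] args) {
        String body = "0,100,1,101,2,102";
        String regex = "(?<field0>\\d+),(?<field1>\\d+)";
        System.out.println(extractAll(body, regex, "field0", 0));
        System.out.println(extractAll(body, regex, "field1", 2));
    }
}
```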
5. Using offset_field to track which parts of the body field were extracted
The following example uses the offset_field argument to track which parts of the body field were extracted. In this example, the positions of the extracted values are returned in a top-level field called position. The first character is in position 0.
... | rex offset_field=position field=body "(?<field0>\\d+)(,)(?<field1>\\d+)" |...;
Incoming records:
Record{ body="0,100" }
Outgoing records:
Record{ body="0,100" position="field0=0-0&field1=2-4", field0="0", field1="100" }
6. Using offset_field to track which parts of the body field were extracted with unlimited matches
The following example uses the offset_field argument to track which parts of the body field were extracted and also changes the max_match argument to support unlimited matches.
... | rex max_match=0 offset_field=position field=body "(?<field0>\\d+),(?<field1>\\d+)" |...;
Incoming record:
Record{ body="8,108,9,109" }
Outgoing record:
Record{ body="8,108,9,109", position="field0=0-0&field1=2-4&field0=6-6&field1=8-10", field0=["8", "9"], field1=["108", "109"] }
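The offset strings shown in the last two examples can be derived from the start and end indexes that Java reports for each named group. The following sketch assumes the name=start-end format joined with "&" that appears in the outgoing records above. RexOffsetSketch and offsets are hypothetical names, and the sketch assumes every named group participates in each match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the product's implementation.
final class RexOffsetSketch {

    // Builds "name=start-end" for every named group of every match, joined with "&".
    // maxMatch == 0 means unlimited matches.
    static String offsets(String value, String regex, List<String> groupNames, int maxMatch) {
        Matcher m = Pattern.compile(regex).matcher(value);
        List<String> parts = new ArrayList<>();
        int matches = 0;
        while ((maxMatch == 0 || matches < maxMatch) && m.find()) {
            matches++;
            for (String name : groupNames) {
                // Matcher.end() is exclusive, so subtract 1 to get an inclusive endpoint.
                parts.add(name + "=" + m.start(name) + "-" + (m.end(name) - 1));
            }
        }
        return String.join("&", parts);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("field0", "field1");
        System.out.println(offsets("0,100", "(?<field0>\\d+)(,)(?<field1>\\d+)", names, 1));
        System.out.println(offsets("8,108,9,109", "(?<field0>\\d+),(?<field1>\\d+)", names, 0));
    }
}
```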
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0, 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5