Parse regex (rex)
Extract or rename fields using regular expression named capture groups, or edit fields using a sed expression.
The rex function matches the value of the specified field against an unanchored regular expression, so the match can start anywhere in the value, and extracts the named groups into fields with the corresponding names.
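As a rough illustration of that behavior, the following Java sketch runs an unanchored search (`Matcher.find()` rather than `Matcher.matches()`) and copies each named group into a field of the same name. It assumes records can be modeled as `Map<String, Object>`, and the class `RexExtractSketch` and its helpers are hypothetical. It is not how DSP implements the function.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexExtractSketch {
    // Matches "(?<name>" but not lookbehinds such as "(?<=" or "(?<!",
    // because a Java group name must start with a letter.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Lists the named groups declared in a regex, in order of appearance.
    // Assumption: the regex has no escaped sequences that only look like named groups.
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns a copy of the record with one extra field per named group.
    // find() is unanchored: the match may start anywhere in the field value.
    static Map<String, Object> rex(Map<String, Object> record, String field, String regex) {
        Map<String, Object> out = new LinkedHashMap<>(record);
        Object value = record.get(field);
        if (value == null) {
            return out; // nothing to match against
        }
        Matcher m = Pattern.compile(regex).matcher(value.toString());
        if (m.find()) {
            for (String name : groupNames(regex)) {
                out.put(name, m.group(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("body", "level=INFO [1546300800] service started");
        Map<String, Object> result = rex(record, "body", "\\[(?<timestamp>\\d+)\\]");
        System.out.println(result.get("timestamp")); // prints 1546300800
    }
}
```

The output record keeps every input field and adds the extracted ones, which is one way to picture the schema change from R to S.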
When mode=sed, the function instead applies the given sed expression, which replaces or substitutes characters, to the value of the chosen field. You can also use this sed syntax to mask sensitive data.
- Function Input
collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
collection<record<S>>
- This function outputs the same collection of records but with a different schema S.
Arguments
Argument | Input | Description | UI example |
---|---|---|---|
field | string | The field that you want to extract information from. | body |
pattern | string | The Java regular expression that defines the information to match and extract from the specified field. | /\[(?<timestamp>\d+)\].*/ |
max_match | int | Optional. Controls the number of times the regular expression is matched. If greater than 1, the resulting fields are multivalued fields. Use 0 for unlimited matches. Defaults to 1. | 10 |
offset_field | string | Optional. If provided, a field is created with the name specified by <string>. The value of this field contains the endpoints of the match as zero-based character offsets into the matched field. For example, if the rex expression is (?<tenchars>.{10}), it matches the first ten characters of the field, and the offset_field contents is 0-9. | newofield |
mode | string | Optional. Set to sed to indicate that you are using a sed (UNIX stream editor) expression. | sed |
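The interaction of max_match and offset_field can be sketched in Java as follows. This is a hypothetical helper (`RexMaxMatchSketch`, with the group names passed in explicitly), assuming Java 9 or later. The table describes offsets for a single match, so the sketch records only the first match's endpoints, and it treats max_match=0 as producing a multivalued field; both are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexMaxMatchSketch {
    // groupNames: named groups to extract (hypothetical parameter);
    // maxMatch: 1 = single value, >1 = up to that many matches, 0 = unlimited;
    // offsetField: name of the offset field, or null to skip it.
    static Map<String, Object> rex(String value, String regex, List<String> groupNames,
                                   int maxMatch, String offsetField) {
        Matcher m = Pattern.compile(regex).matcher(value);
        Map<String, List<String>> matches = new LinkedHashMap<>();
        for (String name : groupNames) {
            matches.put(name, new ArrayList<>());
        }
        Map<String, Object> out = new LinkedHashMap<>();
        int count = 0;
        while ((maxMatch == 0 || count < maxMatch) && m.find()) {
            if (count == 0 && offsetField != null) {
                // Zero-based, inclusive endpoints: the first ten characters give "0-9".
                // Assumption: only the first match's offsets are recorded.
                out.put(offsetField, m.start() + "-" + (m.end() - 1));
            }
            for (String name : groupNames) {
                matches.get(name).add(m.group(name));
            }
            count++;
        }
        for (Map.Entry<String, List<String>> e : matches.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue; // no match, so no field
            }
            // Assumption: max_match=0 (unlimited) also yields a multivalued field.
            Object fieldValue = (maxMatch == 1) ? e.getValue().get(0) : e.getValue();
            out.put(e.getKey(), fieldValue);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> first = rex("abcdefghijklmnop", "(?<tenchars>.{10})",
                List.of("tenchars"), 1, "newofield");
        System.out.println(first.get("tenchars"));  // prints abcdefghij
        System.out.println(first.get("newofield")); // prints 0-9

        Map<String, Object> many = rex("id=1 id=22 id=333", "id=(?<id>\\d+)",
                List.of("id"), 10, null);
        System.out.println(many.get("id")); // prints [1, 22, 333]
    }
}
```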
Using a sed expression
When using the rex function in sed mode, you have two options: replace (s) or character substitution (y).
The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"
- <regex> is a Java regular expression, which can include capturing groups.
- <replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
- <flags> can be either g, to replace all matches, or a number, to replace a specified match.
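To make the s syntax concrete, here is a Java sketch that parses an s expression and applies it with java.util.regex. The class `SedReplaceSketch` is hypothetical and assumes Java 9 or later, that '/' does not appear inside <regex> or <replacement>, and that empty flags replace only the first match, as in standard sed. It converts sed's \n backreferences to Java's $n syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SedReplaceSketch {
    // Converts a sed replacement to Java's Matcher syntax: "\n" (single digit)
    // becomes "$n", and any other backslash or "$" is kept as a literal character.
    static String toJavaReplacement(String sed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sed.length(); i++) {
            char c = sed.charAt(i);
            boolean digitNext = i + 1 < sed.length()
                    && sed.charAt(i + 1) >= '0' && sed.charAt(i + 1) <= '9';
            if (c == '\\' && digitNext) {
                sb.append('$').append(sed.charAt(i + 1));
                i++;
            } else if (c == '\\' || c == '$') {
                sb.append('\\').append(c);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Applies "s/<regex>/<replacement>/<flags>" to a field value.
    static String applySed(String value, String expr) {
        String[] parts = expr.split("/", -1);
        if (parts.length != 4 || !parts[0].equals("s")) {
            throw new IllegalArgumentException("expected s/<regex>/<replacement>/<flags>: " + expr);
        }
        Matcher m = Pattern.compile(parts[1]).matcher(value);
        String replacement = toJavaReplacement(parts[2]);
        String flags = parts[3];
        if (flags.equals("g")) {
            return m.replaceAll(replacement); // replace every match
        }
        // A number N replaces only the Nth match; empty flags mean the first match.
        int target = flags.isEmpty() ? 1 : Integer.parseInt(flags);
        StringBuilder sb = new StringBuilder();
        int seen = 0;
        while (m.find()) {
            if (++seen == target) {
                m.appendReplacement(sb, replacement);
                break;
            }
        }
        m.appendTail(sb); // copies the rest of the value (all of it if nothing was replaced)
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(applySed("2024-01-31", "s/(\\d+)-(\\d+)-(\\d+)/\\3.\\2.\\1/")); // prints 31.01.2024
        System.out.println(applySed("a-b-c", "s/-/+/2")); // prints a-b+c
        System.out.println(applySed("a-b-c", "s/-/+/g")); // prints a+b+c
    }
}
```

Skipped matches are never passed to appendReplacement, so appendReplacement copies them through unchanged along with the surrounding text.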
The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"
- This replaces each character in <string1> with the character at the same position in <string2>.
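A minimal Java sketch of the y form, assuming standard sed semantics: the two strings must have the same length, and characters not listed in <string1> pass through unchanged. The class `SedTranslitSketch` is hypothetical and assumes '/' does not appear inside either string.

```java
class SedTranslitSketch {
    // Applies "y/<string1>/<string2>/": each character that appears in string1 is
    // replaced by the character at the same position in string2.
    static String applyY(String value, String expr) {
        String[] parts = expr.split("/", -1);
        if (parts.length != 4 || !parts[0].equals("y") || !parts[3].isEmpty()) {
            throw new IllegalArgumentException("expected y/<string1>/<string2>/: " + expr);
        }
        String from = parts[1];
        String to = parts[2];
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("string1 and string2 must have the same length");
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            int idx = from.indexOf(c);
            sb.append(idx >= 0 ? to.charAt(idx) : c); // unmapped characters pass through
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyY("aabbcc", "y/abc/xyz/")); // prints xxyyzz
        System.out.println(applyY("1234-5678", "y/0123456789/##########/")); // prints ####-####
    }
}
```

The second call shows how y can mask every digit with a fixed character, which is one way to hide sensitive values.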
DSL example
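This example extracts sender and recipient email values from the messages field into the from and to fields, allowing up to 10 matches and writing the match offsets to newofield: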
rex(events, "messages", "From: (?<from>.*) To: (?<to>.*)", 10, "newofield", null);
This example uses a sed expression to mask the first twelve digits of a credit card number in the ccnumber field:
rex(events, "ccnumber", "s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g", null, null, "sed");
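To preview what these two DSL examples do to a value, the following Java sketch runs the same patterns with plain java.util.regex on made-up sample strings. The class `RexDslExamplesSketch` is hypothetical, and the masking pattern uses \d{4} (four digits). This only approximates the DSP function's behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexDslExamplesSketch {
    // Same pattern as the first DSL example; returns {from, to}, or null if no match.
    static String[] extractFromTo(String message) {
        Matcher m = Pattern.compile("From: (?<from>.*) To: (?<to>.*)").matcher(message);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group("from"), m.group("to")};
    }

    // Same effect as s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g on a card number.
    static String maskCard(String ccnumber) {
        return ccnumber.replaceAll("(\\d{4}-){3}", "XXXX-XXXX-XXXX-");
    }

    public static void main(String[] args) {
        String[] fromTo = extractFromTo("From: alice@example.com To: bob@example.com");
        System.out.println(fromTo[0]); // prints alice@example.com
        System.out.println(fromTo[1]); // prints bob@example.com
        System.out.println(maskCard("1234-5678-9012-3456")); // prints XXXX-XXXX-XXXX-3456
    }
}
```

Because both groups use greedy .*, the to group runs to the end of the line, so each line of a value yields at most one from/to pair.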
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.1