Parse regex (rex)
Extract or rename fields using regular expression named capture groups, or edit fields using a sed expression. For general information on regular expressions, see About Splunk Data Stream Processor regular expressions in the DSP User Manual.
The rex command matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.
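The extraction behavior described above can be approximated in plain Java, since the patterns are Java regular expressions. This is a minimal sketch, not DSP code: the class `RexExtractSketch` and the helper `extract` are hypothetical names. It assumes that "unanchored" corresponds to `Matcher.find()`, which searches anywhere in the value instead of requiring the whole value to match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexExtractSketch {
    // Hypothetical helper: returns the requested named groups of the first
    // unanchored match, keyed by group name. Returns an empty map on no match.
    static Map<String, String> extract(String value, String regex, String... groupNames) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        if (m.find()) { // find() searches anywhere in the value (unanchored)
            for (String name : groupNames) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String messages = "Received: From: alice@example.com To: bob@example.com";
        System.out.println(extract(messages, "From: (?<from>.*) To: (?<to>.*)", "from", "to"));
    }
}
```

The group names are passed explicitly because older Java versions do not expose the names declared in a compiled pattern.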
When mode=sed, the given sed expression, which replaces or substitutes characters, is applied to the value of the chosen field. This sed syntax can also be used to mask sensitive data.
- Function Input: collection<record<R>>. This function takes in collections of records with schema R.
- Function Output: collection<record<S>>. This function outputs the same collection of records but with a different schema S.
Arguments
Argument | Input | Description | UI example |
---|---|---|---|
field | string | The field that you want to extract information from. | body |
pattern | string | The Java regular expression that defines the information to match and extract from the specified field. | /[(?<timestamp>\d+)].*/ |
max_match | integer | Optional. Controls the number of times the regular expression is matched. If greater than 1, the resulting fields are multivalued fields. Use 0 for unlimited matches. Defaults to 1. | 10 |
offset_field | string | Optional. If provided, a field is created with the name specified by <string>. The value of this field contains the endpoints of the match as zero-based character offsets into the matched field. For example, if the rex expression is (?<tenchars>.{10}), it matches the first ten characters of the field, and the offset_field contents are 0-9. | newofield |
mode | string | Set to sed to indicate that the pattern is a sed (UNIX stream editor) expression. | sed |
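To make the max_match and offset_field descriptions concrete, here is a rough Java sketch of both behaviors. The class `RexMaxMatchSketch` and its methods are hypothetical. The exact format DSP writes into the offset field is an assumption based only on the `0-9` example in the table: inclusive, zero-based start and end positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexMaxMatchSketch {
    // Collects up to maxMatch values of one named group; 0 means no limit.
    // More than one collected value corresponds to a multivalued field.
    static List<String> matches(String value, String regex, String group, int maxMatch) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        while ((maxMatch == 0 || out.size() < maxMatch) && m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    // Assumed offset format: inclusive zero-based endpoints of the first match
    // of the group, such as "0-9". Returns null when there is no match.
    static String offsets(String value, String regex, String group) {
        Matcher m = Pattern.compile(regex).matcher(value);
        if (!m.find()) {
            return null;
        }
        return m.start(group) + "-" + (m.end(group) - 1);
    }

    public static void main(String[] args) {
        System.out.println(matches("a=1 b=22 c=333", "(?<num>\\d+)", "num", 2));
        System.out.println(offsets("abcdefghijklmno", "(?<tenchars>.{10})", "tenchars"));
    }
}
```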
Using a sed expression
When using the rex command in sed mode, you have two options: replace (s) or character substitution (y).
The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"
- <regex> is a Java regular expression, which can include capturing groups.
- <replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
- <flags> can be either g to replace all matches, or a number to replace a specified match.
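The replace form and its flags can be sketched in Java as follows. This is an illustration under assumptions, not the DSP implementation: `SedReplaceSketch` and `replace` are hypothetical names, an `occurrence` of 0 stands in for the g flag, and a positive number replaces only that match. Note that Java's own replacement syntax writes backreferences as `$n`, whereas the sed expression uses `\n`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SedReplaceSketch {
    // occurrence == 0 mimics the g flag (replace every match);
    // occurrence >= 1 replaces only the match with that ordinal.
    static String replace(String value, String regex, String replacement, int occurrence) {
        Matcher m = Pattern.compile(regex).matcher(value);
        StringBuffer sb = new StringBuffer();
        int seen = 0;
        while (m.find()) {
            seen++;
            if (occurrence == 0 || seen == occurrence) {
                // Skipped matches are copied through unchanged by the next
                // appendReplacement or by appendTail.
                m.appendReplacement(sb, replacement);
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Mask the first three groups of a card-like number.
        System.out.println(replace("1234-5678-9012-3456", "(\\d{4}-){3}", "XXXX-XXXX-XXXX-", 0));
        // Swap two words using backreferences (Java syntax: $1, $2).
        System.out.println(replace("John Smith", "(\\w+) (\\w+)", "$2 $1", 0));
        // Replace only the second hyphen.
        System.out.println(replace("a-b-c", "-", "+", 2));
    }
}
```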
The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"
- This substitutes the characters that match <string1> with the characters in <string2>.
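Character substitution can be pictured as a per-character lookup. The sketch below assumes the positional mapping used by the UNIX sed y command, where the character at position i of <string1> is replaced by the character at position i of <string2>. The source does not spell out this mapping, so treat it as an assumption. `SedTranslitSketch` and `translate` are hypothetical names.

```java
class SedTranslitSketch {
    // Replaces each character found in 'from' with the character at the same
    // position in 'to'; all other characters are copied unchanged.
    static String translate(String value, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("string1 and string2 must have the same length");
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            int idx = from.indexOf(c);
            sb.append(idx >= 0 ? to.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Equivalent in spirit to the sed expression "y/el/ip/".
        System.out.println(translate("hello", "el", "ip"));
    }
}
```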
DSL example
This example extracts email values using regular expressions:
rex(events, "messages", "From: (?<from>.*) To: (?<to>.*)", 10, "newofield", null);
This example uses a sed expression to mask the first three groups of digits in a credit card number:
rex(events, "ccnumber", "s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g", null, null, "sed");
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0