Splunk® Data Stream Processor

DSP Function Reference

On April 3, 2023, Splunk Data Stream Processor will reach its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Parse regex (rex)

Extract or rename fields using regular expression named capture groups, or edit fields using a sed expression. For general information on regular expressions, see About Splunk Data Stream Processor regular expressions in the DSP User Manual.

The rex command matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.
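
The matching behavior described above can be approximated outside DSP with java.util.regex. The following is a minimal sketch, not DSP's implementation: the class RexExtractSketch and the helper extract are hypothetical names, and the group names are passed in explicitly so the sketch does not depend on a particular Java version. It uses Matcher.find(), which searches anywhere in the value, to mirror the unanchored match.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexExtractSketch {
    // Hypothetical helper: returns the named groups of the first match,
    // or an empty map when the regex does not match anywhere in the value.
    static Map<String, String> extract(String value, String regex, List<String> groupNames) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        // find() is unanchored: it accepts a match at any position,
        // unlike matches(), which requires the whole value to match.
        if (m.find()) {
            for (String name : groupNames) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = extract(
            "Received. From: alice@example.com To: bob@example.com",
            "From: (?<from>.*) To: (?<to>.*)",
            List.of("from", "to"));
        System.out.println(fields);
    }
}
```

Running main prints {from=alice@example.com, to=bob@example.com}. The leading "Received. " text is skipped because the match is not anchored to the start of the value.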

When mode is set to sed, the sed expression is applied to the value of the specified field to replace text or substitute characters. You can also use sed syntax to mask sensitive data.

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs the same collection of records but with a different schema S.

Arguments

| Argument | Input | Description | UI example |
| --- | --- | --- | --- |
| field | string | The field that you want to extract information from. | body |
| pattern | string | The Java regular expression that defines the information to match and extract from the specified field. | /\[(?<timestamp>\d+)\].*/ |
| max_match | integer | Optional. Controls the number of times the regular expression is matched. If greater than 1, the resulting fields are multivalued fields. Use 0 for unlimited matches. Defaults to 1. | 10 |
| offset_field | string | Optional. If provided, a field is created with the name specified by <string>. The value of this field is the start and end position of the match, expressed as zero-based character offsets into the matched field. For example, if the rex expression is (?<tenchars>.{10}), this matches the first ten characters of the field, and the offset_field contents is 0-9. | newofield |
| mode | string | Optional. Set to sed to indicate that you are using a sed (UNIX stream editor) expression. | sed |
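
To make the max_match and offset_field descriptions concrete, here is a small java.util.regex sketch. It only illustrates the semantics described in the table and is not DSP code: RexMaxMatchSketch, matchValues, and offsets are hypothetical names, and the "start-end" format follows the 0-9 example given for offset_field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RexMaxMatchSketch {
    // Collects up to maxMatch values of one named group; 0 means unlimited.
    static List<String> matchValues(String value, String regex, String group, int maxMatch) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        while ((maxMatch == 0 || values.size() < maxMatch) && m.find()) {
            values.add(m.group(group));
        }
        return values;
    }

    // Formats the first match as "start-end" using zero-based, inclusive
    // character offsets, or returns null when there is no match.
    static String offsets(String value, String regex) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return m.find() ? m.start() + "-" + (m.end() - 1) : null;
    }

    public static void main(String[] args) {
        String body = "id=1 id=22 id=333";
        System.out.println(matchValues(body, "id=(?<id>\\d+)", "id", 1)); // single value
        System.out.println(matchValues(body, "id=(?<id>\\d+)", "id", 0)); // all matches
        System.out.println(offsets("abcdefghijklmnop", "(?<tenchars>.{10})"));
    }
}
```

Running main prints [1], then [1, 22, 333], then 0-9. Matcher.end() is exclusive, so the sketch subtracts 1 to get the inclusive end offset used in the table's example.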

Using a sed expression

When using the rex command in sed mode, you have two options: replace (s) or character substitution (y).

The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"

  • <regex> is a Java regular expression, which can include capturing groups.
  • <replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
  • <flags> can be either g, to replace all matches, or a number, to replace only the match with that number.
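
The replace syntax above can be emulated with java.util.regex to see what each part does. This is a rough sketch under stated assumptions, not DSP's parser: SedReplaceSketch and sedReplace are hypothetical names, the three parts of the expression are passed as separate arguments instead of being parsed from "s/.../.../...", and the numeric flag is assumed to be 1-based. Java writes backreferences as $n, so the sketch converts sed-style \n first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SedReplaceSketch {
    // Emulates s/<regex>/<replacement>/<flags>, where flags is "g"
    // or a match number (assumed here to be 1-based).
    static String sedReplace(String value, String regex, String replacement, String flags) {
        // Escape literal dollar signs, then turn sed-style \1 into Java-style $1.
        String javaReplacement = replacement
            .replace("$", "\\$")
            .replaceAll("\\\\(\\d)", "\\$$1");
        Matcher m = Pattern.compile(regex).matcher(value);
        if ("g".equals(flags)) {
            return m.replaceAll(javaReplacement);
        }
        int target = Integer.parseInt(flags);
        StringBuilder out = new StringBuilder();
        int seen = 0;
        while (m.find()) {
            seen++;
            if (seen == target) {
                // Copies the text before this match, then the replacement.
                m.appendReplacement(out, javaReplacement);
                break;
            }
        }
        // Copies everything after the last replacement (or the whole value if none).
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sedReplace("1234-5678-9012-3456", "(\\d{4}-){3}", "XXXX-XXXX-XXXX-", "g"));
        System.out.println(sedReplace("john smith", "(\\w+) (\\w+)", "\\2 \\1", "g"));
        System.out.println(sedReplace("a-a-a", "a", "b", "2"));
    }
}
```

Running main prints XXXX-XXXX-XXXX-3456, then smith john, then a-b-a. The first call shows the masking use case, the second shows backreferences, and the third shows a numeric flag replacing only the second match.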

The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"

  • This substitutes the characters that match <string1> with the characters in <string2>.
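
Character substitution can be sketched the same way. The example below assumes the conventional sed behavior for y, where the character at each position in <string1> maps to the character at the same position in <string2> and both strings have the same length; this page does not state those details for DSP. SedTranslateSketch and sedTranslate are hypothetical names.

```java
class SedTranslateSketch {
    // Emulates y/<string1>/<string2>/ with a positional character mapping
    // (an assumption based on conventional sed behavior).
    static String sedTranslate(String value, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("string1 and string2 must have the same length");
        }
        StringBuilder out = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            int i = from.indexOf(c);
            // Characters not listed in string1 pass through unchanged.
            out.append(i >= 0 ? to.charAt(i) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sedTranslate("hello world", "lo", "01"));
    }
}
```

Running main prints he001 w1r0d: every l becomes 0 and every o becomes 1, while all other characters are left as they are.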

DSL example

This example extracts email values using regular expressions:

rex(events, "messages", "From: (?<from>.*) To: (?<to>.*)", 10, "newofield", null);

This example uses a sed expression to mask all but the last group of digits in a credit card number:

rex(events, "ccnumber", "s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g", null, null, "sed");
Last modified on 04 February, 2020

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0

