Splunk® Enterprise

Search Reference

rex

Description

Use this command to either extract fields using regular expression named groups, or replace or substitute characters in a field using sed expressions.

The rex command matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.

When mode=sed, the sed expression is applied to the value of the chosen field to replace strings or substitute characters. This sed syntax is also used to mask sensitive data at index time. Read about using sed to anonymize data in the Getting Data In Manual.

If a field is not specified, the regular expression or sed expression is applied to the _raw field. Running the rex command against the _raw field might have a performance impact.

Use the rex command for search-time field extraction or string replacement and character substitution.

Syntax

rex [field=<field>] ( <regex-expression> [max_match=<int>] [offset_field=<string>] ) | (mode=sed <sed-expression>)

Required arguments

You must specify either <regex-expression> or mode=sed <sed-expression>.

regex-expression
Syntax: "<string>"
Description: The PCRE regular expression that defines the information to match and extract from the specified field. Quotation marks are required.
mode
Syntax: mode=sed
Description: Specify to indicate that you are using a sed (UNIX stream editor) expression.
sed-expression
Syntax: "<string>"
Description: When mode=sed, specify whether to replace strings (s) or substitute characters (y) in the matching regular expression. No other sed commands are implemented. Quotation marks are required. Sed mode supports the following flags: global (g) and Nth occurrence (N), where N is a number that identifies which match in the string to replace.

Optional arguments

field
Syntax: field=<field>
Description: The field that you want to extract information from.
Default: _raw
max_match
Syntax: max_match=<int>
Description: Controls the number of times the regex is matched. If greater than 1, the resulting fields are multivalued fields.
Default: 1. Use 0 for unlimited matches.
offset_field
Syntax: offset_field=<string>
Description: If provided, a field is created with the name specified by <string>. The value of this field contains the endpoints of the match, expressed as zero-based character offsets into the matched field. For example, if the rex expression is "(?<tenchars>.{10})", this matches the first ten characters of the field, and the offset_field contains "0-9".
Default: unset
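
The way max_match and offset_field interact can be approximated outside of Splunk. The following is a minimal Python sketch, not Splunk code: rex_like is a hypothetical helper, Python's re module is not PCRE and needs the (?P<name>...) spelling for named groups, and the "start-end" offset format simply follows the "0-9" description above.

```python
import re

def rex_like(value, pattern, max_match=1):
    """Hypothetical helper that mimics the documented rex extraction semantics.

    Returns (fields, offsets). fields maps each named group to a list of
    captured values; offsets lists "start-end" strings (zero-based, inclusive).
    """
    fields = {}
    offsets = []
    for count, match in enumerate(re.finditer(pattern, value), start=1):
        for name, captured in match.groupdict().items():
            if captured is not None:
                fields.setdefault(name, []).append(captured)
        # Inclusive end offset, as in the "0-9" example for a ten-character match.
        offsets.append(f"{match.start()}-{match.end() - 1}")
        # max_match=0 means unlimited; otherwise stop after max_match matches.
        if max_match and count >= max_match:
            break
    return fields, offsets

text = "0123456789abcdefghij"
print(rex_like(text, r"(?P<tenchars>.{10})"))               # first match only
print(rex_like(text, r"(?P<tenchars>.{10})", max_match=0))  # every match
```

With the default max_match=1 the sketch keeps a single value per field. With max_match=0 each field holds one value per match, which corresponds to a multivalued field.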

Sed expression

When using the rex command in sed mode, you have two options: replace (s) or character substitution (y).

The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"

  • <regex> is a PCRE regular expression, which can include capturing groups.
  • <replacement> is a string to replace the regex match. Use \n for backreferences, where "n" is a single digit.
  • <flags> can be either: g to replace all matches, or a number to replace a specified match.

The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"

  • This substitutes the characters that match <string1> with the characters in <string2>.
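
The two sed forms can be illustrated with a rough Python analogue. This is a sketch under stated assumptions, not how Splunk implements sed mode: sed_replace and sed_translate are hypothetical names, a numeric flag is taken to mean "replace only that match", and an empty flag is assumed to replace only the first match, as in standard sed.

```python
import re

def sed_replace(value, regex, replacement, flags=""):
    """Hypothetical analogue of "s/<regex>/<replacement>/<flags>"."""
    if flags == "g":
        # g: replace every match.
        return re.sub(regex, replacement, value)
    if flags.isdigit():
        # Numeric flag: replace only the Nth match (assumed interpretation).
        target = int(flags)
        seen = 0

        def pick(match):
            nonlocal seen
            seen += 1
            return match.expand(replacement) if seen == target else match.group(0)

        return re.sub(regex, pick, value)
    # No flag: assumed to replace the first match only, as in standard sed.
    return re.sub(regex, replacement, value, count=1)

def sed_translate(value, string1, string2):
    """Hypothetical analogue of "y/<string1>/<string2>/" (equal-length strings)."""
    return value.translate(str.maketrans(string1, string2))

print(sed_replace("a-b-c", r"-", "+", "g"))                      # a+b+c
print(sed_replace("a-b-c", r"-", "+", "2"))                      # a-b+c
print(sed_replace("john smith", r"(\w+) (\w+)", r"\2 \1", "g"))  # smith john
print(sed_translate("abc", "abc", "xyz"))                        # xyz
```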

Usage

The rex command is a distributable streaming command. See Command types.

Use the rex command to either extract fields using regular expression named groups, or replace or substitute characters in a field using sed expressions. Use the regex command to remove results that do not match the specified regular expression.

Splunk SPL uses Perl-compatible regular expressions (PCRE).

When you use regular expressions in searches, you need to be aware of how characters such as pipe ( | ) and backslash ( \ ) are handled. See SPL and regular expressions in the Search Manual.

For general information about regular expressions, see Splunk Enterprise regular expressions in the Knowledge Manager Manual.

Examples

1. Extract email values using regular expressions

Extract email values from events to create from and to fields in your events. For example, you have events such as:

Mon Mar 19 20:16:27 2018 Info: Bounced: DCID 8413617 MID 19338947 From: <MariaDubois@example.com> To: <zecora@buttercupgames.com> RID 0 - 5.4.7 - Delivery expired (message too old) ('000', ['timeout']) 

Mon Mar 19 20:16:03 2018 Info: Delayed: DCID 8414309 MID 19410908 From: <WeiZhang@example.com> To: <mcintosh@buttercupgames.com> RID 0 - 4.3.2 - Not accepting messages at this time ('421', ['4.3.2 try again later']) 

Mon Mar 19 20:16:02 2018 Info: Bounced: DCID 0 MID 19408690 From: <Exit_Desk@sample.net> To: <lyra@buttercupgames.com> RID 0 - 5.1.2 - Bad destination host ('000', ['DNS Hard Error looking up mahidnrasatyambsg.com (MX):  NXDomain']) 

Mon Mar 19 20:15:53 2018 Info: Delayed: DCID 8414166 MID 19410657 From: <Manish_Das@example.com> To: <dash@buttercupgames.com> RID 0 - 4.3.2 - Not accepting messages at this time ('421', ['4.3.2 try again later']) 

When the events were indexed, the From and To values were not identified as fields. You can use the rex command to extract the field values and create from and to fields in your search results.

The from and to values in the _raw events follow an identical pattern. Each from value is preceded by From: and each to value is preceded by To:. The email addresses are enclosed in angle brackets. You can use this pattern to create a regular expression to extract the values and create the fields.

source="cisco_esa.txt" | rex field=_raw "From: <(?<from>.*)> To: <(?<to>.*)>"

You can remove duplicate values and return only the list of addresses by adding the dedup and table commands to the search.

source="cisco_esa.txt" | rex field=_raw "From: <(?<from>.*)> To: <(?<to>.*)>" | dedup from to | table from to

The results look something like this:

[Image: search results with two columns, from and to, that display email addresses.]
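
To check the extraction pattern against the sample events outside of Splunk, a small Python sketch can help. It is only an approximation: Python's re module is not PCRE and spells named groups (?P<name>...), the order-preserving de-duplication stands in for dedup, and the third event is a repeat of the first that was added here purely to show the de-duplication.

```python
import re

events = [
    "Mon Mar 19 20:16:27 2018 Info: Bounced: DCID 8413617 MID 19338947 From: <MariaDubois@example.com> To: <zecora@buttercupgames.com> RID 0 - 5.4.7 - Delivery expired (message too old) ('000', ['timeout'])",
    "Mon Mar 19 20:16:03 2018 Info: Delayed: DCID 8414309 MID 19410908 From: <WeiZhang@example.com> To: <mcintosh@buttercupgames.com> RID 0 - 4.3.2 - Not accepting messages at this time ('421', ['4.3.2 try again later'])",
    # Repeat of the first event, added only to illustrate de-duplication.
    "Mon Mar 19 20:16:27 2018 Info: Bounced: DCID 8413617 MID 19338947 From: <MariaDubois@example.com> To: <zecora@buttercupgames.com> RID 0 - 5.4.7 - Delivery expired (message too old) ('000', ['timeout'])",
]

# Same pattern as the search above, with Python's named-group spelling.
pattern = re.compile(r"From: <(?P<from>.*)> To: <(?P<to>.*)>")

rows = []
for event in events:
    match = pattern.search(event)  # search() is unanchored, like rex
    if match:
        rows.append((match.group("from"), match.group("to")))

# Keep the first occurrence of each (from, to) pair, preserving order.
unique_rows = list(dict.fromkeys(rows))
for sender, recipient in unique_rows:
    print(sender, recipient)
```

The greedy .* works for these events because no other > character follows the To address. If your events can contain a later > character, a narrower class such as [^>]* is a safer choice.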

Example 2:

Extract "user", "app" and "SavedSearchName" from a field called "savedsearch_id" in scheduler.log events. If savedsearch_id=bob;search;my_saved_search, then user=bob, app=search, and SavedSearchName=my_saved_search.

... | rex field=savedsearch_id "(?<user>\w+);(?<app>\w+);(?<SavedSearchName>\w+)"
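
As a quick check of this pattern outside of Splunk, the following Python sketch applies it to the sample value from the description. Python's re module is used here as a stand-in for PCRE, so the named groups use the (?P<name>...) spelling.

```python
import re

# Same pattern as the search above, with Python's named-group spelling.
pattern = re.compile(r"(?P<user>\w+);(?P<app>\w+);(?P<SavedSearchName>\w+)")

match = pattern.search("bob;search;my_saved_search")
fields = match.groupdict() if match else {}
print(fields)
```

Because \w matches only letters, digits, and underscores, a saved search name that contains spaces or hyphens would be captured only up to the first such character.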

Example 3:

Use sed syntax to match the regex to a series of numbers and replace them with an anonymized string.

... | rex field=ccnumber mode=sed "s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g"
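
The effect of this replacement can be previewed with a short Python sketch. The card number below is a made-up value for illustration, and re.sub is only an approximation of the sed mode in rex.

```python
import re

# Hypothetical field value; not taken from real data.
ccnumber = "1234-5678-9012-3456"

# Replace the first three 4-digit groups (each followed by a hyphen) with the mask.
masked = re.sub(r"(\d{4}-){3}", "XXXX-XXXX-XXXX-", ccnumber)
print(masked)  # XXXX-XXXX-XXXX-3456
```

Only the first three groups match the expression, so the last four digits remain visible.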

Example 4:

Display the IP addresses and ports of potential attackers.

sourcetype=linux_secure port "failed password" | rex "\s+(?<ports>port \d+)" | top src_ip ports showperc=0

This search uses rex to extract the ports field and its values. It then displays a table of the top source IP addresses (src_ip) and ports returned by the search for potential attackers.
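
The extraction and counting steps can be approximated in Python. This is a sketch under several assumptions: the sample lines are hypothetical and only imitate the style of linux_secure events, the ip_re pattern is a stand-in for the src_ip field that the search expects to be extracted already, and Counter plays the role of the top command.

```python
import re
from collections import Counter

# Hypothetical sample lines in the style of linux_secure events.
lines = [
    "Failed password for invalid user admin from 10.1.2.3 port 4022 ssh2",
    "Failed password for root from 10.1.2.3 port 4022 ssh2",
    "Failed password for root from 10.9.8.7 port 5150 ssh2",
]

# Same pattern as the rex command above, with Python's named-group spelling.
port_re = re.compile(r"\s+(?P<ports>port \d+)")
# Stand-in for the src_ip field, which the search assumes already exists.
ip_re = re.compile(r"from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})")

counts = Counter()
for line in lines:
    port, ip = port_re.search(line), ip_re.search(line)
    if port and ip:
        counts[(ip.group("src_ip"), port.group("ports"))] += 1

# Most frequent (src_ip, ports) combinations first.
top = counts.most_common()
for (src_ip, ports), count in top:
    print(src_ip, ports, count)
```

Note that the captured value includes the literal word "port", for example "port 4022", because the word sits inside the named group.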

See also

extract, kvform, multikv, regex, spath, xmlkv


This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.3.0, 7.3.1, 7.3.2


Comments

Hello Mjcherb
Thank you for sharing this about the (?ix) flag ... which expands to the "gmxi" flags, according to https://regex101.com/. And also your insight about the advantage of having strings containing regular expressions that set their own flags. I'm sure other Splunk users will find this information valuable!

Lstewart splunk, Splunker
September 27, 2019

your cisco_esa example search might be rewritten as a multi-line query:

source="cisco_esa.txt"
| rex field=_raw "(?ix)
from:\s<(?<from>.*)>\s
to:\s<(?<to>.*)>
"
| dedup from to
| table from to

... granted this example is a little contrived, but is at least complex enough to show what I meant. note that the string is split across multiple lines and is intended for legibility - in order to actually match spaces, I need to use \s now. It's redundant, but I've also set it to be case-insensitive[1] and modified the field names to show that too.

why might you do this? well, the advantage of having strings containing regular expressions that set their own flags is that you can then include them in other strings to build more complex expressions and with a bit of care, each will be usable in those other contexts without requiring modification.

[1] I can't recall whether rex matches case or not by default - this is at least more explicit

Mjcherb
September 18, 2019

a note on how to use in-string flags to allow for multi-line regular expressions could also be very very useful - (?ix) at the beginning of the string tells rex to ignore case and ignore whitespace in the string. This allows you to then split your regular expression across multiple lines in your query to make them easier to develop ... the drawback is that you then MUST use \n, \r, \s, or \t in order to match whitespace

Mjcherb
September 18, 2019

Yudong
I used https://regex101.com/ to test your regex. It indicated that you need to escape your forward slashes, because they are being read as a delimiter. So the correct regex expression is

(?P<result>\/search_me[^\/]*)\/

Lstewart splunk, Splunker
April 8, 2019

Rey123

Thank you for taking the time to leave feedback on the REX command and specifically about including more examples. We are always looking for ways to make using the documentation easier and your comments help us focus those efforts.

For the "max_match" argument, by default the command returns the first match it finds. If you want the command to return multiple matches from the same event, you can set max_match to a higher value. If you specify "max_match=0", it will return all matches that it finds in each event.

If you have an example of data where this would be useful, would you mind sharing that with us? We are always looking for real world examples to add to the documentation. People use the commands for so many different purposes that having a variety of examples is helpful.

Lstewart splunk, Splunker
April 8, 2019

when you have multiple matches in one event, it returns <null>, is this desired?
"some text /search_me-1/ something /search_me-2/ something else"
| rex "(?P<result>/search_me[^\/]*)/" | table result
splunk version: 6.5.2

Yudong
April 5, 2019

The syntax above uses 'max_match', however, none of the examples that follow demonstrate usage of this optional argument. It would additionally have been great if an example could have been given using ALL/ as many of the 'rex' command's parameters as possible, to illustrate how ALL of these arguments would look in a real-world use case. If such examples were intentionally left out, sufficient links (such as this: https://docs.splunk.com/Documentation/Splunk/7.2.0/Knowledge/Addfieldmatchingrulestoyourlookupconfiguration) should have been included for the user to easily navigate and find the desired explanation. But there too, there are no examples. This only bears out the general experience of using Splunk documentation - it is tedious to look for examples and/ or clearly-explained usages of Splunk keywords/ commands. The documentation is not exhaustive or succinct and requires a user to traverse & search in SEVERAL pages to glean the necessary (& usable) information. Thank you

Rey123
April 1, 2019

Pyamamoto yes you can do this (see below). I'd recommend https://answers.splunk.com for more visibility into the question.

(?<host_type>(?:(?!-\d+).)*)

Clintrajaniemi
February 12, 2019

Is there a way to have the sed result be assigned to a new field name instead of modifying the specified field? Example where a "host" field contains a "host-type" naming scheme prefix followed by an instance number eg "my-host-type-1234.company.com" | rex field=host mode=sed 's/-1234.*$//' would modify host but I don't want host modified, I want to make it host-type... note that a regex to match the prefix is a bit more involved due to the "-xxxx" and not wanting a trailing -, for example rex field=host "(?<host-type>*[^0-9]*)[0-9]" is almost there but keeps a trailing "-".

Pyamamoto
February 4, 2019

Reneedeleon - Thanks for reaching out to us with your question. Splunk does not have videos or books specifically about creating regular expressions. However, we do have several topics about them here:

https://docs.splunk.com/Documentation/Splunk/7.2.0/Knowledge/AboutSplunkregularexpressions
https://docs.splunk.com/Documentation/Splunk/7.2.0/Search/SPLandregularexpressions

There are a number of online resources, such as www.regular-expressions.info
and there are books such as Mastering Regular Expressions by O’Reilly http://shop.oreilly.com/product/9780596528126.do

Additionally, you can google for “regex videos” for a list of videos.

Hope this helps!

Lstewart splunk, Splunker
October 5, 2018

Are there video tutorials or books that can help people who have trouble generating custom rex and/or regex strings?

Reneedeleon
October 3, 2018
