
rex
Description
Use this command to either extract fields using regular expression named groups, or replace or substitute characters in a field using sed expressions.
The rex command matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.
When mode=sed, the given sed expression used to replace or substitute characters is applied to the value of the chosen field. This sed-syntax is also used to mask sensitive data at index-time. Read about using sed to anonymize data in the Getting Data In Manual.
If a field is not specified, the regular expression or sed expression is applied to the _raw field. Running the rex command against the _raw field might have a performance impact.
Use the rex command for search-time field extraction or string replacement and character substitution.
Syntax
The required syntax is in bold.
- rex [field=<field>] ( <regex-expression> [max_match=<int>] [offset_field=<string>] ) | (mode=sed <sed-expression>)
Required arguments
You must specify either <regex-expression> or mode=sed <sed-expression>.
- regex-expression
- Syntax: "<string>"
- Description: The PCRE regular expression that defines the information to match and extract from the specified field. Quotation marks are required.
- mode
- Syntax: mode=sed
- Description: Specify to indicate that you are using a sed (UNIX stream editor) expression.
- sed-expression
- Syntax: "<string>"
- Description: When mode=sed, specify whether to replace strings (s) or substitute characters (y) in the matching regular expression. No other sed commands are implemented. Quotation marks are required. Sed mode supports the following flags: global (g) and Nth occurrence (N), where N is a number that is the character location in the string.
Optional arguments
- field
- Syntax: field=<field>
- Description: The field that you want to extract information from.
- Default: _raw
- max_match
- Syntax: max_match=<int>
- Description: Controls the number of times the regex is matched. If greater than 1, the resulting fields are multivalued fields. Use 0 to specify unlimited matches. Multiple matches apply to the repeated application of the whole pattern. If your regex contains a capture group that can match multiple times within your pattern, only the last capture group is used for multiple matches.
- Default: 1
- offset_field
- Syntax: offset_field=<string>
- Description: If provided, a field is created with the name specified by <string>. The value of this field has the endpoints of the match in terms of zero-offset characters into the matched field. For example, if the rex expression is "(?<tenchars>.{10})", this matches the first ten characters of the field, and the offset_field contents is "0-9".
- Default: No default
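The "0-9" value in the offset_field description can be checked with a small Python sketch. This is only an approximation: Python's re module is not PCRE and spells named groups (?P<name>...) rather than (?<name>...), and the sample string is made up for illustration. The "start-end" formatting follows the description above, not the command's actual output.

```python
import re

# Hypothetical field value, chosen only so that it is longer than ten characters.
value = "abcdefghijklmnop"

# Python's re module writes the named group as (?P<tenchars>...).
m = re.search(r"(?P<tenchars>.{10})", value)

# span() returns a half-open range [start, end); the description above
# gives inclusive, zero-based endpoints, so subtract one from the end.
start, end = m.span("tenchars")
offset = f"{start}-{end - 1}"

print(m.group("tenchars"), offset)  # abcdefghij 0-9
```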
Usage
The rex command is a distributable streaming command. See Command types.
rex command or regex command?
Use the rex command to either extract fields using regular expression named groups, or replace or substitute characters in a field using sed expressions.
Use the regex command to remove results that do not match the specified regular expression.
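The difference between the two commands can be pictured with a rough Python analogy. The sample results, the pattern, and the helper names rex_like and regex_like are all hypothetical. The sketch only mirrors the behavior described above (rex adds fields, regex filters results) and is not how either command is implemented.

```python
import re

# Hypothetical search results, each with only a _raw field.
results = [
    {"_raw": "user=alice action=login"},
    {"_raw": "action=logout"},
]

# Python's re module writes named groups as (?P<name>...).
pattern = re.compile(r"user=(?P<user>\w+)")

def rex_like(rows):
    """Keep every result; add the named groups as fields when the pattern matches."""
    out = []
    for row in rows:
        m = pattern.search(row["_raw"])
        out.append({**row, **(m.groupdict() if m else {})})
    return out

def regex_like(rows):
    """Keep only the results whose _raw value matches the pattern."""
    return [row for row in rows if pattern.search(row["_raw"])]

print(rex_like(results))    # both results; the first gains user=alice
print(regex_like(results))  # only the first result
```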
Regular expressions
Splunk SPL uses Perl-compatible regular expressions (PCRE).
When you use regular expressions in searches, you need to be aware of how characters such as pipe ( | ) and backslash ( \ ) are handled. See SPL and regular expressions in the Search Manual.
For general information about regular expressions, see Splunk Enterprise regular expressions in the Knowledge Manager Manual.
Sed expressions
When using the rex command in sed mode, you have two options: replace (s) or character substitution (y).
The syntax for using sed to replace (s) text in your data is: "s/<regex>/<replacement>/<flags>"
- <regex> is a PCRE regular expression, which can include capturing groups.
- <replacement> is a string to replace the regex match. Use \n for back references, where "n" is a single digit.
- <flags> can be either g to replace all matches, or a number to replace a specified match.
The syntax for using sed to substitute characters is: "y/<string1>/<string2>/"
- This substitutes the characters that match <string1> with the characters in <string2>.
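The two sed forms can be approximated in Python to see what each one does to a string. This is a sketch under assumptions: the helper names sed_s and sed_y and the sample strings are hypothetical, and it assumes the usual sed behavior of replacing only the first match when no flag is given. Python's re module is not PCRE. The numeric flag is left out because its exact meaning is not spelled out here.

```python
import re

def sed_s(value, regex, replacement, replace_all=False):
    """Approximate s/<regex>/<replacement>/ and, with replace_all=True, the g flag."""
    # count=0 means "replace every match" in re.sub; count=1 replaces only the first.
    return re.sub(regex, replacement, value, count=0 if replace_all else 1)

def sed_y(value, string1, string2):
    """Approximate y/<string1>/<string2>/: map each character of string1 to string2."""
    return value.translate(str.maketrans(string1, string2))

# Back references \1, \2, \3 reorder the captured groups.
print(sed_s("2018-03-19", r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1"))  # 19/03/2018

# Without the g flag only the first match changes; with it, all matches change.
print(sed_s("a-b-c", "-", "_"))                    # a_b-c
print(sed_s("a-b-c", "-", "_", replace_all=True))  # a_b_c

# Character substitution: e becomes i, l becomes p.
print(sed_y("hello", "el", "ip"))  # hippo
```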
Examples
1. Extract email values using regular expressions
Extract email values from events to create from and to fields in your events. For example, you have events such as:
Mon Mar 19 20:16:27 2018 Info: Bounced: DCID 8413617 MID 19338947 From: <MariaDubois@example.com> To: <zecora@buttercupgames.com> RID 0 - 5.4.7 - Delivery expired (message too old) ('000', ['timeout'])
Mon Mar 19 20:16:03 2018 Info: Delayed: DCID 8414309 MID 19410908 From: <WeiZhang@example.com> To: <mcintosh@buttercupgames.com> RID 0 - 4.3.2 - Not accepting messages at this time ('421', ['4.3.2 try again later'])
Mon Mar 19 20:16:02 2018 Info: Bounced: DCID 0 MID 19408690 From: <Exit_Desk@sample.net> To: <lyra@buttercupgames.com> RID 0 - 5.1.2 - Bad destination host ('000', ['DNS Hard Error looking up mahidnrasatyambsg.com (MX): NXDomain'])
Mon Mar 19 20:15:53 2018 Info: Delayed: DCID 8414166 MID 19410657 From: <Manish_Das@example.com> To: <dash@buttercupgames.com> RID 0 - 4.3.2 - Not accepting messages at this time ('421', ['4.3.2 try again later'])
When the events were indexed, the From and To values were not identified as fields. You can use the rex command to extract the field values and create from and to fields in your search results.
The from and to lines in the _raw events follow an identical pattern. Each from line is From: and each to line is To:. The email addresses are enclosed in angle brackets. You can use this pattern to create a regular expression to extract the values and create the fields.
source="cisco_esa.txt" | rex field=_raw "From: <(?<from>.*)> To: <(?<to>.*)>"
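To see which values this pattern captures, the same expression can be tried against the first sample event in Python. This is a sketch only: Python's re module is not PCRE and writes named groups as (?P<name>...), so the pattern is adapted accordingly.

```python
import re

# First sample event from above, copied as-is.
event = (
    "Mon Mar 19 20:16:27 2018 Info: Bounced: DCID 8413617 MID 19338947 "
    "From: <MariaDubois@example.com> To: <zecora@buttercupgames.com> "
    "RID 0 - 5.4.7 - Delivery expired (message too old) ('000', ['timeout'])"
)

# Same pattern as the rex search, with (?<name>...) rewritten as (?P<name>...).
pattern = re.compile(r"From: <(?P<from>.*)> To: <(?P<to>.*)>")

# search() is unanchored, so the pattern can match anywhere in the event.
m = pattern.search(event)
fields = m.groupdict() if m else {}

print(fields)  # {'from': 'MariaDubois@example.com', 'to': 'zecora@buttercupgames.com'}
```

The greedy .* works here because each event contains only one pair of angle-bracketed addresses. If an event could contain further > characters after the To address, a narrower class such as [^>]* might be a safer choice.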
You can remove duplicate values and return only the list of addresses by adding the dedup and table commands to the search.
source="cisco_esa.txt" | rex field=_raw "From: <(?<from>.*)> To: <(?<to>.*)>" | dedup from to | table from to
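For the four sample events above, the results look something like this:
from | to |
---|---|
MariaDubois@example.com | zecora@buttercupgames.com |
WeiZhang@example.com | mcintosh@buttercupgames.com |
Exit_Desk@sample.net | lyra@buttercupgames.com |
Manish_Das@example.com | dash@buttercupgames.com |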
2. Extract from multi-valued fields using max_match
You can use the max_match argument to specify that the regular expression runs multiple times to extract multiple values from a field.
For example, use the makeresults command to create a field with multiple values:
| makeresults
| eval test="a$1,b$2"
The results look something like this:
_time | test |
---|---|
2019-12-05 11:15:28 | a$1,b$2 |
To extract each of the values in the test field separately, you use the max_match argument with the rex command. For example:
...| rex field=test max_match=0 "((?<field>[^$]*)\$(?<value>[^,]*),?)"
The results look something like this:
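_time | field | test | value |
---|---|---|---|
2019-12-05 11:36:57 | a b | a$1,b$2 | 1 2 |
The field and value columns are multivalue fields: field contains a and b, and value contains 1 and 2.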
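The effect of max_match=0 on this example can be approximated in Python by applying the whole pattern repeatedly with finditer. This is a sketch, not the command's implementation: Python's re module is not PCRE, and the named groups are rewritten as (?P<name>...). The second part uses a made-up pattern and string to illustrate the note in the max_match description that a group repeated inside a single match keeps only its last capture.

```python
import re

test = "a$1,b$2"

# Same pattern as the rex example, with Python-style named groups.
pattern = re.compile(r"((?P<field>[^$]*)\$(?P<value>[^,]*),?)")

# Repeated application of the whole pattern, like max_match=0.
matches = list(pattern.finditer(test))
field = [m.group("field") for m in matches]
value = [m.group("value") for m in matches]
print(field, value)  # ['a', 'b'] ['1', '2']

# Hypothetical contrast: the group repeats inside ONE match, so only the
# last capture survives and the earlier digits are lost.
repeated = re.search(r"(?:(?P<digit>\d),?)+", "1,2,3")
last_only = repeated.group("digit")
print(last_only)  # 3
```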
3. Extract values from a field in scheduler.log events
Extract "user", "app" and "SavedSearchName" from a field called "savedsearch_id" in scheduler.log events. If savedsearch_id=bob;search;my_saved_search then user=bob, app=search and SavedSearchName=my_saved_search.
... | rex field=savedsearch_id "(?<user>\w+);(?<app>\w+);(?<SavedSearchName>\w+)"
4. Use a sed expression
Use sed syntax to match the regex to a series of numbers and replace them with an anonymized string.
... | rex field=ccnumber mode=sed "s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g"
5. Display IP address and ports of potential attackers
Display IP address and ports of potential attackers.
sourcetype=linux_secure port "failed password" | rex "\s+(?<ports>port \d+)" | top src_ip ports showperc=0
This search uses rex to extract the ports field and its values. It then displays a table of the top source IP addresses (src_ip) and ports returned by the search for potential attackers.
This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 8.0.0
Comments
your cisco_esa example search might be rewritten as a multi-line query:
source="cisco_esa.txt"
| rex field=_raw "(?ix)
from:\s<(?<from>.*)>\s
to:\s<(?<to>.*)>
"
| dedup from to
| table from to
... granted this example is a little contrived, but is at least complex enough to show what I meant. note that the string is split across multiple lines and is intended for legibility - in order to actually match spaces, I need to use \s now. It's redundant, but I've also set it to be case-insensitive[1] and modified the field names to show that too.
why might you do this? well, the advantage of having strings containing regular expressions that set their own flags is that you can then include them in other strings to build more complex expressions and with a bit of care, each will be usable in those other contexts without requiring modification.
[1] I can't recall whether rex matches case or not by default - this is at least more explicit
a note on how to use in-string flags to allow for multi-line regular expressions could also be very very useful - (?ix) at the beginning of the string tells rex to ignore case and ignore whitespace in the string. This allows you to then split your regular expression across multiple lines in your query to make them easier to develop ... the drawback is that you then MUST use \n, \r, \s, or \t in order to match whitespace
Yudong
I used https://regex101.com/ to test your regex. It indicated that you need to escape your forward slashes, because they are being read as a delimiter. So the correct regex expression is
(?P<result>\/search_me[^\/]*)\/
Rey123
Thank you for taking the time to leave feedback on the REX command and specifically about including more examples. We are always looking for ways to make using the documentation easier and your comments help us focus those efforts.
For the "max_match" argument, by default the command returns the first match it finds. If you want the command to return multiple matches from the same event, you can set max_match to a higher value. If you specify "max_match=0", it will return all matches that it finds in each event.
If you have an example of data where this would be useful, would you mind sharing that with us? We are always looking for real world examples to add to the documentation. People use the commands for so many different purposes that having a variety of examples is helpful.
when you have multiple matches in one event, it returns <null>, is this desired?
"some text /search_me-1/ something /search_me-2/ something else"
| rex "(?P<result>/search_me[^\/]*)/" | table result
splunk version: 6.5.2
The syntax above uses 'max_match', however, none of the examples that follow demonstrate usage of this optional argument. It would additionally have been great if an example could have been given using ALL/ as many of the 'rex' command's parameters as possible, to illustrate how ALL of these arguments would like in a real-world usecase. If such examples were intentionally left out, sufficient links (such as this: https://docs.splunk.com/Documentation/Splunk/7.2.0/Knowledge/Addfieldmatchingrulestoyourlookupconfiguration) should have been included for the user to easily navigate and find the desired explanation. But there too, there are no examples. This only bears out the general experience of using Splunk documentation - it is tedious to look for examples and/ or clearly-explained usages of Splunk keywords/ commands. The documentation is not exhaustive or succinct and requires a user to traverse & search in SEVERAL pages to glean the necessary (& usable) information. Thank you
Pyamamoto yes you can do this (see below). I'd recommend https://answers.splunk.com for more visibility into the question.
(?<host_type>(?:(?!-\d+).)*)
Is there a way to have the sed result be assigned to a new field name instead of modifying the specified field? Example where a "host" field contains a "host-type" naming scheme prefix followed by an instance number eg "my-host-type-1234.company.com" | rex field=host mode=sed 's/-1234.*$//' would modify host but I don't want host modified, I want to make it host-type... note that a regex to match the prefix is a bit more involved due to the "-xxxx" and not wanting a trailing -, for example rex field=host "(?<host-type>*[^0-9]*)[0-9]" is almost there but keeps a trailing "-".
Reneedeleon - Thanks for reaching out to us with your question. Splunk does not have videos or books specifically about creating regular expressions. However we do have several topics about them here:
https://docs.splunk.com/Documentation/Splunk/7.2.0/Knowledge/AboutSplunkregularexpressions
https://docs.splunk.com/Documentation/Splunk/7.2.0/Search/SPLandregularexpressions
There are a number of online resources, such as www.regular-expressions.info
and there are books such as Mastering Regular Expressions by O'Reilly http://shop.oreilly.com/product/9780596528126.do
Additionally, you can google for “regex videos” for a list of videos.
Hope this helps!
Is there, or are the video tutorials or books that can help with people who have trouble generating custom rex and/or regex strings?
Hello Mjcherb
Thank you for sharing this about the (?ix) flag ... which expands to the "gmxi" flags, according to https://regex101.com/. And also your insight about the advantage of having strings containing regular expressions that set their own flags. I'm sure other Splunk users will find this information valuable!