
regex
Description
The regex command removes results that do not match the specified regular expression.
Syntax
regex (<field>=<regex-expression> | <field>!=<regex-expression> | <regex-expression>)
Required arguments
- <regex-expression>
- Syntax: "<string>"
- Description: An unanchored regular expression. The regular expression must be a Perl Compatible Regular Expression supported by the PCRE library. Quotation marks are required.
Optional arguments
- <field>
- Syntax: <field>
- Description: Specify the name of the field whose values are matched against the regular expression.
- You can specify that the regex command keeps results that match the expression by using <field>=<regex-expression>. To keep results that do not match, specify <field>!=<regex-expression>.
- Default: _raw
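The keep and remove behavior of <field>=<regex-expression> and <field>!=<regex-expression> can be sketched outside Splunk. The following Python snippet is only an illustration, not how Splunk implements the command. It assumes that Python's re module is an acceptable stand-in for the PCRE library, and the function name regex_filter and the sample events are hypothetical.

```python
import re

def regex_filter(results, pattern, field="_raw", negate=False):
    """Mimic `regex <field>=<pattern>`, or `regex <field>!=<pattern>` when negate=True."""
    compiled = re.compile(pattern)
    kept = []
    for result in results:
        # Assumption: a missing field is treated as an empty string.
        # Splunk's handling of missing fields may differ.
        value = str(result.get(field, ""))
        # re.search is unanchored: the pattern may match anywhere in the value.
        matched = compiled.search(value) is not None
        if matched != negate:
            kept.append(result)
    return kept

# Hypothetical sample events; the field defaults to _raw.
events = [
    {"_raw": "login failed for user admin"},
    {"_raw": "login succeeded for user buttercup"},
]

kept_default = regex_filter(events, r"fail(ed|ure)")               # like: regex "fail(ed|ure)"
kept_negated = regex_filter(events, r"fail(ed|ure)", negate=True)  # like: regex _raw!="fail(ed|ure)"
print(kept_default)
print(kept_negated)
```

Because the expression is unanchored, the first call keeps the event that contains "failed" anywhere in the value, and the negated call keeps the other event.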
Usage
Use the regex command to remove results that do not match the specified regular expression.
Use the rex command to either extract fields using regular expression named groups, or replace or substitute characters in a field using sed expressions.
When you use regular expressions in searches, you need to be aware of how characters such as pipe ( | ) and backslash ( \ ) are handled. See SPL and regular expressions in the Search Manual.
For general information about regular expressions, see About Splunk regular expressions in the Knowledge Manager Manual.
Examples
Example 1: Keep only search results whose "_raw" field contains IP addresses in the non-routable class A (10.0.0.0/8). This example uses a negative lookbehind assertion at the beginning of the expression.
... | regex _raw="(?<!\d)10\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)"
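To see what the lookbehind and lookahead assertions in Example 1 contribute, the same expression can be tried against a few strings. This is a minimal sketch that assumes Python's re module behaves like PCRE for these constructs. The sample strings are hypothetical.

```python
import re

# Same expression as Example 1.
pattern = re.compile(r"(?<!\d)10\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)")

samples = [
    "src=10.1.2.3 action=allowed",     # 10.x.x.x address, preceded by "="
    "src=110.1.2.3 action=allowed",    # "10" is preceded by a digit, so the lookbehind rejects it
    "src=192.168.0.1 dst=10.0.0.254",  # unanchored: matches the dst address later in the string
    "src=10.1.2.3456",                 # last octet runs into a fourth digit, so the lookahead rejects it
]

matches = [bool(pattern.search(s)) for s in samples]
print(matches)  # [True, False, True, False]
```

Note that the expression only checks the shape of the address. It does not verify that each octet is 255 or less.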
Example 2: Keep only the results that match a valid email address. For example, buttercup@example.com.
...| regex email="^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$"
The following table explains each part of the expression.
Part of the expression | Description |
---|---|
^ | Specifies to start at the beginning of the string. |
([a-z0-9_\.-]+) | This is the first group in the expression. Specifies to match one or more lowercase letters, numbers, underscores, dots, or hyphens. The backslash ( \ ) character is used to escape the dot ( . ) character. The dot character is escaped, because a non-escaped dot matches any character. The plus ( + ) sign specifies to match one or more characters in this group. In this example this part of the expression matches buttercup in the email address buttercup@example.com. |
@ | Matches the at symbol. |
([\da-z\.-]+) | This is the second group in the expression. Specifies to match the domain name, which can be one or more lowercase letters, numbers, dots, or hyphens. The plus ( + ) sign specifies to match one or more characters in this group. In this example this part of the expression matches example in the email address buttercup@example.com. |
\. | Matches the literal dot that separates the domain name from the top-level domain. |
([a-z\.]{2,6}) | This is the third group. Specifies to match the top-level domain (TLD), which can be 2 to 6 lowercase letters or dots. This group matches many common TLDs, such as .co.uk , .edu , or .asia . In this example it matches com in the email address buttercup@example.com. |
$ | Specifies the end of the string. |
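The group-by-group explanation above can be checked with a short sketch. It uses the anchored expression from Example 2, written without any surrounding slash characters, and assumes Python's re module as a stand-in for PCRE. The variable names are hypothetical.

```python
import re

# The anchored expression from Example 2.
email_pattern = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

match = email_pattern.search("buttercup@example.com")
groups = match.groups()
print(groups)  # ('buttercup', 'example', 'com')

# The character classes only list lowercase letters, so this address does not match.
uppercase = email_pattern.search("Buttercup@Example.com")
print(uppercase)  # None

# Prefixing the expression with (?i) makes the match case insensitive.
relaxed = re.compile("(?i)" + email_pattern.pattern).search("Buttercup@Example.com")
print(relaxed is not None)  # True
```

The three captured groups correspond to the local part, the domain name, and the top-level domain.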
This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18, 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15, 6.3.1, 4.3.1, 6.3.0, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 6.3.10, 6.3.11
Comments
In example 1, I believe the first part of the regex should be a negative lookbehind of a digit, (?<!\d), and not (?=!\d).
The current regex means positive lookahead for an exclamation mark followed by a digit.
Is there a way to do KVP extraction inline ("(?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)") like it shows here:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureindex-timefieldextraction#Add_a_regex_stanza_for_the_new_field_to_transforms.conf
In example 1, what does (?=!\d) refer to?
In example 2, what does regex(?=expression refer to?
I know that -? refers to zero or one dash, but what do the statements above refer to?
Woodcock - Thanks for your comment. The regular expression should be unanchored. I've updated the documentation.
You should mention whether the PCRE application is anchored or not.
how would one search for an expression containing a quote? i.e. How should we escape the quote character?
So you can't pass other regex options like case-insensitive like "/regex/i" (i for case-Insensitive)?
Is the regex in Example 1 wrong? I think the "(?=!\d)" should be "(?<!\d)" and the first full stop (.) should be escaped. Is that correct? (By the way, you might want to look at having "
Thanks for pointing that out, Daniel333. The "3" was a typo. We've fixed it.
Example 1 and 3. Where is 2?
The syntax indicates that regex can be used without specifying a field name, which I don't think is correct.
Syntax
regex <field>=<regex-expression> | <field>!=<regex-expression> | <regex-expression>
I interpret the above as: the arguments to regex are field equals regex-expression OR field NOT equal to regex-expression OR regex-expression.
Is the syntax correct? If so, why can I not do:
regex <regex-expression>
I get the following error message:
Error in 'SearchOperator:regex': Usage: regex (=|!=)
For case insensitivity use (?i) before an expression.
Case insensitive for the word cat:
[Cc][Aa][Tt]
will match cAt, CAT, cat, CaT, and all other permutations.
how to make it case insensitive?
Nbeuchat and Boopaljothi
Thank you for your question about the Example 1 expression that began with (?=!\d)
This is supposed to be a negative lookbehind assertion.
The correct expression should be (?<!\d).
I have changed the example and added a brief explanation.