Event segmentation and searching
When data is added to your Splunk instance, the indexer looks for segments in the data. Data is segmented by separating terms into smaller pieces, first with major breakers and then with minor breakers. These breakers are characters like spaces, periods, and colons. There are lists of the major and minor breakers later in this topic.
Suppose an event begins with an IP address and a date, such as 91.205.189.15 - - [13/Aug/2022:18:22:16]. This data is broken into these segments based on the major breakers: 91.205.189.15, -, and 13/Aug/2022:18:22:16.
These major segments are further broken down based on the minor breakers. For example, the IP address is broken into minor segments such as 91, 205, 189, and 15.
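The two-stage process described above can be sketched in Python. This is a simplified model, not Splunk's segmenter: the breaker sets are assumed subsets of the lists later in this topic, the helper names `split_on` and `segment` are hypothetical, and the sketch returns only the individual minor segments rather than every token Splunk might index.

```python
# Simplified model of two-stage segmentation; NOT Splunk's implementation.
# Breaker sets are assumed subsets of the major and minor breaker lists in this topic.
MAJOR_BREAKERS = set(" \n\t<>[](){}!?;,'\"&")
MINOR_BREAKERS = set("./:=@#$%-_")


def split_on(text: str, breakers: set) -> list:
    """Split text on any breaker character, dropping empty pieces."""
    pieces = []
    current = []
    for ch in text:
        if ch in breakers:
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces


def segment(event: str) -> list:
    """Split on major breakers first, then split each major segment on minor breakers."""
    return [(major, split_on(major, MINOR_BREAKERS)) for major in split_on(event, MAJOR_BREAKERS)]


for major, minors in segment("91.205.189.15 - - [13/Aug/2022:18:22:16]"):
    print(major, minors)
```

In this model the lone dashes produce empty minor lists, because the dash is itself a minor breaker.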
Event segmentation at index-time and at search-time
Event segmentation occurs at index-time and at search-time.
- Index-time segmentation affects indexing and search speed, storage size, and the ability to use typeahead functionality in the Search bar.
- Search-time segmentation affects search speed and the ability to create searches by selecting items from the search results.
For more information about the distinction between index-time segmentation and search-time segmentation, see Index time versus search time in the Splunk Enterprise Managing Indexers and Clusters of Indexers manual.
This topic focuses on search-time segmentation and how major and minor breakers affect searching.
Searching and punctuation symbols
If your field values contain punctuation symbols such as quotation marks, periods, and colons, follow these best practices in your searches:
- Don't use a wildcard to match a punctuation symbol. Specify the actual symbol.
- Avoid using wildcards in the middle of a term. Wildcards in the middle of a term might slow down search performance and might return inconsistent results if the term contains punctuation.
- Avoid using wildcards at the beginning of a value. A wildcard used as the first character of a value slows down search performance.
See Wildcards for more information about using wildcards in your search criteria.
Punctuation symbols and segment tokens
Many punctuation symbols are interpreted as major or minor breakers in event data. These breakers are used to parse the data into small segments.
To search for values that contain punctuation, enclose the values in quotation marks.
For example, an IP address such as 91.205.189.15 is broken into segment tokens based on the period character:
- 91
- 205
- 189
- 15
- 91.205
- 91.205.189
- 91.205.189.15
- 205.189
- and so forth
To search for this IP address, you must use quotation marks. The quotation marks tell the search to find the complete string "91.205.189.15".
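One way to model the token list for the IP address is to treat every contiguous run of minor segments within a major segment as a token. That rule is an assumption made for illustration: it reproduces the tokens listed for 91.205.189.15, but Splunk's exact token set is governed by its segmenter configuration. The names `minor_spans` and `segment_tokens` are hypothetical.

```python
# Assumed model: every contiguous run of minor segments is a searchable token.
MINOR_BREAKERS = set("./:=@#$%-_")


def minor_spans(segment: str) -> list:
    """Return (start, end) offsets of the pieces between minor breakers."""
    spans = []
    start = None
    for i, ch in enumerate(segment):
        if ch in MINOR_BREAKERS:
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(segment)))
    return spans


def segment_tokens(segment: str) -> list:
    """Slice out every contiguous run of minor pieces, keeping the original breakers."""
    spans = minor_spans(segment)
    return [
        segment[spans[i][0]:spans[j][1]]
        for i in range(len(spans))
        for j in range(i, len(spans))
    ]


tokens = segment_tokens("91.205.189.15")
print(tokens)
```

In this model the complete address is only one of ten tokens, which illustrates why the quotation marks are needed to pin the search to the full string.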
Other examples are a bit more complicated.
Suppose you have data that sometimes appears with quotation marks and sometimes does not, such as app="uat_staging-mgr" and app=uat_staging-mgr.
The quotation mark ( " ) is a major breaker. The equal sign ( = ) is a minor breaker.
The data gets parsed into segments as shown in this table:
Data | Segments from major breakers | Segments from minor breakers |
---|---|---|
app="uat_staging-mgr" | app=, uat_staging-mgr | app, uat, and so forth |
app=uat_staging-mgr | app=uat_staging-mgr | app, uat, and so forth |
The quotation marks around the data make a difference for the major segments. For app="uat_staging-mgr", the quotation mark is a major breaker, so you end up with these two segments:
- app=
- uat_staging-mgr
By contrast, app=uat_staging-mgr does not have any part enclosed in quotation marks, so there is no major breaker and the entire term is one segment.
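The effect of the quotation marks can be checked with a small regular-expression splitter. This is a sketch under stated assumptions: both patterns are built from subsets of the major and minor breaker lists in this topic (with `\s` standing in for the whitespace breakers), and `pieces` is a hypothetical helper, not a Splunk API.

```python
import re

# Assumed subsets of the breaker lists in this topic (not Splunk's full sets).
MAJOR = re.compile(r"[\s<>\[\](){}!?;,'\"&]+")
MINOR = re.compile(r"[./:=@#$%_\-]+")


def pieces(pattern: re.Pattern, text: str) -> list:
    """Split text with the breaker pattern and drop empty strings."""
    return [p for p in pattern.split(text) if p]


for data in ('app="uat_staging-mgr"', "app=uat_staging-mgr"):
    majors = pieces(MAJOR, data)
    minors = [m for major in majors for m in pieces(MINOR, major)]
    print(data, majors, minors)
```

Both forms yield the same minor pieces in this model; only the major segments differ, which matches the table.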
Major breakers
Major breakers are a set of characters that are used to divide words, phrases, or terms in the event data into large tokens. Examples of major breakers are:
- A space
- A newline
- A tab
- Angle brackets < >
- Square brackets [ ]
- Parentheses ( )
- Curly brackets { }
- An exclamation point !
- A question mark ?
- A semicolon ;
- A comma ,
- Single and double quotation marks ' "
- The ampersand sign &
- There are also multiple major breakers that use percent-encoding, primarily for reserved characters. These major breakers begin with a percent symbol followed by a code. For example, %21 is the code for the exclamation point ( ! ) character and %2526 is a double-encoded ampersand ( & ).
For a complete list of segmenters, see the segmenters.conf file in the Splunk Enterprise Admin Manual.
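The percent-encoded codes can be checked with Python's standard `urllib.parse.unquote`, which decodes one level of percent-encoding per call. The `decode_fully` helper is hypothetical; it only shows why %2526 counts as a double-encoded ampersand (%25 decodes to %, leaving %26) and does not model how Splunk applies these breakers.

```python
from urllib.parse import unquote


def decode_fully(code: str) -> str:
    """Percent-decode repeatedly until the value stops changing."""
    previous = None
    while code != previous:
        previous, code = code, unquote(code)
    return code


print(unquote("%2526"))       # one pass: "%25" becomes "%", leaving "%26"
print(decode_fully("%2526"))  # a second pass turns "%26" into "&"
print(decode_fully("%21"))    # "!"
```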
Here is an example of part of an event:
91.205.189.15 - - [13/Aug/2022:18:22:16] "GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1"
This partial example gets segmented on the major breakers into the following tokens:
91.205.189.15 - - 13/Aug/2022:18:22:16 GET /oldlink itemId=EST-14 JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1
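The major-breaker split of this event can be reproduced with a regular expression built from the breakers listed above. This is a sketch: the pattern covers only the listed single characters (using `\s` for whitespace, which also matches carriage returns), omits the percent-encoded breakers, and `major_segments` is a hypothetical name.

```python
import re

# Character class built from the major breakers listed above (percent codes omitted).
MAJOR_BREAKER_PATTERN = re.compile(r"[\s<>\[\](){}!?;,'\"&]+")

EVENT = (
    '91.205.189.15 - - [13/Aug/2022:18:22:16] '
    '"GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1"'
)


def major_segments(event: str) -> list:
    """Split on runs of major breakers and drop the empty strings at the edges."""
    return [token for token in MAJOR_BREAKER_PATTERN.split(event) if token]


print(major_segments(EVENT))
```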
Minor breakers
Minor breakers are a set of characters that are used to further divide large tokens into smaller tokens.
Examples of minor breakers are:
- A period .
- A forward slash /
- A double backslash \\
- A colon :
- The equal sign =
- The AT sign @
- The pound sign #
- The dollar sign $
- The percent sign %
- The dash -
- The underscore _
For a complete list of segmenters, see the segmenters.conf file in the Splunk Enterprise Admin Manual.
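Continuing the example event from the Major breakers section, this sketch splits several of its major tokens on minor breakers. The pattern is built from the minor breakers listed above, except that the backslash entry is left out; `minor_segments` is a hypothetical helper, and Splunk also indexes combined tokens (such as 91.205) that this plain split does not list.

```python
import re

# Minor breakers from the list above, minus the backslash entry.
MINOR_BREAKER_PATTERN = re.compile(r"[./:=@#$%_\-]+")


def minor_segments(major_token: str) -> list:
    """Split one major token on runs of minor breakers, dropping empty strings."""
    return [piece for piece in MINOR_BREAKER_PATTERN.split(major_token) if piece]


for token in ["13/Aug/2022:18:22:16", "/oldlink", "itemId=EST-14", "1.1"]:
    print(token, "->", minor_segments(token))
```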
See also
- About event segmentation in the Getting Data In manual
- segmenters.conf file in the Splunk Enterprise Admin Manual
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)