Splunk Cloud Platform

Search Manual

Event segmentation and searching

When data is added to your Splunk instance, the indexer looks for segments in the data. Data is segmented by separating terms into smaller pieces, first with major breakers and then with minor breakers. These breakers are characters like spaces, periods, and colons. There are lists of the major and minor breakers later in this topic.

Suppose an event begins with an IP address and a date, such as 91.205.189.15 - - [13/Aug/2022:18:22:16]. The spaces and square brackets are major breakers, so this data is broken into these segments:

91.205.189.15
-
-
13/Aug/2022:18:22:16

These major segments are further broken down based on the minor breakers. For example, the IP address is broken into minor segments such as 91, 205, 189, and 15.
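The two-stage split described above can be sketched in Python. This is only an illustration, not how the indexer is implemented: the breaker strings are assumed subsets taken from the lists in this topic, and the `split_on` and `segment` helpers are hypothetical names.

```python
# Assumed subsets of the breaker lists in this topic, not the full defaults.
MAJOR_BREAKERS = " \n\t[]<>(){}!?;,'\"&"
MINOR_BREAKERS = "./\\:=@#$%-_"


def split_on(text, breakers):
    """Split text at every breaker character and drop empty pieces."""
    pieces, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


def segment(text):
    """Return (major_segment, minor_segments) pairs for text."""
    return [(major, split_on(major, MINOR_BREAKERS))
            for major in split_on(text, MAJOR_BREAKERS)]


for major, minors in segment("91.205.189.15 - - [13/Aug/2022:18:22:16]"):
    print(major, "->", minors)
```

In this sketch a lone dash produces no minor segments, because the dash is itself a minor breaker.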

Event segmentation at index-time and at search-time

Event segmentation occurs at index-time and at search-time.

Index-time segmentation
Index-time segmentation affects indexing and search speed, storage size, and the ability to use typeahead functionality in the Search bar.
Search-time segmentation
Search-time segmentation affects search speed and the ability to create searches by selecting items from the search results.

For more information about the distinction between index-time segmentation and search-time segmentation, see Index time versus search time in the Splunk Enterprise Managing Indexers and Clusters of Indexers manual.

This topic focuses on search-time segmentation and on how major and minor breakers affect searching.

Searching and punctuation symbols

If your field values contain punctuation symbols such as quotation marks, periods, and colons, follow these best practices in your searches:

  • To match a punctuation symbol, specify the actual symbol. Don't use a wildcard in its place.
  • Avoid using wildcards in the middle of a term. Wildcards used in the middle of a term might slow down search performance and might return inconsistent results if the term contains punctuation.
  • Avoid using wildcards at the beginning of a value. Wildcards used as the first character in a value slow down search performance.

See Wildcards for more information about using wildcards in your search criteria.

Punctuation symbols and segment tokens

Many punctuation symbols are interpreted as major or minor breakers in event data. These breakers are used to parse the data into small segments.

To search for values that contain punctuation, enclose the data in quotation marks.

For example, an IP address such as 91.205.189.15 is broken into segment tokens based on the period character.

91
205
189
15
91.205
91.205.189
91.205.189.15
205.189
and so forth

To search for this IP address, you must use quotation marks. The quotation marks tell the search to find the complete string "91.205.189.15".
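To make the "and so forth" in the list above concrete, the following Python sketch enumerates every contiguous run of the period-separated pieces of a value. It assumes that the list means exactly those contiguous runs. The `contiguous_tokens` helper is a hypothetical name, and the sketch does not claim to reproduce what the indexer stores.

```python
def contiguous_tokens(value, breaker="."):
    """List every contiguous run of the pieces of value, rejoined with breaker."""
    parts = value.split(breaker)
    tokens = []
    for start in range(len(parts)):
        for end in range(start + 1, len(parts) + 1):
            tokens.append(breaker.join(parts[start:end]))
    return tokens


tokens = contiguous_tokens("91.205.189.15")
print(len(tokens))
print(tokens)
```

Non-adjacent combinations such as 91.189 never appear in the result. Quoting the full value in a search avoids having to reason about these shorter tokens at all.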

Other examples are a bit more complicated.

Suppose you have data that sometimes appears with quotation marks and sometimes without them, such as app="uat_staging-mgr" and app=uat_staging-mgr.

The quotation mark ( " ) is a major breaker. The equal sign ( = ) is a minor breaker.

The data gets parsed into segments as shown in this table:

Data                     Segments from major breakers    Segments from minor breakers
app="uat_staging-mgr"    app=                            app
                         uat_staging-mgr                 uat
                                                         staging
                                                         mgr
app=uat_staging-mgr      app=uat_staging-mgr             app
                                                         uat
                                                         staging
                                                         mgr

The quotation marks around the value change the major segments. In app="uat_staging-mgr", the quotation mark is a major breaker, so you end up with these 2 segments:

app=

uat_staging-mgr

By contrast, app=uat_staging-mgr has no part enclosed in quotation marks, so it contains no major breaker and the entire term is 1 segment.
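The difference between the quoted and unquoted forms can be checked with a small Python sketch. It assumes a reduced breaker set (spaces and quotation marks as major breakers, and the equal sign, underscore, and dash as minor breakers). The `major_segments` and `minor_segments` helpers are hypothetical names used only for illustration.

```python
import re


def major_segments(term):
    # Assumption: only spaces and quotation marks act as major breakers here.
    return [s for s in re.split('[ "\']+', term) if s]


def minor_segments(term):
    # Assumption: only =, _ and - act as minor breakers here.
    return [s for major in major_segments(term)
            for s in re.split("[=_-]+", major) if s]


for term in ('app="uat_staging-mgr"', "app=uat_staging-mgr"):
    print(term, major_segments(term), minor_segments(term))
```

Under these assumptions, both forms produce the same minor segments and differ only in their major segments.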


Major breakers

Major breakers are a set of characters that are used to divide words, phrases, or terms in the event data into large tokens. Examples of major breakers are:

  • A space
  • A newline
  • A tab
  • Angle brackets < >
  • Square brackets [ ]
  • Parentheses ( )
  • Curly brackets { }
  • An exclamation point !
  • A question mark ?
  • A semicolon ;
  • A comma ,
  • Single and double quotation marks ' "
  • The ampersand sign &
  • There are also multiple major breakers that use percent-encoding, primarily for reserved characters. These major breakers begin with a percent symbol followed by a code. For example, %21 is the code for the exclamation point ( ! ) character and %2526 is a double-encoded ampersand ( & ). Because %25 encodes the percent sign, %2526 decodes first to %26 and then to the ampersand.

For a complete list of segmenters, see the segmenters.conf file in the Splunk Enterprise Admin Manual.

Here is an example of part of an event:


91.205.189.15 - - [13/Aug/2022:18:22:16] "GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1" 

This partial example gets segmented on the major breakers into the following tokens:

91.205.189.15
-
-
13/Aug/2022:18:22:16
GET
/oldlink
itemId=EST-14
JSESSIONID=SD6SL7FF7ADFF53113
HTTP
1.1
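The token list above can be reproduced with a short Python sketch that treats every major breaker as a separator. It assumes only the single-character major breakers listed in this topic and ignores the percent-encoded ones. The `major_tokens` helper is a hypothetical name.

```python
# Assumed subset: the single-character major breakers listed in this topic.
MAJOR_BREAKERS = " \n\t<>[](){}!?;,'\"&"

# Map every major breaker to a space so that str.split() can do the splitting.
TO_SPACES = str.maketrans(MAJOR_BREAKERS, " " * len(MAJOR_BREAKERS))


def major_tokens(event):
    """Split an event into tokens at the major breakers."""
    return event.translate(TO_SPACES).split()


event = ('91.205.189.15 - - [13/Aug/2022:18:22:16] '
         '"GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1"')

for token in major_tokens(event):
    print(token)
```

The question mark and the ampersand in the query string are major breakers, which is why itemId=EST-14 and JSESSIONID=SD6SL7FF7ADFF53113 come out as separate tokens.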

Minor breakers

Minor breakers are a set of characters that are used to further divide large tokens into smaller tokens.

Examples of minor breakers are:

  • A period .
  • A forward slash /
  • A double backslash \\
  • A colon :
  • The equal sign =
  • The AT sign @
  • The pound sign #
  • The dollar sign $
  • The percent sign %
  • The dash sign -
  • The underscore sign _

For a complete list of segmenters, see the segmenters.conf file in the Splunk Enterprise Admin Manual.

Last modified on 22 February, 2023

This documentation applies to the following versions of Splunk Cloud Platform: 9.3.2408, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)

