Splunk® Cloud Services

SPL2 Search Manual

Event segmentation and searching

When data is added to your Splunk instance, the indexer looks for segments in the data. Data is segmented by separating terms into smaller pieces, first with major breakers and then with minor breakers. These breakers are characters like spaces, periods, and colons. There are lists of the major and minor breakers later in this topic.

Suppose an event begins with an IP address and a date, such as 91.205.189.15 - - [13/Aug/2022:18:22:16] . This data is broken into these segments based on the major breakers:

91.205.189.15
-
[13/Aug/2022:18:22:16]  

These major segments are further broken down based on the minor breakers. For example, the IP address is broken into minor segments such as 91, 205, 189, and 15.

Event segmentation at index-time and at search-time

Event segmentation occurs at index-time and at search-time.

Index-time segmentation
Index-time segmentation affects indexing and search speed, storage size, and the ability to use typeahead functionality in the Search bar.
Search-time segmentation
Search-time segmentation affects search speed and the ability to create searches by selecting items from the search results.

In this topic we are going to focus on search-time segmentation and how major and minor breakers impact searching.

Searching and punctuation symbols

If your field values contain punctuation symbols such as quotation marks, periods, and colons, follow these best practices in your searches:

  • To match a punctuation symbol, don't use a wildcard to match the symbol. Specify the actual symbol.
  • Avoid using wildcards in the middle of a term. Wild cards used in the middle of a term might slow down search performance and might return inconsistent results if the term contains punctuation.
  • Avoid using wildcards at the beginning of a value. Wild cards used as the first character in a value will slow down search performance.

See Wildcards for more information about using wildcards in your search criteria.

Punctuation symbols and segment tokens

Many punctuation symbols are interpreted as major or minor breakers in event data. These breakers are used to parse the data into small segments.

To search for values that contains punctuation, enclose the data in quotation marks.

For example, an IP address such as 91.205.189.15 is broken into segment tokens based on the period character.

91
205
189
15
91.205
91.205.189
91.205.189.15
205.189
and so forth

To search for this IP address, you must use quotation marks. The quotation marks tell the search to find the complete string "91.205.189.15".

Other examples are a bit more complicated.

Suppose you have data that sometimes appears with quotations and sometimes does not, such as app="uat_staging-mgr" and app=uat_staging-mgr.

The quotation mark ( " ) is a major breaker. The equal sign ( = ) is a minor breaker.

The data gets parsed into segments as shown in this table:

Data Segments from major breakers Segments from minor breakers
app="uat_staging-mgr" app=

uat_staging-mgr

app

uat
staging
mgr

app=uat_staging-mgr app=uat_staging-mgr app

uat
staging
mgr

The quotations around the data make a difference for the major segments. For app="uat_staging-mgr", the quote is a major breaker and so you end up with these 2 segments:

app=

uat_staging-mgr

Where as with app=uat_staging-mgmr, which does not have any part enclosed in quotations, there is no major breaker and the entire term is 1 segment.

Major breakers

Major breakers are a set of characters that are used to divide words, phrases, or terms in the event data into large tokens. Examples of major breakers are:

  • A space
  • A newline
  • A tab
  • Square brackets [ ]
  • Parenthesis ( )
  • Curly brackets { }
  • An exclamation point  !
  • A question mark  ?
  • A semicolon ;
  • A comma ,
  • Single and double quotation marks ' "
  • The ampersand sign &
  • There are also multiple major breakers that use percent-encoding, primarily for reserved characters. These major breakers begin with a percent symbol followed by a code. For example, %21 is the code for the exclamation point ( ! ) character and %2526 is a double encoded ampersand ( && ).

Here is an example of part of an event:


91.205.189.15 - - [13/Aug/2022:18:22:16] "GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1" 

This partial example gets segmented on the major breakers into the following tokens:

91.205.189.15
-
-
[13/Aug/2022:18:22:16] 
GET
/oldlink?itemId=EST-14
JSESSIONID=SD6SL7FF7ADFF53113
HTTP
1.1

Minor breakers

Minor breakers are a set of characters that are used to further divide large tokens into smaller tokens.

Examples of minor breakers are:

  • A period .
  • A forward slash /
  • A double backslash \\
  • A colon :
  • The equal sign =
  • The AT symbol @
  • The hash or pound symbol #
  • The dollar sign $
  • The percent symbol %
  • The dash symbol -
  • The underscore symbol _
Last modified on 02 October, 2023
When to escape characters   Using comments in SPL2

This documentation applies to the following versions of Splunk® Cloud Services: current


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters