Splunk® Enterprise

Search Manual

Event segmentation and searching

When data is added to your Splunk instance, the indexer looks for segments in the data. Data is segmented by separating terms into smaller pieces, first with major breakers and then with minor breakers. These breakers are characters like spaces, periods, and colons. There are lists of the major and minor breakers later in this topic.

Suppose an event begins with an IP address and a date, such as 91.205.189.15 - - [13/Aug/2019:18:22:16]. This data is broken into these segments based on the major breakers:

91.205.189.15
-
[13/Aug/2019:18:22:16]  

These major segments are further broken down based on the minor breakers. For example, the IP address is broken into minor segments such as 91, 205, 189, and 15, as well as groups of minor segments like 91.205 and 91.205.189.
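
The minor-breaker step described above can be sketched in Python. This is an illustration only, not how the indexer is implemented: the MINOR_BREAKERS set is a simplified assumption drawn from the examples in this topic, and minor_segments is a hypothetical helper name.

```python
# Assumption: a simplified set of minor breakers, not the full default list.
MINOR_BREAKERS = set("./\\:=@#$%-_")

def minor_segments(major_segment):
    """Return (pieces, groups) for one major segment.

    pieces: the text between minor breakers.
    groups: leading runs of two or more pieces, breakers included.
    """
    pieces, groups = [], []
    start = 0
    for i, ch in enumerate(major_segment):
        if ch in MINOR_BREAKERS:
            if i > start:
                pieces.append(major_segment[start:i])
                if len(pieces) > 1:
                    groups.append(major_segment[:i])
            start = i + 1
    if start < len(major_segment):
        pieces.append(major_segment[start:])
    return pieces, groups

pieces, groups = minor_segments("91.205.189.15")
print(pieces)  # ['91', '205', '189', '15']
print(groups)  # ['91.205', '91.205.189']
```

The full string 91.205.189.15 is not listed in groups because it is already the major segment.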


Event segmentation at index-time and at search-time

Event segmentation occurs at index-time and at search-time.

Index-time segmentation
Index-time segmentation affects indexing and search speed, storage size, and the ability to use typeahead functionality in the Search bar in Splunk Web.
Search-time segmentation
Search-time segmentation affects search speed and the ability to create searches by selecting items from the results displayed in Splunk Web.

For more information about the distinction between index-time segmentation and search-time segmentation, see Index time versus search time in the Managing Indexers and Clusters of Indexers manual.

This topic focuses on search-time segmentation and how major and minor breakers affect searching.

Searching and punctuation symbols

If your field values contain punctuation symbols such as quotation marks, periods, and colons, follow these best practices in your searches:

  • To match a punctuation symbol, don't use a wildcard to match the symbol. Specify the actual symbol.
  • Avoid using wildcards in the middle of a value. Wildcards used in the middle of a value might slow down search performance.
  • Avoid using wildcards at the beginning of a value. Wildcards used as the first character in a value will slow down search performance.

See Wildcards for more information about using wildcards in your search criteria.

Punctuation symbols and segment tokens

Many punctuation symbols are interpreted as major or minor breakers in event data. These breakers are used to segment the data into smaller tokens.

To search for values that contain punctuation, enclose the data in quotation marks.

For example, an IP address such as 91.205.189.15 is broken into segment tokens based on the period character, such as: 91 205 189 15

To search for this IP address, use quotation marks "91.205.189.15".
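
If you build search strings in a script, the quoting rule above can be applied with a small helper. The following Python sketch is an illustration under stated assumptions: quote_term is a hypothetical name, the BREAKERS set is a simplified subset taken from the lists in this topic, and the backslash escaping of embedded quotation marks should be checked against the search reference for your version.

```python
# Assumption: simplified major and minor breaker sets, not the full defaults.
BREAKERS = set(" \n\t[](){}!;,'\"&") | set("./\\:=@#$%-_")

def quote_term(value):
    """Wrap a literal search value in double quotes if it contains a breaker."""
    if any(ch in BREAKERS for ch in value):
        # Assumption: backslashes and double quotes are escaped with a backslash.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return value

print(quote_term("91.205.189.15"))  # "91.205.189.15"
print(quote_term("error"))          # error
```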

Other examples are a bit more complicated.

Suppose you have data that sometimes appears with quotation marks and sometimes does not, such as app="uat_staging-mgr" and app=uat_staging-mgr.

The quotation mark ( " ) is a major breaker. The equal sign ( = ) is a minor breaker.

The data gets segmented as shown in this table:

Data | Segment tokens from major breakers | Segment tokens from minor breakers
app="uat_staging-mgr" | app= and uat_staging-mgr | app, uat, staging, mgr
app=uat_staging-mgr | app=uat_staging-mgr | app, uat, staging, mgr

The quotation marks around the data make a difference for the major tokens. For app="uat_staging-mgr", the quotation mark is a major breaker, so you end up with these two tokens:

app=

uat_staging-mgr

Whereas with app=uat_staging-mgr, there is no major breaker and the entire string is one token.
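
The difference between the two forms can be reproduced with a short Python sketch. It is a simplified model rather than the actual segmentation code: MAJOR and MINOR are assumed subsets of the breaker lists in this topic, and split_on and segment are hypothetical helper names.

```python
# Assumption: simplified breaker sets, not the full defaults.
MAJOR = set(" \n\t[](){}!;,'\"&")
MINOR = set("./\\:=@#$%-_")

def split_on(breakers, text):
    """Split text at every breaker character, dropping empty tokens."""
    tokens, current = [], ""
    for ch in text:
        if ch in breakers:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

def segment(text):
    """Return (major tokens, minor tokens) for a piece of event data."""
    major = split_on(MAJOR, text)
    minor = [piece for token in major for piece in split_on(MINOR, token)]
    return major, minor

print(segment('app="uat_staging-mgr"'))
# (['app=', 'uat_staging-mgr'], ['app', 'uat', 'staging', 'mgr'])
print(segment('app=uat_staging-mgr'))
# (['app=uat_staging-mgr'], ['app', 'uat', 'staging', 'mgr'])
```

Both forms produce the same minor tokens. Only the major tokens differ.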



Major breakers

Major breakers are a set of characters that are used to divide words, phrases, or terms in the event data into large tokens. Examples of major breakers are:

  • A space
  • A newline
  • A tab
  • Square brackets [ ]
  • Parentheses ( )
  • Curly brackets { }
  • An exclamation point  !
  • A semicolon ;
  • A comma ,
  • Single and double quotation marks ' "
  • The ampersand sign &


Here is an example of part of an event:


91.205.189.15 - - [13/Aug/2019:18:22:16] "GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1" 

This partial example gets segmented on the major breakers into the following tokens:

91.205.189.15
-
-
[13/Aug/2019:18:22:16] 
GET
/oldlink?itemId=EST-14
JSESSIONID=SD6SL7FF7ADFF53113
HTTP
1.1
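
As a rough check, the major-breaker split of this partial event can be approximated in Python. The MAJOR string below is an assumption limited to the example breakers listed in this topic, and major_tokens is a hypothetical helper. The sketch discards the breaker characters themselves, so the timestamp appears without its square brackets.

```python
# Assumption: only the example major breakers listed in this topic.
MAJOR = " \n\t[](){}!;,'\"&"

def major_tokens(event):
    """Replace each major breaker with a space, then split on whitespace."""
    table = {ord(ch): " " for ch in MAJOR}
    return event.translate(table).split()

event = ('91.205.189.15 - - [13/Aug/2019:18:22:16] '
         '"GET /oldlink?itemId=EST-14&JSESSIONID=SD6SL7FF7ADFF53113 HTTP 1.1"')

for token in major_tokens(event):
    print(token)
```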

Minor breakers

Minor breakers are a set of characters that are used to further divide large tokens into smaller tokens.

Examples of minor breakers are:

  • A period .
  • A forward slash /
  • A double backslash \\
  • A colon :
  • The equal sign =
  • The AT symbol @
  • The hash or pound symbol #
  • The dollar sign $
  • The percent symbol %
  • The dash symbol -
  • The underscore symbol _

For a complete list of segmenters, see the segmenters.conf file in the Admin Manual.


This documentation applies to the following versions of Splunk® Enterprise: 8.0.0

