Improve data compression with segmentation
This documentation does not apply to the most recent version of Splunk.
Segmentation is how Splunk breaks events into searchable segments at index time, and again at search time. Segments are classified as major or minor; minor segments are breaks within major segments. For example, the IP address 172.26.34.223 is, as a whole, a major segment. That major segment can be broken down into minor segments such as 172, as well as groups of minor segments such as 172.26.34.
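The major/minor distinction can be sketched in a few lines of Python. This is an illustration only, not Splunk's implementation: the breaker sets below (whitespace, commas, and semicolons for major breaks; a period for minor breaks) and the function names are assumptions chosen to match the IP address example. The real breaker characters are defined in segmenters.conf.

```python
import re

# Assumed breaker sets, for illustration only. Splunk's actual breakers
# are configured in segmenters.conf and are more extensive.
MAJOR_BREAKERS = r"[\s,;]+"
MINOR_BREAKERS = r"\."


def major_segments(event):
    """Split a raw event into major segments, dropping empty strings."""
    return [s for s in re.split(MAJOR_BREAKERS, event) if s]


def minor_segments(major):
    """Split one major segment into its minor segments."""
    return [s for s in re.split(MINOR_BREAKERS, major) if s]


event = "connection from 172.26.34.223 accepted"
print(major_segments(event))
print(minor_segments("172.26.34.223"))
```

Under these assumptions, the IP address survives intact as one major segment, and splitting it again yields the four minor segments 172, 26, 34, and 223.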
Splunk lets a Splunk admin define how detailed event segmentation should be. This matters because index-time segmentation affects indexing and search speed, disk compression, and your ability to use typeahead functionality. Search-time segmentation also affects search speed, as well as your ability to create searches by selecting items from the results displayed in Splunk Web.
Index-time segmentation is set up through segmenters.conf, while search-time segmentation is set in the Options pop-up, which you reach through the Search app interface in Splunk Web.
For more information about "index time" and "search time," see "Index time versus search time" in the Knowledge Manager manual.
Levels of event segmentation
There are three levels of segmentation that the Splunk admin can choose from for index time and search time:
- Inner segmentation breaks events down into the smallest minor segments possible. For example, when an IP address such as 172.26.34.223 goes through inner segmentation, it is broken down into 172, 26, 34, and 223. Setting inner segmentation at index time leads to very efficient indexes in terms of search speed, but it also impacts indexing speed and restricts the typeahead functionality (it will only be able to typeahead at the minor segment level).
- Outer segmentation is the opposite of inner segmentation. Under outer segmentation only major segments are indexed. In the previous example, the IP address would not be broken down into any components. If you have outer segmentation set at index time you will be unable to search on individual pieces of the IP address without using wildcard characters. Indexes created using outer segmentation tend to be marginally more efficient than those created with full segmentation, but are not quite as efficient as those created through inner segmentation.
- Full segmentation is in some respects a combination of inner and outer segmentation. Under full segmentation, the IP address is indexed both as a major segment and as a variety of minor segments, including minor segment combinations like 172.26 and 172.26.34. This is the least efficient indexing option, but it provides the most versatility in terms of searching.
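To compare the three levels side by side, the following Python sketch computes the set of indexed terms for a single major segment under each level. It is a simplified model, not Splunk's implementation: the function name `segment`, the period as the only minor breaker, and the exact set of combinations produced for full segmentation (individual minor segments plus left-anchored groups) are all assumptions made for illustration.

```python
def segment(major, level, sep="."):
    """Return the set of terms indexed for one major segment.

    Simplified model: `sep` is the only minor breaker, and "full" is
    assumed to index each minor segment plus every left-anchored group.
    """
    parts = major.split(sep)
    if level == "inner":
        # Only the smallest minor segments are indexed.
        return set(parts)
    if level == "outer":
        # Only the major segment itself is indexed.
        return {major}
    if level == "full":
        # Minor segments plus groups such as 172.26 and 172.26.34,
        # ending with the whole major segment.
        groups = {sep.join(parts[:i]) for i in range(1, len(parts) + 1)}
        return set(parts) | groups
    raise ValueError("unknown segmentation level: " + level)


ip = "172.26.34.223"
for level in ("inner", "outer", "full"):
    print(level, sorted(segment(ip, level)))

print("172" in segment(ip, "outer"))    # False: would need a wildcard search
print("172.26" in segment(ip, "full"))  # True
```

In this model, full segmentation stores the most terms for the same event, which matches the trade-off described above: it is the most versatile for searching but the least efficient to index.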
Note: By default, index-time segmentation is set to a combination of inner and outer segmentation, and search-time segmentation is set to full segmentation.
For more information about changing the segmentation level, see "Configure segmentation to manage disk usage" in this manual.
Defining segmentation rules for specific hosts, sources, or source types
A Splunk admin can define index-time and search-time segmentation rules that apply specifically to events with particular hosts, sources, or source types. If you regularly run searches that involve a particular source type, you can use this feature to improve the performance of those searches. Similarly, if you typically index a large number of syslog events, you can use it to reduce the overall disk space that those events take up.
For details about how to set up these special segmentation rules, see "Configure custom segmentation for a host, source, or source type" in this manual.
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.