Configure segmentation to manage disk usage
This documentation does not apply to the most recent version of Splunk.
Segmentation is how Splunk breaks events up during indexing into usable chunks, called tokens. A token is a piece of information within an event, such as an error code or a user ID. The level of segmentation you choose can increase or decrease the size of the chunks.
Segmentation can affect indexing and searching speed, as well as disk space usage. You can change the level of segmentation to improve indexing or searching speed, although this is not typically necessary.
You can adjust segmentation rules to provide better index compression or improve the usability for a particular data source. If you want to change Splunk's default segmentation behavior, edit segmenters.conf. Once you have set up rules in segmenters.conf, tie them to a specific source, host or sourcetype via props.conf. Segmentation modes other than inner and full are not recommended.
Edit all configuration files in $SPLUNK_HOME/etc/system/local, or your own custom application directory in $SPLUNK_HOME/etc/apps/.
Note: You can enable any number of segmentation rules applied to different hosts, sources, and/or sourcetypes in this manner.
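As a rough sketch of how such rules might be tied to data in props.conf, the fragment below applies a segmentation rule to a host, a source, and a sourcetype. The host name, file path, and sourcetype name are hypothetical placeholders, not values from this documentation; the SEGMENTATION values are the rule names described in this topic.

```ini
# $SPLUNK_HOME/etc/system/local/props.conf
# All stanza names below are hypothetical; substitute your own.

# Rule for every event from one host
[host::webserver01]
SEGMENTATION = inner

# Rule for one source
[source::/var/log/myapp/audit.log]
SEGMENTATION = full

# Rule for one sourcetype
[my_custom_sourcetype]
SEGMENTATION = inner
```

Each stanza is independent, so you can add or remove rules for individual hosts, sources, or sourcetypes without touching the others.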
There are many different ways to configure segmenters.conf, and you should determine what works best for your data. Specify which segmentation rules to use for specific hosts, sources, or sourcetypes by setting the SEGMENTATION attribute in props.conf. Here are the main types of index-time segmentation:
Full segmentation
Splunk is set to use full segmentation by default. Full segmentation is the combination of inner and outer segmentation.
Inner segmentation
Inner segmentation is the most efficient segmentation setting for both search and indexing, while still retaining the most search functionality. It does, however, make typeahead less comprehensive. Apart from typeahead, switching to inner segmentation does not change search behavior.
To enable inner segmentation, set SEGMENTATION = inner for your source, sourcetype, or host in props.conf. Under these settings, Splunk indexes smaller chunks of data. For example, user.id=foo is indexed as user id foo.
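A minimal sketch of this setting for a single sourcetype, assuming a hypothetical sourcetype named app_auth_log:

```ini
# props.conf (the sourcetype name is a hypothetical placeholder)
[app_auth_log]
# Index only the small tokens: user.id=foo is indexed as user, id, foo
SEGMENTATION = inner
```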
Outer segmentation
Outer segmentation is the opposite of inner segmentation. Instead of indexing only the small tokens individually, outer segmentation indexes entire terms, yielding fewer, larger tokens. For example, "10.1.2.5" is indexed only as the single token "10.1.2.5", so you cannot search on individual pieces of the phrase. You can still use wildcards, however, to search for pieces of a phrase. For example, a search for "10.1*" returns any events that contain IP addresses starting with "10.1". Outer segmentation also disables the ability to click on different segments of search results, such as the 48.15 segment of the IP address 48.15.16.23. Outer segmentation tends to be marginally more efficient than full segmentation, while inner segmentation tends to be much more efficient.
To enable outer segmentation, set SEGMENTATION = outer for your source, sourcetype, or host in props.conf. For search to behave properly, also add the following stanza to $SPLUNK_HOME/etc/system/local/segmenters.conf, so that the search system knows to search for larger tokens:
[search]
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520
MINOR =
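The props.conf half of this setup might look like the following sketch, which pairs with the [search] stanza above. The host name is a hypothetical placeholder.

```ini
# props.conf (the host name is a hypothetical placeholder)
[host::fw01]
# Index whole terms only: 10.1.2.5 is indexed as the single token 10.1.2.5
SEGMENTATION = outer
```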
No segmentation
The most space-efficient segmentation setting is to disable segmentation completely. This has significant implications for search, however. By setting Splunk to index with no segmentation, you restrict searches to time, source, host, and sourcetype. You must pipe your searches through the search command to further restrict results. Use this setting only if you do not need any advanced search capabilities.
To disable segmentation, set SEGMENTATION = none for your source, sourcetype, or host in props.conf. Searches for keywords in this source, sourcetype, or host will return no results. You can still search for indexed fields.
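For illustration, a sketch of disabling segmentation for a single source, assuming a hypothetical log file path:

```ini
# props.conf (the source path is a hypothetical placeholder)
[source::/var/log/archive/bulk.log]
# No keyword tokens are indexed for this source; searches are limited to
# time, source, host, sourcetype, and indexed fields
SEGMENTATION = none
```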
Splunk Web segmentation for search results
Splunk Web has settings for segmentation in search results. These have nothing to do with index-time segmentation. Splunk Web segmentation affects browser interaction and can speed up search results. To set search-result segmentation:
1. Perform a search. Look at the results.
2. Click Options... above the returned set of events.
3. In the Event Segmentation dropdown box, choose from the available segmentation types: full, inner, outer, or raw. The default is "full".
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8