Admin Manual

 



Configure segmentation

This documentation does not apply to the most recent version of Splunk.


Segmentation rules can be tuned to improve index compression or to make a particular data source easier to work with. To change Splunk's default segmentation behavior, edit segmenters.conf. Once you have set up rules in segmenters.conf, tie them to a specific source, host, or source type in props.conf. Segmentation modes other than inner and full are not recommended.

Edit all configuration files in $SPLUNK_HOME/etc/system/local, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: You can enable any number of segmentation rules applied to different hosts, sources and/or source types in this manner.
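As a minimal sketch of the note above, the following props.conf fragment applies segmentation rules to one host, one source, and one source type. The host name, file path, and source type name are hypothetical placeholders. The stanza forms (host::, source::, and a bare source type name) follow the usual props.conf conventions, and using full as an explicit value assumes the setting accepts the mode name; check both against the props.conf specification for your version.

```ini
# $SPLUNK_HOME/etc/system/local/props.conf

# Hypothetical host: apply inner segmentation to everything it sends.
[host::webserver01]
SEGMENTATION = inner

# Hypothetical source: apply inner segmentation to a single log file.
[source::/var/log/myapp/access.log]
SEGMENTATION = inner

# Hypothetical source type: state the default (full) mode explicitly.
[my_custom_sourcetype]
SEGMENTATION = full
```

Each stanza is independent, so you can add or remove one without affecting the others.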

There are many ways to configure segmenters.conf, and you should determine what works best for your data. To specify which segmentation rules apply to specific hosts, sources, or source types, set SEGMENTATION in props.conf. Here are a few general examples of configuration changes you can make:

Full segmentation

Splunk uses full segmentation by default. Full segmentation combines inner and outer segmentation.

Inner segmentation

Inner segmentation is the most efficient segmentation setting, for both search and indexing, while still retaining the most search functionality. It does, however, make typeahead less comprehensive. Switching to inner segmentation at indexing time does not change search behavior at all.

To configure inner segmentation at index time, set SEGMENTATION = inner for your source, source type, or host in props.conf. Under these settings, Splunk indexes smaller chunks of data. For example, user.id=foo is indexed as the separate tokens user, id, and foo.
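The setting described above might look like the following in props.conf. This is a sketch only: the source type name app_audit is a hypothetical placeholder, and the same line can go under a host:: or source:: stanza instead.

```ini
# $SPLUNK_HOME/etc/system/local/props.conf

# Hypothetical source type; substitute your own.
[app_audit]
SEGMENTATION = inner

# With this in place, an event containing user.id=foo is indexed
# as the smaller tokens user, id, and foo.
```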

Outer segmentation

Outer segmentation is the opposite of inner segmentation. Instead of indexing the small tokens individually, outer segmentation indexes entire terms, yielding fewer, larger tokens. For example, "10.1.2.5" is indexed as the single token "10.1.2.5", so you cannot search on individual pieces of the term. You can still use wildcards to search for pieces of a term. For example, a search for "10.1*" returns any events that have IP addresses starting with "10.1". Outer segmentation also disables the ability to click on individual segments of search results, such as the 48.15 segment of the IP address 48.15.16.23. Outer segmentation tends to be marginally more efficient than full segmentation, while inner segmentation tends to be much more efficient.

To enable outer segmentation at index time, set SEGMENTATION = outer for your source, source type, or host in props.conf. For search to behave properly, also add the following lines to $SPLUNK_HOME/etc/system/local/segmenters.conf so that the search system knows to search for larger tokens:

[search]
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520
MINOR =

This is known as tuning "search segmentation". Note that the '.' has been removed from the list of breakers, so a search for a complete IP address, for example, runs much faster. The downside is that a search for a partial IP address must now include the '*' wildcard, because the search no longer looks at the individual octets in the index and instead looks for a complete string. If you implement this scenario, make sure your users are aware of the '*' requirement.

Note: Changes to search segmentation affect all searches across all indexes. It is not a per-index setting. Before you make search segmentation changes, ensure that tuning for one use case does not negatively impact other indexes.
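For the index-time half of this setup, a props.conf entry along the following lines would accompany the [search] stanza in segmenters.conf. This is a sketch under stated assumptions: the file path is a hypothetical placeholder, and a host:: stanza or a source type stanza would work the same way.

```ini
# $SPLUNK_HOME/etc/system/local/props.conf

# Hypothetical source; substitute the path of your own data.
[source::/var/log/firewall.log]
SEGMENTATION = outer

# With the [search] stanza from segmenters.conf also in place,
# a search for a partial IP address needs the wildcard, e.g. 10.1*
```

Because the search-side change is global, the props.conf stanza is the only part of this configuration that can be scoped to particular data.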

No segmentation

The most expedient segmentation setting is to disable segmentation completely. There are significant implications for search, however. For example, setting Splunk to index with no segmentation restricts your searches to time, source, host and source type. Only use this setting if you do not need any advanced search capabilities.

To enable this configuration, set SEGMENTATION = none for your source, source type, or host in props.conf. Keyword searches against this source, source type, or host return no results. You can still search for indexed fields.
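A minimal sketch of this setting, assuming a hypothetical host name, might look like this in props.conf:

```ini
# $SPLUNK_HOME/etc/system/local/props.conf

# Hypothetical host whose data is kept mainly for storage, not keyword search.
[host::archive01]
SEGMENTATION = none

# Keyword searches against this host's events return no results;
# searches on time, source, host, and source type still apply.
```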

No segmentation is the most space-efficient configuration, but it makes searching very difficult. You must pipe your searches through the search command to further restrict results. This type of configuration is useful if you value storage efficiency over search performance.


Splunk Web segmentation

Splunk Web also has settings for segmentation. These have nothing to do with indexing segmentation. Splunk Web segmentation affects browser interaction and may speed up search results. To configure Splunk Web segmentation, refer to the User Manual topic, Change Splunk Web preferences.

Click on the Preferences tab in the upper right-hand corner of Splunk Web.

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14

