Storage efficiency
This documentation does not apply to the most recent version of Splunk.
Tuning Splunk's use of storage follows the same principles as tuning for indexing performance: the less data Splunk has to write to disk, the better the storage efficiency.
Reduce index density
You can reduce your index size by tuning segmentation. In segmenters.conf, change some MINOR breakers to MAJOR breakers to decrease index size.
Note: This changes search performance. Read about segmentation before making any changes to segmenters.conf.
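To see why moving a breaker from MINOR to MAJOR can shrink the index, here is a minimal sketch in Python. It is a toy model, not Splunk's segmenter: the function name index_tokens, the sample event, and the breaker lists are all hypothetical. It assumes each major segment is stored whole and is also split on minor breakers into extra tokens.

```python
import re


def _split(text, breakers):
    """Split text on any run of the given breaker characters."""
    if not breakers:
        return [text] if text else []
    pattern = "[" + re.escape("".join(breakers)) + "]+"
    return [piece for piece in re.split(pattern, text) if piece]


def index_tokens(event, major, minor):
    """Toy model: the set of tokens stored for one event.

    Each major segment is kept whole, and is additionally split on the
    minor breakers into smaller tokens (an assumption of this sketch).
    """
    tokens = set()
    for segment in _split(event, major):
        tokens.add(segment)
        tokens.update(_split(segment, minor))
    return tokens


event = "10.1.2.3 GET /index.html"  # hypothetical sample event

# "." as a MINOR breaker: "10.1.2.3" is stored whole and in pieces.
with_minor = index_tokens(event, major=[" ", "/"], minor=["."])

# "." moved to the MAJOR list: only the pieces are stored.
dot_as_major = index_tokens(event, major=[" ", "/", "."], minor=[])

print(len(with_minor), sorted(with_minor))
print(len(dot_as_major), sorted(dot_as_major))
```

In this toy model the second configuration stores fewer tokens, but a search for the whole string 10.1.2.3 no longer matches a single stored token. That is the search-performance trade-off the note above warns about.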
You can also edit inputs.conf so that your data inputs do not keep data on local disk. Set Splunk to gather data through network mounts rather than through tailing or watching local files: data coming from a network mount is copied to the index but is not also stored locally, so it takes up less space. Alternatively, set tailing and watching to collect local data and copy it to the index.
As a last resort, you can configure Splunk not to index raw data at all, which is the most extreme way to lower index density.
Note: Not indexing raw data significantly increases storage efficiency, but it requires your users to perform more complex operations at search time. Users have to find data by searching on timestamps and core fields, and then filter the results with the where and regex commands. Furthermore, these searches can apply regex to only 10k results at a time.
Tuning inputs.conf
inputs.conf configures all inputs to Splunk, including file and directory tailing and watching, network ports, and scripted inputs.
You can add and edit the sources that Splunk takes input from. Gathering data through the network, rather than by tailing local files, is the most efficient way to use storage.
Tuning props.conf
props.conf controls what parameters apply to events during indexing based on settings tied to each event's source, host, or sourcetype.
TRUNCATE = <non-negative integer> (10000) | Changes the default maximum line length. Set to 0 to disable truncation entirely; keep in mind that very long lines are often a sign of garbage data.
MAX_EVENTS = <integer> (256) | Specifies the maximum number of input lines that can be added to any one event. Splunk breaks the event after it has read the specified number of lines.
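The effect of these two limits can be sketched in Python. This is a simplified illustration, not how Splunk builds events. The function build_events is hypothetical, it assumes every input line belongs to the same event until the MAX_EVENTS limit forces a break, and it counts the TRUNCATE limit in characters because the unit is not stated here.

```python
def build_events(lines, truncate=10000, max_events=256):
    """Toy model of the TRUNCATE and MAX_EVENTS limits.

    Each line is cut to at most `truncate` characters (0 disables cutting),
    and a new event is started once `max_events` lines have been collected.
    """
    events, current = [], []
    for line in lines:
        if truncate > 0:
            line = line[:truncate]
        current.append(line)
        if len(current) >= max_events:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


# Ten hypothetical 50-character lines, with deliberately small limits.
sample = ["x" * 50] * 10
events = build_events(sample, truncate=20, max_events=4)

print([len(event) for event in events])  # lines per event
print(len(events[0][0]))                 # length of a truncated line
```

Lower values store less per line and per event, at the cost of cutting data that may be legitimate.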
Tuning segmenters.conf
segmenters.conf defines schemes for how events will be tokenized in Splunk's index.
MAJOR = <space separated list of strings> | Move MINOR breakers into the MAJOR breaker list, or remove breakers from the MAJOR breaker list, to change the size and number of segments that raw event data is broken into.
MINOR = <space separated list of strings> | The characters that produce additional tokens to index, on top of those produced by the MAJOR breaker list. Reduce or remove this list to increase indexing performance.
MINOR_LEN = <integer> (-1) | If set and non-negative, specifies how long a minor token can be. Longer minor tokens are discarded without prejudice.
MAJOR_LEN = <integer> (-1) | If set and non-negative, specifies how long a major token can be. Longer major tokens are discarded without prejudice.
FILTER = <regular expression> | Segment only data that matches this regular expression.
LOOKAHEAD = <integer> (-1) | Sets how far (in characters) Splunk looks into an event for segmentation. If FILTER is set, this limit applies to filtering too. Set to 0 to turn off segmentation entirely.
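As a rough illustration of how LOOKAHEAD, MAJOR_LEN, and MINOR_LEN limit what gets tokenized, the following Python sketch models the behavior described in the table. It is an assumption-laden toy: the function segment, the single-character breaker lists, and the sample event are hypothetical, and FILTER is left out.

```python
import re


def segment(event, major=" ", minor=".", lookahead=-1,
            major_len=-1, minor_len=-1):
    """Toy model of LOOKAHEAD, MAJOR_LEN and MINOR_LEN.

    A negative value means "no limit". lookahead=0 turns segmentation off.
    """
    if lookahead == 0:
        return set()
    text = event if lookahead < 0 else event[:lookahead]
    tokens = set()
    for seg in re.split("[" + re.escape(major) + "]+", text):
        if not seg:
            continue
        # Major tokens longer than major_len are discarded.
        if major_len < 0 or len(seg) <= major_len:
            tokens.add(seg)
        parts = [p for p in re.split("[" + re.escape(minor) + "]+", seg) if p]
        if len(parts) > 1:  # the segment really contains minor breakers
            for part in parts:
                # Minor tokens longer than minor_len are discarded.
                if minor_len < 0 or len(part) <= minor_len:
                    tokens.add(part)
    return tokens


event = "host1.example.com connected sessionid0123456789abcdef"

print(sorted(segment(event)))                 # no limits
print(sorted(segment(event, major_len=20)))   # drops the long session token
print(sorted(segment(event, lookahead=17)))   # only the first 17 characters
print(sorted(segment(event, lookahead=0)))    # segmentation turned off
```

Each limit removes tokens from the toy index, which is how these settings trade searchable terms for a smaller index.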
Read more about how segmentation works, including how to configure custom segmentation.
This documentation applies to the following versions of Splunk: 3.0, 3.0.1, 3.0.2, 3.1, 3.1.1, 3.1.2, 3.1.3, 3.1.4.