Indexing performance
Splunk's indexing performance can be maximized by tweaking settings in Splunk's configuration files. Here are some basic tweaks you can implement to improve indexing performance:
- Change Splunk's timestamp extraction settings in props.conf:
  - Set Splunk to look fewer characters into an event for a timestamp (or turn off timestamp extraction).
  - Use strptime formatting for timestamps (%d/%m/%Y %H:%M:%S).
- Edit Splunk's aggregator function to turn off line merging.
- Reduce segmentation of events by altering the MAJOR and MINOR breakers.
- Turn off some of Splunk's advanced features.
Negative impact on indexing performance
- The more regexes you configure in transforms.conf, the longer indexing takes. Make sure all of your regexes are necessary.
- Custom processing.
- Using many fields extracted during indexing (see indexed fields).
- Using your own C/C++ modules.
Processors
Splunk has several internal processors. If Splunk isn't indexing your data as quickly as you'd like, you can track down which processor is responsible for the delay by running the following search:
index::_internal NOT sendout group=pipeline | timechart sum(cpu_seconds) by processor
This search charts the CPU time used by each of Splunk's internal processors. If one processor is using noticeably more CPU time than the others, you can adjust the settings that affect it to reduce the load.
Below are some tuning parameters in Splunk's configuration files that affect indexing performance.
indexes.conf
indexes.conf controls how Splunk's indexes are configured. You can change the following entries to improve indexing performance.
| Argument (default) | Description |
|---|---|
| indexThreads = <non-negative number> (0) | The number of extra threads to use for a specific index. Raising the number of index threads may improve indexing, but the benefit depends on the capability of your hardware. Important: This |
| maxMemMB = <non-negative number> (50) | Amount of memory to allocate for indexing. The allocation escalates per index thread: each thread beyond the first allocates N * maxMemMB. For example, if indexThreads is set to 2 and maxMemMB is set to 100, the first thread uses 100 MB and the second thread uses 200 MB, for a total of 300 MB of memory. Note: Increasing this value by *small* amounts may improve indexing throughput. Increasing it by large amounts significantly degrades performance across all Splunk activities by wasting memory that would be better allocated to other data. |
| maxDataSize = <non-negative number> (750) | Maximum size, in MB, that the hot database can grow to. On 32-bit systems we recommend 750. On 64-bit systems we recommend 10000. These are the defaults for the appropriate downloads. |
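As a rough illustration of the table above, an indexes.conf stanza might look like the following sketch. The stanza name `[main]` and the specific values are assumptions chosen for illustration, not recommendations; the memory figures in the comment simply repeat the arithmetic from the maxMemMB description.

```ini
# indexes.conf (illustrative sketch; stanza name and values are assumptions)
[main]
# One extra indexing thread. Whether this helps depends on your hardware.
indexThreads = 1
# Raised only slightly from the default of 50. Per the description above,
# indexThreads = 2 with maxMemMB = 100 would use 100 MB + 200 MB = 300 MB.
maxMemMB = 60
# Default for 32-bit downloads; 64-bit downloads default to 10000.
maxDataSize = 750
```

Changing one setting at a time and re-running the processor search above should make it easier to tell which change, if any, actually helped.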
props.conf
props.conf controls what parameters apply to events during indexing based on settings tied to each event's source, host, or sourcetype.
| Argument (default) | Description |
|---|---|
| DATETIME_CONFIG = <filename relative to SPLUNK_HOME> (/etc/datetime.xml) | Specifies the file used to configure the timestamp extractor. Set this to "NONE" to prevent the timestamp extractor from running, or to "CURRENT" to assign the current system time to each event. |
| TIME_FORMAT = <strptime-style format> (empty) | Specifies a strptime format to use for extracting the date. Specifying a strptime format for date extraction accelerates event indexing. |
| MAX_TIMESTAMP_LOOKAHEAD = <integer> (150) | Specifies how far into an event Splunk should look for a timestamp. If you know your timestamp is in the first n characters of the event, set this to n. This increases the speed of indexing. |
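The timestamp settings above could be combined along the lines of the following sketch. The sourcetype names `my_app_log` and `my_untimed_source` are hypothetical, the sample timestamp is made up, and `SHOULD_LINEMERGE` is not named on this page: it is assumed here to be the setting behind "turn off line merging".

```ini
# props.conf (illustrative sketch; sourcetype names are hypothetical)
[my_app_log]
# Events are assumed to start with a timestamp such as 25/12/2008 13:45:10.
TIME_FORMAT = %d/%m/%Y %H:%M:%S
# That timestamp is 19 characters long, so there is no need to look further.
MAX_TIMESTAMP_LOOKAHEAD = 19
# Assumption: this is the setting that disables line merging for single-line events.
SHOULD_LINEMERGE = false

[my_untimed_source]
# Skip timestamp extraction and stamp each event with the current system time.
DATETIME_CONFIG = CURRENT
```

A tight lookahead only works if the timestamp really does sit within the first n characters of every event, so it is worth checking a sample of raw events before lowering it.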
segmenters.conf
segmenters.conf defines schemes for how events will be tokenized in Splunk's index.
| Argument | Description |
|---|---|
| MAJOR = <space separated list of strings> | Move MINOR breakers into the MAJOR breaker list, or remove breakers from the MAJOR breaker list, to change the size and number of raw data events. |
| MINOR = <space separated list of strings> | The characters that mark tokens to index by, in addition to those produced by the MAJOR breaker list. Reduce or remove this list to increase indexing performance. |
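A reduced MINOR list might be sketched as follows. The stanza name `fewer-minor-breakers` and the particular breaker characters are hypothetical, and the sketch assumes that any setting left out of the stanza (such as MAJOR) keeps its default value.

```ini
# segmenters.conf (illustrative sketch; stanza name and breaker list are hypothetical)
[fewer-minor-breakers]
# Keep only a short list of minor breakers instead of the full default set.
MINOR = / : =
```

This assumes a segmenter is applied to data by referring to its stanza name from props.conf. Fewer breakers means fewer tokens to index, at the cost of coarser matching at search time.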
Read more about how to configure custom segmentation.
This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14