Admin Manual

 


How indexing works

This documentation does not apply to the most recent version of Splunk.

Indexing is how Splunk processes the data you send it. Splunk can index any time-series data, which is data that has a timestamp associated with it. If the data does not have a timestamp, Splunk will apply the current time to the data as it indexes it. When data is indexed, Splunk breaks it into events based on its timestamps; you can also specify other event delimiters, such as a regex match or whitespace.
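The timestamp fallback described above can be sketched as follows. This is an illustration only, not Splunk's implementation: the timestamp format (YYYY-MM-DD HH:MM:SS at the start of the event) and the assign_time helper are assumptions made for the example.

```python
import re
import time

# Assumed timestamp shape for this sketch: "YYYY-MM-DD HH:MM:SS" at the start
# of the event. Real timestamp recognition is far more flexible than this.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def assign_time(event, now=None):
    """Return (epoch_seconds, event), using the current time as a fallback.

    `now` is a hypothetical parameter that lets callers fix the fallback
    time, which keeps the function easy to test.
    """
    match = TIMESTAMP.match(event)
    if match:
        parsed = time.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        return time.mktime(parsed), event
    # No timestamp found: stamp the event with the time of indexing.
    return (time.time() if now is None else now), event
```

An event such as "2010-01-02 03:04:05 started" keeps its own time, while "no timestamp here" receives the time at which it is processed.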

All data that comes into Splunk is indexed through the universal pipeline. Data enters the universal pipeline in large chunks of 10,000 bytes. As part of pipeline processing, these chunks are broken into events. Initially, newline characters signal an event boundary. In the next stage of processing, Splunk applies the line merging rules specified in props.conf.
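The two stages described above (split on newlines, then merge lines) can be sketched roughly as follows. The merge rule used here is an assumption chosen for illustration: a line that does not start with a date is treated as a continuation of the previous event. The actual rules come from props.conf, and the function names are hypothetical.

```python
import re

CHUNK_SIZE = 10_000  # bytes per chunk, as stated in the text above

# Assumed merge rule for this sketch only: a new event starts with a date.
STARTS_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}")


def chunks(data, size=CHUNK_SIZE):
    """Yield fixed-size chunks of the raw input bytes."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def split_lines(chunk_iter):
    """Stage 1: newline characters mark provisional event boundaries."""
    pending = b""
    for chunk in chunk_iter:
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode("utf-8", errors="replace")
    if pending:
        yield pending.decode("utf-8", errors="replace")


def merge_lines(lines):
    """Stage 2: merge continuation lines into the preceding event."""
    event = None
    for line in lines:
        if event is None or STARTS_EVENT.match(line):
            if event is not None:
                yield event
            event = line
        else:
            event += "\n" + line
    if event is not None:
        yield event
```

With this rule, a stack trace whose continuation lines begin with whitespace stays attached to the dated line that precedes it, even when a chunk boundary falls in the middle of a line.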

As part of indexing, events are broken into sections called segments. Splunk uses a list of breaking characters and other rules (such as the maximum number of characters per segment) that are configurable through segmenters.conf.
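As a rough illustration of segmentation, the sketch below splits an event on a set of breaking characters and enforces a maximum segment length. The breaking characters, the length limit, and the choice to truncate overlong segments are all assumptions made for the example; the real values and behavior are governed by segmenters.conf.

```python
import re

# Hypothetical breaking characters and length limit for this sketch.
BREAKERS = " \t,;:=/()[]\"'"
MAX_SEGMENT_LENGTH = 32


def segment(event, breakers=BREAKERS, max_len=MAX_SEGMENT_LENGTH):
    """Split an event into segments at runs of breaking characters."""
    pattern = "[" + re.escape(breakers) + "]+"
    pieces = [piece for piece in re.split(pattern, event) if piece]
    # Assumption: overlong segments are truncated rather than dropped.
    return [piece[:max_len] for piece in pieces]
```

For example, segment("user=alice src=10.0.0.1") yields the segments user, alice, src and 10.0.0.1, because the period is not in the assumed list of breaking characters.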

Indexing is an I/O-intensive process. If you're building a system to index a lot of data, Splunk recommends that you take disk I/O capacity into account when you plan it.

[Figure: diagram of how indexing works (HowIndexWorksdiagram.png)]

The splunk-optimize process

While Splunk is indexing data, one or more instances of the splunk-optimize process run intermittently, merging index files together to optimize search performance. The splunk-optimize process can use a significant amount of CPU, but only for short periods of time; it should not consume CPU indefinitely. You can change the number of concurrent splunk-optimize instances by setting maxConcurrentOptimizes in indexes.conf, but this is not typically necessary.

splunk-optimize should run only on db-hot.
You can run it manually on a warm database if you find one with a large number of .tsidx files (more than 25): ./splunk-optimize <directory>
If splunk-optimize does not run often enough, search efficiency will be affected.
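To find candidates for a manual run, a small script can count the .tsidx files in each database directory. The sketch below assumes that the databases sit directly under the index directory, are named db_*, and hold their .tsidx files at the top level; the function name is hypothetical, and the script only reports directories and does not invoke splunk-optimize.

```python
from pathlib import Path

TSIDX_THRESHOLD = 25  # "more than 25", as stated in the text above


def databases_needing_optimize(index_dir, threshold=TSIDX_THRESHOLD):
    """Return (path, tsidx_count) for each db_* directory over the threshold."""
    results = []
    for db in sorted(Path(index_dir).glob("db_*")):
        if not db.is_dir():
            continue
        # Assumption: .tsidx files are stored at the top level of the database.
        count = sum(1 for _ in db.glob("*.tsidx"))
        if count > threshold:
            results.append((db, count))
    return results
```

Each path returned is a directory that could be passed to ./splunk-optimize <directory>.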

What's in an index?

Splunk stores all processed data in indexes. An index is a collection of databases, which are located in $SPLUNK_HOME/var/lib/splunk. Each database is a directory named db_<starttime>_<endtime>_<seq_num>.
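The directory naming scheme above is easy to parse in a script. The sketch below assumes that all three fields are decimal integers, which the text does not state explicitly; the DatabaseName type and the parse_db_name function are hypothetical names used for illustration.

```python
import re
from typing import NamedTuple, Optional

# Matches db_<starttime>_<endtime>_<seq_num>, assuming integer fields.
DB_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


class DatabaseName(NamedTuple):
    start_time: int
    end_time: int
    seq_num: int


def parse_db_name(name: str) -> Optional[DatabaseName]:
    """Parse a database directory name, or return None if it does not match."""
    match = DB_NAME.match(name)
    if match is None:
        return None
    return DatabaseName(*(int(group) for group in match.groups()))
```

A name such as db_1262304000_1262390400_7 parses into its three fields, while a name that does not follow the pattern, such as db-hot, returns None.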

Splunk comes with preconfigured indexes. Read "About managing indexes" in this manual for more information.

This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11

