Admin Manual

How indexing works

Splunk can index any type of time-series data (data with timestamps). When Splunk indexes data, it breaks the data into individual events based on their timestamps.

Event processing

Event processing occurs in two stages, parsing and indexing. All data that comes into Splunk enters through the parsing pipeline in large (10,000-byte) chunks. During parsing, Splunk breaks these chunks into events, which it hands off to the indexing pipeline, where final processing occurs.

While parsing, Splunk performs a number of actions, including the following (see the configuration sketch after this list):

  • Extracting a set of default fields for each event, including host, source, and sourcetype.
  • Configuring character set encoding.
  • Identifying line termination using linebreaking rules. While many events are short and only take up a line or two, others can be long.
  • Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries.
  • Splunk can be set up to mask sensitive event data (such as credit card or social security numbers) at this stage. It can also be configured to apply custom metadata to incoming events.
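
These parse-time behaviors are typically controlled per source type in props.conf (with transforms.conf for more involved rewrites). The following is a minimal sketch only; the source type name, patterns, and timestamp format are illustrative, not taken from this manual:

    # props.conf -- hypothetical source type showing parse-time settings
    [web_access]
    # character set encoding for this source type
    CHARSET = UTF-8
    # linebreaking: treat each line as a separate event
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    # timestamp recognition
    TIME_PREFIX = \[
    TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
    MAX_TIMESTAMP_LOOKAHEAD = 30
    # mask sensitive data: keep only the last four digits of a 16-digit number
    SEDCMD-mask_cc = s/\d{12}(\d{4})/XXXXXXXXXXXX\1/g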

In the indexing pipeline, Splunk performs additional processing, including:

  • Breaking all events into segments that can then be searched. You can determine the level of segmentation, which affects indexing and search speed, search capability, and efficiency of disk compression (see the segmentation sketch after this list).
  • Building the index data structures.
  • Writing the raw data and index files to disk, where post-indexing compression occurs.
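
The level of segmentation mentioned above is selected per source type; here is a hypothetical props.conf fragment, referencing one of the segmenters that ship in segmenters.conf:

    # props.conf -- choose an index-time segmenter (source type name is illustrative)
    [web_access]
    # built-in segmenters defined in segmenters.conf include inner, outer, none, and full
    SEGMENTATION = inner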

The distinction between the parsing and indexing pipelines matters mainly for forwarders. A heavy forwarder can parse data locally and then forward the parsed data on to indexers for final indexing.
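
For example, a heavy forwarder that parses locally and sends the parsed data on to an indexer might carry an outputs.conf along these lines; the group name and receiving host are hypothetical:

    # outputs.conf on a heavy forwarder (receiver host and port are illustrative)
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = indexer1.example.com:9997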

For more information about events and what happens to them during the indexing process, see Overview of event processing in the Getting Data In manual.

Note: Indexing is an I/O-intensive process.

This diagram shows the main processes inherent in indexing:

[Diagram: Datapipeline.png, showing the data pipeline from input through parsing to indexing]

Note: This diagram represents a simplified view of the indexing architecture. It provides a functional view of the architecture and does not fully describe Splunk internals. In particular, the parsing pipeline actually consists of three pipelines: parsing, merging, and typing, which together handle the parsing function. The distinction can matter during troubleshooting, but does not generally affect how you configure or deploy Splunk.

What's in an index?

Splunk stores all of the data it processes in indexes. An index is a collection of databases, which are directories located in $SPLUNK_HOME/var/lib/splunk. A database directory is named db_<endtime>_<starttime>_<seq_num>. Indexes consist of two types of files: rawdata files and index files. For detailed information, see How Splunk stores indexes.
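
As a rough sketch, assuming the default location of the main index (defaultdb) and a hypothetical bucket name, the on-disk layout looks something like this:

    $SPLUNK_HOME/var/lib/splunk/defaultdb/       # home of the main index (default location)
        db/
            db_1333159260_1332758466_6/          # one bucket: db_<endtime>_<starttime>_<seq_num>
                rawdata/                         # compressed raw event data
                (tsidx and other index files)    # index files built during indexing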

Splunk comes with the following preconfigured indexes (an example input configuration follows this list):

  • main: This is the default Splunk index. All processed data is stored here unless otherwise specified.
  • _internal: Stores Splunk internal logs and processing metrics.
  • sampledata: A small amount of sample data is stored here for training purposes.
  • _audit: Contains events related to the file system change monitor, auditing, and all user search history.
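
As an illustration, an input can be directed to a particular index in inputs.conf; the monitored path below is hypothetical, and omitting the index setting also sends events to main:

    # inputs.conf -- route a monitored file's events to a specific index (path is illustrative)
    [monitor:///var/log/messages]
    sourcetype = syslog
    index = main

Note that internal indexes such as _internal are not searched by default; a search must name them explicitly, for example: index=_internal.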

A Splunk administrator can create new indexes, edit index properties, remove unwanted indexes, and relocate existing indexes. Indexes are managed through Splunk Manager, the CLI, and configuration files such as indexes.conf. For more information, see Managing indexes in this manual.
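
A custom index is typically defined with a stanza in indexes.conf; here is a minimal, hypothetical example (the index name, paths, and size limit are illustrative):

    # indexes.conf -- define a new index named "web"
    [web]
    homePath   = $SPLUNK_HOME/var/lib/splunk/web/db
    coldPath   = $SPLUNK_HOME/var/lib/splunk/web/colddb
    thawedPath = $SPLUNK_HOME/var/lib/splunk/web/thaweddb
    # cap the total size of this index at roughly 100 GB
    maxTotalDataSizeMB = 100000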

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has around indexing.

This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7.

