How indexing works
Indexing is the process by which Splunk prepares the data that you send to it so that it can be searched and analyzed. Splunk can index any type of time-series data (data with timestamps). When Splunk indexes data, it breaks the data into events based on the timestamps it finds.
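To make timestamp-based event breaking concrete, here is a minimal sketch in Python. It is not Splunk's implementation: the timestamp format (`YYYY-MM-DD HH:MM:SS` at the start of a line), the function name `break_into_events`, and the sample log text are all assumptions chosen for illustration.

```python
import re

# Assumed timestamp shape for this sketch: "YYYY-MM-DD HH:MM:SS" at line start.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def break_into_events(raw: str) -> list[str]:
    """Start a new event at each line that begins with a timestamp."""
    events: list[list[str]] = []
    for line in raw.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append([line])
        else:
            # Continuation line (for example, part of a stack trace).
            events[-1].append(line)
    return ["\n".join(lines) for lines in events]

# Hypothetical sample input: two events, the first spanning two lines.
raw = (
    "2010-03-01 12:00:00 ERROR request failed\n"
    "  at handler.process\n"
    "2010-03-01 12:00:05 INFO retry succeeded\n"
)
events = break_into_events(raw)
```

In this sketch the indented line has no timestamp of its own, so it is attached to the event that precedes it.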
As Splunk processes event data for the index, it performs a variety of actions on those events:
- If a timestamp does not exist for the event, Splunk attempts to create one for it. Splunk can be configured to apply timezone offsets and recognize European date formatting.
- All events are broken down into segments that can then be searched. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression.
- While many events are short and only take up a line or two, others can be long. Splunk uses linebreaking rules to determine how it breaks these events up for display in the search results.
- As Splunk processes incoming event data, it extracts sets of default fields for each event, including the event host, source, and sourcetype.
- Splunk can be set up to anonymize sensitive event data (such as credit card or social security numbers) during the indexing process. It can also be configured to apply custom metadata to incoming events.
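Two of the actions listed above, anonymizing sensitive data and segmenting events, can be sketched roughly as follows. This is an illustration only and does not reflect how Splunk is configured or implemented: the card-number pattern, the masking format, the tokenizing rule, and the names `mask_cards` and `segment` are assumptions.

```python
import re

# Hypothetical pattern: 16 digits, optionally grouped in fours by spaces or dashes.
CARD = re.compile(r"\b(?:\d{4}[ -]?){3}(\d{4})\b")

def mask_cards(event: str) -> str:
    """Replace all but the last four digits of card-like numbers."""
    return CARD.sub(lambda m: "XXXX-XXXX-XXXX-" + m.group(1), event)

def segment(event: str) -> list[str]:
    """Split an event into lowercase tokens on non-word characters."""
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9_]+", event) if tok]

# Hypothetical sample event.
event = "user=alice card=4111-1111-1111-1111 status=OK"
masked = mask_cards(event)   # mask first, so the raw number is never tokenized
tokens = segment(masked)
```

Masking before segmenting means the original number never reaches the searchable tokens, which mirrors the idea of anonymizing during indexing rather than at search time.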
For more information about events and what happens to them during the indexing process, see "About events" in this manual.
Indexing is an I/O-intensive process.
What's in an index?
Splunk stores all of the data it processes in indexes. Indexes, in turn, are stored in databases, which are located in $SPLUNK_HOME/var/lib/splunk. A database is a directory named db_<starttime>_<endtime>_<seq_num>. An index is a collection of database directories.
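As a small illustration of the directory naming scheme, the sketch below parses a name of the form db_<starttime>_<endtime>_<seq_num> into its three parts. It assumes each field is a plain non-negative integer; the helper `parse_db_dir`, the `DatabaseDir` type, and the example directory name are hypothetical and not part of Splunk.

```python
import re
from typing import NamedTuple, Optional

class DatabaseDir(NamedTuple):
    start_time: int
    end_time: int
    seq_num: int

# Assumption: every field in the directory name is a non-negative integer.
DB_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")

def parse_db_dir(name: str) -> Optional[DatabaseDir]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    m = DB_NAME.match(name)
    if m is None:
        return None
    return DatabaseDir(*(int(g) for g in m.groups()))

# Hypothetical directory name used only for illustration.
parsed = parse_db_dir("db_1262304000_1262390400_7")
```

Returning `None` for names that do not match lets a caller skip any other entries it finds while listing an index's directory.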
Splunk comes with the following preconfigured indexes:
- main: This is the default Splunk index. All processed data is stored here unless otherwise specified.
- splunklogger: Splunk keeps track of its internal logs in this index.
- _internal: Stores Splunk processing metrics.
- sampledata: A small amount of sample data is stored here for training purposes.
- _thefishbucket: Contains internal file processing information.
- _audit: Contains events related to the file system change monitor, auditing, and all user search history.
A Splunk administrator can create new indexes, edit index properties, remove unwanted indexes, and relocate existing indexes. Splunk administrators manage indexes through Splunk Manager, the CLI, and configuration files such as indexes.conf. For more information, see the "Manage Indexes" section of the Admin manual.
This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10, 4.0.11.