Splunk® Enterprise

Managing Indexers and Clusters of Indexers

Download manual as PDF

Download topic as PDF

How the indexer stores indexes

As the indexer indexes your data, it creates a number of files. These files contain two types of data:

Together, these files constitute the Splunk Enterprise index. The files reside in sets of directories organized by age. Some directories contain newly indexed data; others contain previously indexed data. The number of such directories can grow quite large, depending on how much data you're indexing.

Why you might care

You might not care, actually. The indexer handles indexed data by default in a way that gracefully ages the data through several stages. After a long period of time, typically several years, the indexer removes old data from your system. You might well be fine with the default scheme it uses.

However, if you're indexing large amounts of data, have specific data retention requirements, or otherwise need to carefully plan your aging policy, you've got to read this topic. Also, to back up your data, it helps to know where to find it. So, read on....

How data ages

Each of the index directories is known as a bucket. To summarize so far:

  • An "index" contains compressed raw data and associated index files.
  • An index resides across many age-designated index directories.
  • An index directory is called a bucket.

A bucket moves through several stages as it ages:

  • hot
  • warm
  • cold
  • frozen
  • thawed

As buckets age, they "roll" from one stage to the next. As data is indexed, it goes into a hot bucket. Hot buckets are both searchable and actively being written to. An index can have several hot buckets open at a time.

When certain conditions occur (for example, the hot bucket reaches a certain size or splunkd gets restarted), the hot bucket becomes a warm bucket ("rolls to warm"), and a new hot bucket is created in its place. Warm buckets are searchable, but are not actively written to. There are many warm buckets.

Once further conditions are met (for example, the index reaches some maximum number of warm buckets), the indexer begins to roll the warm buckets to cold, based on their age. It always selects the oldest warm bucket to roll to cold. Buckets continue to roll to cold as they age in this manner. After a set period of time, cold buckets roll to frozen, at which point they are either archived or deleted. By editing attributes in indexes.conf, you can specify the bucket aging policy, which determines when a bucket moves from one stage to the next.

If the frozen data has been archived, it can later be thawed. Thawed data is available for searches.

Here are the stages that buckets age through:

Bucket stage Description Searchable?
Hot Contains newly indexed data. Open for writing. One or more hot buckets for each index. Yes
Warm Data rolled from hot. There are many warm buckets. Data is not actively written to warm buckets. Yes
Cold Data rolled from warm. There are many cold buckets. Yes
Frozen Data rolled from cold. The indexer deletes frozen data by default, but you can choose to archive it instead. Archived data can later be thawed. No
Thawed Data restored from an archive. If you archive frozen data, you can later return it to the index by thawing it. Yes

The collection of buckets in a particular stage is sometimes referred to as a database or "db": the "hot db", the "warm db", the "cold db", etc.

What the index directories look like

Each index occupies its own directory under $SPLUNK_HOME/var/lib/splunk. The name of the directory is the same as the index name. Under the index directory are a series of subdirectories that categorize the buckets by stage (hot/warm, cold, or thawed).

The buckets themselves are subdirectories within those directories. The bucket directory names are based on the age of the data.

Here is the directory structure for the default index (defaultdb):

Bucket stage Default location Notes
Hot $SPLUNK_HOME/var/lib/splunk/defaultdb/db/* There can be multiple hot subdirectories, one for each hot bucket. See "Bucket naming conventions".
Warm $SPLUNK_HOME/var/lib/splunk/defaultdb/db/* There are separate subdirectories for each warm bucket. See "Bucket naming conventions".
Cold $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/* There are multiple cold subdirectories. When warm buckets roll to cold, they get moved into this directory, but are not renamed.
Frozen N/A: Frozen data gets deleted or archived into a directory location you specify. Deletion is the default; see "Archive indexed data" for information on how to archive the data instead.
Thawed $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/* Location for data that has been archived and later thawed. See "Restore archived data" for information on restoring archived data to a thawed state.

The paths for the hot/warm, cold, and thawed directories are configurable, so, for example, you can store cold buckets in a separate location from hot/warm buckets. See "Configure index storage" and "Use multiple partitions for index data".

Important: All index locations must be writable.

Note: In pre-6.0 versions of Splunk Enterprise, replicated copies of cluster buckets always resided in the colddb directory, even if they were hot or warm buckets. Starting with 6.0, hot and warm replicated copies reside in the db directory, the same as for non-replicated copies.

Bucket naming conventions

Bucket names depend on:

  • The stage of the bucket: hot or warm/cold/thawed
  • The type of bucket directory: non-clustered, clustered originating, or clustered replicated

Important: Bucket naming conventions are subject to change.

Non-clustered buckets

A standalone indexer creates non-clustered buckets. These use one type of naming convention.

Clustered buckets

An indexer that is part of an indexer cluster creates clustered buckets. A clustered bucket has multiple exact copies. The naming convention for clustered buckets distinguishes the types of copies, originating or replicated.

Briefly, a bucket in an indexer cluster has multiple copies, according to its replication factor. When the data enters the cluster, the receiving indexer writes the data to a hot bucket. This receiving indexer is known as the source cluster peer, and the bucket where the data gets written is called the originating copy of the bucket.

As data is written to the hot copy, the source peer streams copies of the hot data, in blocks, to other indexers in the cluster. These indexers are referred to as the target peers for the bucket. The copies of the streamed data on the target peers are known as replicated copies of the bucket.

When the source peer rolls its originating hot bucket to warm, the target peers roll their replicated copies of that bucket. The warm copies are exact replicas of each other.

For an introduction to indexer cluster architecture and replicated data streaming, read "Basic indexer cluster architecture".

Bucket names

These are the naming conventions:

Bucket type Hot bucket Warm/cold/thawed bucket
Non-clustered hot_v1_<localid> db_<newest_time>_<oldest_time>_<localid>
Clustered originating hot_v1_<localid> db_<newest_time>_<oldest_time>_<localid>_<guid>
Clustered replicated <localid>_<guid> rb_<newest_time>_<oldest_time>_<localid>_<guid>

Note:

  • <newest_time> and <oldest_time> are timestamps indicating the age of the data in the bucket. The timestamps are expressed in UTC epoch time (in seconds). For example: db_1223658000_1223654401_2835 is a warm, non-clustered bucket containing data from October 10, 2008, covering the exact period of 9am-10am.
  • <localid> is an ID for the bucket. For a clustered bucket, the originating and replicated copies of the bucket have the same <localid>.
  • <guid> is the guid of the source peer node. The guid is located in the peer's $SPLUNK_HOME/etc/instance.cfg file.

In an indexer cluster, the originating warm bucket and its replicated copies have identical names, except for the prefix (db for the originating bucket; rb for the replicated copies).

Note: In an indexer cluster, when data is streamed from the source peer to a target peer, the data first goes into a temporary directory on the target peer, identified by the hot bucket convention of <localid>_<guid>. This is true for any replicated bucket copy, whether or not the streaming bucket is a hot bucket. For example, during bucket fix-up activities, a peer might stream a warm bucket to other peers. When the replication of that bucket has completed, the <localid>_<guid> directory is rolled into a warm bucket directory, identified by the rb_ prefix.

Buckets and Splunk Enterprise administration

When you are administering Splunk Enterprise, it helps to understand how the indexer stores indexes across buckets. In particular, several admin activities require a good understanding of buckets:

  • To learn how to back up your data, read "Back up indexed data." That topic also discusses how to manually roll hot buckets to warm, so that you can then back them up.

In addition, see "indexes.conf" in the Admin Manual.

PREVIOUS
Use the monitoring console to view indexing performance
  NEXT
Configure index storage

This documentation applies to the following versions of Splunk® Enterprise: 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3


Comments

In the section "What the index directories look like", you should also cover what is stored within the "summary" and "datamodel_summary" subdirectories.

Glennsinclair
August 14, 2018

test

Sectest61
September 1, 2016

test

Sectest61
September 1, 2016

Graham Hannington -

Thank you for those comments on this topic. Your point about the directory names is absolutely right. That sentence only pertains to custom indexes. The default indexes have names that vary from the directories. I will update that material.

Regarding your question about those other files. - I'll look into that issue, but it might take a little while to track down an answer for you. In the meantime, for a more immediate response, you could try posing the question on Splunk Answers.

Sgoodman, Splunker
May 18, 2016

This topic is very useful, thank you. However, re:

> What the index directories look like
> Each index occupies its own directory under $SPLUNK_HOME/var/lib/splunk.

In addition to a directory for each index, my $SPLUNK_HOME/var/lib/splunk directory (Splunk Enterprise 6.4 on Windows) contains multiple *.dat files and a single .dirty_database file.

Could you please describe these files in this topic, or link to descriptions elsewhere? (Preferably including information on how "important" they are: for example, what to do about .dat files when copying indexes from one server to another.)

Re:

> The name of the directory is the same as the index name

Not true. I see a directory named "defaultdb", but there is no index named "defaultdb". Rather, "defaultdb" is a component of the paths (e.g. homePath) for an index named "main". Some other examples: "historydb" directory and "history" index, "audit" directory and "_audit" (with a leading underscore) index.

Graham Hannington
May 16, 2016

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters