How Splunk stores indexes

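As Splunk indexes your data, it creates a number of files. These files contain two types of data: the raw data in compressed form, and the indexes that point to the raw data, along with some metadata files.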

Together, these files constitute the Splunk index. The files reside in sets of directories organized by age. Some directories contain newly indexed data; others contain previously indexed data. The number of such directories can grow quite large, depending on how much data you're indexing.

Why you might care

You might not care, actually. Splunk handles indexed data by default in a way that gracefully ages the data through several stages. After a long period of time, typically several years, Splunk removes old data from your system. You might well be fine with the default scheme it uses.

However, if you're indexing large amounts of data, have specific data retention requirements, or otherwise need to carefully plan your aging policy, you should read this topic. Also, to back up your data, it helps to know where to find it.

How Splunk ages data

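Each of the index directories is known as a bucket. To summarize so far: a Splunk index consists of a set of files, those files reside in directories organized by age, and each such directory is a bucket.

A bucket moves through several stages as it ages: hot, warm, cold, and frozen.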

As buckets age, they "roll" from one stage to the next. Newly indexed data goes into a hot bucket, which is a bucket that's both searchable and actively being written to. After the hot bucket reaches a certain size, it becomes a warm bucket, and a new hot bucket is created. Warm buckets are searchable, but are not actively written to. There are many warm buckets.

Once Splunk has created some maximum number of warm buckets, it begins to roll the warm buckets to cold based on their age: the oldest warm bucket always rolls to cold first. Buckets continue to roll to cold in this manner as they age. After a set period of time, cold buckets roll to frozen, at which point they are either archived or deleted. By editing attributes in indexes.conf, you can specify the bucket aging policy, which determines when a bucket moves from one stage to the next.

Here are the stages that buckets age through:

| Bucket stage | Description | Searchable? |
| --- | --- | --- |
| Hot | Contains newly indexed data. Open for writing. One or more hot buckets for each index. | Yes. |
| Warm | Data rolled from hot. There are many warm buckets. | Yes. |
| Cold | Data rolled from warm. There are many cold buckets. | Yes, but only when the search specifies a time range included in these files. |
| Frozen | Data rolled from cold. Splunk deletes frozen data by default, but you can also archive it. | No. |

The collection of buckets in a particular stage is sometimes referred to as a database or "db": the "hot db", the "warm db", the "cold db", etc.
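
The rolling rules described in this section can be sketched as a small simulation. This is an illustrative model only, not Splunk's implementation: the `Bucket` class, the `age_buckets` function, and the limits used in the demo are all hypothetical, and the sketch assumes that a bucket's age is measured from its newest event.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    bucket_id: int
    size_mb: int       # amount of data written so far
    newest_time: int   # epoch seconds of the newest event (assumed age basis)
    stage: str = "hot"


def age_buckets(buckets, now, max_data_size_mb, max_warm_count, frozen_secs):
    """Apply one pass of the simplified aging rules, updating stages in place."""
    # Hot -> warm: a hot bucket rolls once it reaches the maximum size.
    for b in buckets:
        if b.stage == "hot" and b.size_mb >= max_data_size_mb:
            b.stage = "warm"

    # Warm -> cold: when there are too many warm buckets, the oldest roll first.
    warm = sorted((b for b in buckets if b.stage == "warm"),
                  key=lambda b: b.newest_time)
    excess = max(0, len(warm) - max_warm_count)
    for b in warm[:excess]:
        b.stage = "cold"

    # Cold -> frozen: a cold bucket rolls once its data exceeds the maximum age.
    for b in buckets:
        if b.stage == "cold" and now - b.newest_time > frozen_secs:
            b.stage = "frozen"
    return buckets


if __name__ == "__main__":
    now = 1_000_000
    buckets = [
        Bucket(1, 100, now - 900_000, "cold"),
        Bucket(2, 100, now - 500_000, "warm"),
        Bucket(3, 100, now - 100_000, "warm"),
        Bucket(4, 100, now - 1_000, "hot"),
    ]
    age_buckets(buckets, now, max_data_size_mb=100,
                max_warm_count=2, frozen_secs=600_000)
    print([b.stage for b in buckets])  # prints ['frozen', 'cold', 'warm', 'warm']
```

In the demo, the full hot bucket rolls to warm, which pushes the oldest warm bucket to cold, while the existing cold bucket is old enough to roll to frozen.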

What the index directories look like

Each bucket occupies its own subdirectory within a larger database directory. Splunk organizes the directories to distinguish between hot/warm/cold buckets. In addition, the bucket directory names are based on the age of the data.

Here's the directory structure for the default index:

| Bucket type | Default location | Notes |
| --- | --- | --- |
| Hot | $SPLUNK_HOME/var/lib/splunk/defaultdb/db/* | There can be multiple hot subdirectories. Each hot bucket occupies its own subdirectory, which uses this naming convention: hot_v1_<ID> |
| Warm | $SPLUNK_HOME/var/lib/splunk/defaultdb/db/* | There are multiple warm subdirectories. Each warm bucket occupies its own subdirectory, which uses this naming convention: db_<newest_time>_<oldest_time>_<ID>, where <newest_time> and <oldest_time> are timestamps indicating the age of the data within. The timestamps are expressed in UTC epoch time (in seconds). For example, db_1223658000_1223654401_2835 is a warm bucket containing data from October 10, 2008, covering 16:00:01 to 17:00:00 UTC (9am to 10am in a UTC-7 time zone such as Pacific Daylight Time). |
| Cold | $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/* | There are multiple cold subdirectories. When warm buckets roll to cold, they get moved into this directory, but are not renamed. |
| Frozen | N/A: Data deleted, or archived into a directory structure of your design. | Deletion is the default; archiving is accomplished through a user-created script. |
| Thawed | $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/* | Location for data that has been archived and later thawed. See "Restore archived data" for information on restoring archived data to a "thawed" state. |
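
To see what time range a warm or cold bucket covers, you can decode the timestamps in its directory name. The following is a minimal sketch in Python; the `parse_bucket_name` helper and the `BUCKET_NAME` pattern are illustrative names, not part of Splunk, and the sketch assumes directory names follow the db_<newest_time>_<oldest_time>_<ID> convention exactly.

```python
import re
from datetime import datetime, timezone

# Naming convention for warm and cold buckets: db_<newest_time>_<oldest_time>_<ID>
BUCKET_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_bucket_name(name):
    """Return (newest, oldest, bucket_id) for a warm or cold bucket directory name."""
    match = BUCKET_NAME.match(name)
    if match is None:
        raise ValueError(f"not a warm/cold bucket directory name: {name!r}")
    newest, oldest, bucket_id = (int(group) for group in match.groups())

    def to_utc(epoch_seconds):
        # The timestamps are UTC epoch seconds, so convert with an explicit UTC zone.
        return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

    return to_utc(newest), to_utc(oldest), bucket_id


if __name__ == "__main__":
    newest, oldest, bucket_id = parse_bucket_name("db_1223658000_1223654401_2835")
    print(bucket_id, oldest.isoformat(), newest.isoformat())
    # prints: 2835 2008-10-10T16:00:01+00:00 2008-10-10T17:00:00+00:00
```

Hot bucket names (hot_v1_<ID>) carry no timestamps, so the helper rejects them with a ValueError.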

The paths for hot/warm and cold directories are configurable, so you can store cold buckets in a separate location from hot/warm buckets. See "Use multiple partitions for index data".

Caution: All index locations must be writable.

Configure your indexes

You configure indexes in indexes.conf. You can edit a copy of indexes.conf in $SPLUNK_HOME/etc/system/local/ or in your own custom application directory in $SPLUNK_HOME/etc/apps/. Do not edit the copy in $SPLUNK_HOME/etc/system/default. For information on configuration files and directory locations, see "About configuration files".

This table lists the key indexes.conf attributes affecting buckets and what they configure. It also provides links to other topics that show how to use these attributes. For the most detailed information on these attributes, as well as others, always refer to "indexes.conf".

| Attribute | What it configures | Default | For more information, see ... |
| --- | --- | --- | --- |
| homePath | The path that contains the hot and warm buckets. (Required.) | $SPLUNK_HOME/var/lib/splunk/defaultdb/db/ (for the default index only) | Use multiple partitions for index data |
| coldPath | The path that contains the cold buckets. (Required.) | $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/ (for the default index only) | Use multiple partitions for index data |
| thawedPath | The path that contains any thawed buckets. (Required.) | $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/ (for the default index only) | Use multiple partitions for index data |
| maxHotBuckets | The maximum number of hot buckets. | 1, for new, custom indexes. However, if you create a new index, you should set this value to at least 2, to deal with any archival data. The main default index, for example, has this value set to 10. | How Splunk ages data |
| maxDataSize | Determines rolling behavior, hot to warm. The maximum size for a hot bucket. When a hot bucket reaches this size, it rolls to warm. This attribute also determines the approximate size for all buckets. | Depends; see indexes.conf. | Use multiple partitions for index data; Set a retirement and archiving policy |
| maxWarmDBCount | Determines rolling behavior, warm to cold. The maximum number of warm buckets. When the maximum is reached, warm buckets begin rolling to cold. | 300 | Use multiple partitions for index data |
| maxTotalDataSizeMB | Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen. | 500000 (MB) | Set a retirement and archiving policy |
| frozenTimePeriodInSecs | Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen. | 188697600 (in seconds; approx. 6 years) | Set a retirement and archiving policy |
| coldToFrozenScript | Script to run just before a cold bucket rolls to frozen. | Default behavior is to log the bucket's directory name and then delete the bucket once it rolls. | Archive indexed data |
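
Because frozenTimePeriodInSecs is expressed in seconds, it can help to convert between seconds and days when planning a retention period. The sketch below is plain arithmetic; the helper names `secs_to_days` and `days_to_secs` and the 90-day period are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def secs_to_days(seconds):
    """Convert a frozenTimePeriodInSecs value to whole days."""
    return seconds // SECONDS_PER_DAY


def days_to_secs(days):
    """Convert a retention period in days to a frozenTimePeriodInSecs value."""
    return days * SECONDS_PER_DAY


if __name__ == "__main__":
    default_secs = 188697600  # default value listed in the table above
    days = secs_to_days(default_secs)
    print(days, round(days / 365.25, 2))  # prints: 2184 5.98
    print(days_to_secs(90))               # prints: 7776000
```

The default works out to 2184 days, which is where the "approx. 6 years" figure comes from.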

Use multiple partitions for index data

Splunk can use multiple disks and partitions for its index data. It's possible to configure Splunk to use many disks/partitions/filesystems on the basis of multiple indexes and bucket types, so long as you mount them correctly and point to them properly from indexes.conf. However, we recommend that you use a single high performance file system to hold your Splunk index data for the best experience.

If you do use multiple partitions, the most common way to arrange Splunk's index data is to keep the hot/warm buckets on the local machine, and to put the cold buckets on a separate array of disks (for longer term storage). You'll want to run your hot/warm buckets on a machine with fast read/write partitions, since most searching will happen there. Cold buckets should be located on a reliable array of disks.

Configure multiple partitions

1. Set up partitions just as you'd normally set them up in any operating system.

2. Mount the disks/partitions.

3. Edit indexes.conf to point to the correct paths for the partitions. You set paths on a per-index basis, so you can also set separate partitions for different indexes. Each index has its own [<index>] stanza, where <index> is the name of the index. These are the settable path attributes: homePath, coldPath, and thawedPath.
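
As a rough illustration of step 3, the following Python sketch renders an [<index>] stanza with the three path attributes, which you could then review and paste into your local copy of indexes.conf. The `render_index_stanza` helper, the index name `myindex`, and the mount points are all hypothetical; substitute your own paths.

```python
import configparser
import io


def render_index_stanza(index_name, home_path, cold_path, thawed_path):
    """Render an [<index>] stanza containing the three path attributes."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep attribute names case-sensitive (homePath, not homepath).
    parser.optionxform = str
    parser[index_name] = {
        "homePath": home_path,
        "coldPath": cold_path,
        "thawedPath": thawed_path,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical index name and mount points, chosen only for illustration:
    # hot/warm buckets on a fast local disk, cold and thawed buckets on an array.
    print(render_index_stanza(
        "myindex",
        "/mnt/fast_disk/myindex/db",
        "/mnt/big_array/myindex/colddb",
        "/mnt/big_array/myindex/thaweddb",
    ))
```

Generating the stanza rather than typing it keeps the three paths for an index together, which makes it harder to leave one of the required attributes out.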

Buckets and Splunk administration

When you're administering Splunk, it helps to understand how Splunk stores indexes across buckets. In particular, several admin activities require a good understanding of buckets:

For information on setting a retirement and archiving policy, see "Set a retirement and archiving policy". You can base the retirement policy on either size or age of data.

For information on how to archive your indexed data, see "Archive indexed data". For information on archive signing, see "Configure archive signing". To learn how to restore data from archive, read "Restore archived data".

To learn how to back up your data, read "Back up indexed data". This topic also discusses how to manually roll hot buckets to warm (so that you can then back them up). Also, see "Best practices for backing up" on the Community Wiki.

For information on setting limits on disk usage, see "Set limits on disk usage".

Troubleshoot your buckets

This section tells you how to deal with an assortment of bucket problems. We're starting small, but we'll add new issues as they arise.

Recover invalid hot buckets

A hot bucket becomes an invalid hot (invalid_hot_<ID>) bucket when Splunk detects that the metadata files (Sources.data, Hosts.data, SourceTypes.data) are corrupt or incorrect. Incorrect data usually signifies incorrect time ranges; it can also mean that event counts are incorrect.

Splunk ignores invalid hot buckets. Data does not get added to such buckets, and they cannot be searched. Invalid buckets also do not count when determining bucket limit values such as maxTotalDataSizeMB. This means that invalid buckets do not negatively affect the flow of data through the system, but it also means that they can result in disk storage that exceeds the configured maximum value.
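
Since invalid hot buckets do not count toward limits such as maxTotalDataSizeMB, it can be useful to check how much disk space they occupy. Here is a minimal sketch, assuming the invalid buckets sit directly under the directory you pass in (for example, an index's homePath); the `invalid_hot_bucket_usage` function is an illustrative helper, not a Splunk tool.

```python
import os


def invalid_hot_bucket_usage(home_path):
    """Map each invalid_hot_<ID> directory under home_path to its size in bytes."""
    usage = {}
    for entry in os.scandir(home_path):
        if entry.is_dir() and entry.name.startswith("invalid_hot_"):
            total = 0
            # Sum the sizes of all files inside the bucket directory.
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
            usage[entry.name] = total
    return usage


if __name__ == "__main__":
    import sys

    # Usage: pass the directory to scan as the first argument.
    for bucket, size in sorted(invalid_hot_bucket_usage(sys.argv[1]).items()):
        print(f"{bucket}: {size / (1024 * 1024):.1f} MB")
```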

To recover an invalid hot bucket, use the recover-metadata command:

1. Make backup copies of the metadata files, Sources.data, Hosts.data, SourceTypes.data.

2. Rebuild the metadata from the raw data information:

     ./splunk cmd recover-metadata path_to_your_hot_buckets/invalid_hot_<ID>

3. If successful, rename the bucket directory so that it follows the normal naming convention for its type (for a hot bucket, hot_v1_<ID>).
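
Steps 1 and 2 above can be scripted. The sketch below is one possible approach, not an official procedure: the function names are hypothetical, and it assumes the three metadata files sit directly inside the bucket directory. It uses only the recover-metadata invocation shown in step 2 and leaves the rename in step 3 to you.

```python
import shutil
import subprocess
from pathlib import Path

METADATA_FILES = ("Sources.data", "Hosts.data", "SourceTypes.data")


def backup_metadata(bucket_dir, backup_dir):
    """Step 1: copy the bucket's metadata files into backup_dir; return the copies."""
    bucket_dir, backup_dir = Path(bucket_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for name in METADATA_FILES:
        source = bucket_dir / name
        if source.exists():  # a damaged bucket may be missing some of the files
            copies.append(Path(shutil.copy2(source, backup_dir / name)))
    return copies


def recover_command(splunk_bin, bucket_dir):
    """Step 2: build the recover-metadata command as an argument list."""
    return [str(splunk_bin), "cmd", "recover-metadata", str(bucket_dir)]


def recover_invalid_hot_bucket(splunk_bin, bucket_dir, backup_dir):
    """Back up the metadata, then run recover-metadata; raises if the command fails."""
    backup_metadata(bucket_dir, backup_dir)
    subprocess.run(recover_command(splunk_bin, bucket_dir), check=True)
```

Taking the backup inside the same function that runs the command means the rebuild cannot start before the copies exist.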

Rebuild index-level bucket manifests

You will rarely need to rebuild index-level manifests, but if you do, Splunk provides a few commands for that purpose.

Caution: You should only use these commands if Splunk support directs you to. Do not rebuild the manifests on your own.

The two index-level manifest files are .bucketManifest and .metaManifest. The .bucketManifest file contains a list of all buckets in the index. You might need to rebuild this if, for example, you manually copy a bucket into an index. The .metaManifest file contains a list of buckets that have contributed to the index-level metadata file.

The following command rebuilds the .bucketManifest and .metaManifest files and all *.data files in the homePath for the main index only. It does not rebuild metadata for individual buckets.

% splunk _internal call /data/indexes/main/rebuild-metadata-and-manifests

If you only want to rebuild the .metaManifest and homePath/*.data files, use this command instead:

% splunk _internal call /data/indexes/main/rebuild-metadata

If you only want to rebuild the .bucketManifest file, use this command:

% splunk _internal call /data/indexes/main/rebuild-bucket-manifest

You can use the asterisk (*) wildcard to rebuild manifests for all indexes. For example:

% splunk _internal call /data/indexes/*/rebuild-metadata-and-manifests

For more information

For more information on buckets, see "indexes.conf" in this manual and "Understanding buckets" on the Community Wiki.

This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8.

