Splunk® Enterprise

Installation Manual

Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.
This documentation does not apply to the most recent version of Splunk.

Estimate your storage requirements

This topic describes how to estimate the size of your Splunk index, so that you can plan your storage capacity requirements.

When Splunk indexes your data, it creates two main types of files: the rawdata file containing the original data in compressed form and the index files that point to this data. (It also creates a few metadata files, which don't consume much space.) With a little experimentation, you can estimate how much index disk space you will need for a given amount of incoming data.

Typically, the compressed rawdata file is approximately 10% of the size of the incoming, pre-indexed raw data. The associated index files range in size from approximately 10% to 110% of the rawdata file. Where the index files fall in that range depends strongly on the number of unique terms in the data. Depending on the data's characteristics, you might want to tune your segmentation settings, as described in "About segmentation".
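
As a rough illustration of the arithmetic, the following Python sketch applies the rule-of-thumb percentages above to a given volume of incoming data. The function name, its parameters, and the 100 GiB input are hypothetical. The percentages are the typical figures quoted above, not measurements, and your own data may fall outside them.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def estimate_index_size(raw_input_bytes, rawdata_pct=10,
                        index_pct_low=10, index_pct_high=110):
    """Return (low, high) estimated on-disk bytes for the given raw input.

    rawdata_pct is the rawdata size as a percentage of the incoming data;
    index_pct_low/high are the index files as a percentage of rawdata.
    Integer arithmetic keeps the results exact.
    """
    rawdata = raw_input_bytes * rawdata_pct // 100
    index_low = rawdata * index_pct_low // 100
    index_high = rawdata * index_pct_high // 100
    return rawdata + index_low, rawdata + index_high


if __name__ == "__main__":
    low, high = estimate_index_size(100 * GIB)  # hypothetical 100 GiB of input
    print(f"Estimated disk usage: {low / GIB:.1f} to {high / GIB:.1f} GiB")
```

With the default percentages, 100 GiB of incoming data comes out at roughly 11 to 21 GiB on disk. Treat this only as a starting point until you have measured a real sample.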

The best way to get an idea of your space needs is to experiment by indexing a representative sample of your data, and then checking the sizes of the resulting directories in defaultdb.

On *nix systems, follow these steps

Once you've indexed your sample:

1. Go to $SPLUNK_HOME/var/lib/splunk/defaultdb/db.

2. Run du -shc hot_v*/rawdata to determine how large the compressed persisted raw data is.

This is the persisted data to which the items in the index point. Typically, this file's size is about 10% of the size of the sample data set you indexed.

3. Run du -ch hot_v* and look at the last total line to see the size of the index.

4. Add the values you get together.
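
If you prefer a script to running du by hand, the measurements in steps 2 and 3 could be sketched in Python as follows. This is a minimal sketch, not a Splunk tool: the function names are hypothetical, the /opt/splunk fallback path is an assumption, and it sums apparent file sizes rather than allocated disk blocks, so its numbers can differ slightly from what du reports.

```python
import glob
import os


def dir_size(path):
    """Sum the apparent sizes, in bytes, of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def measure_hot_buckets(db_dir):
    """Return (rawdata_bytes, bucket_total_bytes) for hot_v* buckets in db_dir.

    bucket_total_bytes is a recursive total, so it includes the rawdata
    subdirectories, just as a recursive du over hot_v* would.
    """
    rawdata = 0
    total = 0
    for bucket in glob.glob(os.path.join(db_dir, "hot_v*")):
        if not os.path.isdir(bucket):
            continue
        rawdata += dir_size(os.path.join(bucket, "rawdata"))
        total += dir_size(bucket)
    return rawdata, total


if __name__ == "__main__":
    # Assumes SPLUNK_HOME is set in the environment; /opt/splunk is only a guess.
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    db = os.path.join(splunk_home, "var", "lib", "splunk", "defaultdb", "db")
    raw, total = measure_hot_buckets(db)
    print(f"rawdata: {raw} bytes, hot buckets total: {total} bytes")
```

Because the sketch uses only the Python standard library, the same script should also run on Windows, provided SPLUNK_HOME is set there.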

On Windows systems, follow these steps

1. Download the du utility from Microsoft TechNet.

2. Extract du.exe from the downloaded ZIP file and place it into your %SYSTEMROOT% folder.

Note: You can also place it anywhere in your %PATH%.

3. Open a command prompt.

4. At the prompt, go to %SPLUNK_HOME%\var\lib\splunk\defaultdb\db.

5. Run the following command:

del %TEMP%\du.txt & for /d %i in (hot_v*) do du -q -u %i\rawdata | findstr /b "Size:" >> %TEMP%\du.txt

6. Open the %TEMP%\du.txt file. It contains one Size: n line for each rawdata directory found, where n is the size of that directory.

7. Add these numbers together to find out how large the compressed persisted raw data is.

8. Run for /d %i in (hot_v*) do dir /s %i. The summary lines in the output give the size of the index.

9. Add this number to the total for the compressed persisted raw data from step 7.

This is the total size of the index and associated data for the sample you have indexed. You can now use this to extrapolate the size requirements of your Splunk index and rawdata directories over time.
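
The extrapolation itself is simple proportional scaling. The Python sketch below shows one way to do it, assuming that on-disk size grows linearly with input volume, which holds only if the sample is representative. The function name, parameters, and all figures in the example are hypothetical.

```python
def extrapolate_storage(sample_input_bytes, sample_on_disk_bytes,
                        daily_input_bytes, retention_days):
    """Scale the measured on-disk/input ratio to a daily volume and retention.

    Assumes on-disk size grows linearly with input volume, which holds only
    if the sample is representative of the production data.
    """
    if sample_input_bytes <= 0:
        raise ValueError("sample_input_bytes must be positive")
    retained_input = daily_input_bytes * retention_days
    return retained_input * sample_on_disk_bytes // sample_input_bytes


if __name__ == "__main__":
    # Hypothetical figures: a 1 GB sample that took 350 MB on disk,
    # 50 GB of new data per day, kept for 90 days.
    needed = extrapolate_storage(1_000_000_000, 350_000_000,
                                 50_000_000_000, 90)
    print(f"Estimated storage: {needed / 1e12:.3f} TB")
```

It is usually worth adding headroom on top of such an estimate, since data volume and the mix of unique terms tend to change over time.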


Have questions? Visit Splunk Answers to see what questions and answers other Splunk users had about data sizing.

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7
