Estimating your storage requirements
This topic describes how to estimate the size of your Splunk index on disk and associated data so that you can plan your storage capacity requirements.
When Splunk indexes your data, the resulting data falls into two basic categories: the compressed raw data that is persisted and the indexes that point to this data. With a little experimentation, you can estimate how much disk space you will need.
Typically, the compressed, persisted data that Splunk extracts from your data inputs amounts to approximately 10% of the raw data that comes into Splunk. The indexes that are created to access this data can be anywhere from 10% to 110% of the size of the data that comes in. This value is affected strongly by how many unique terms occur in your data. Depending on the characteristics of your data, you might want to tune your segmentation settings later on.
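The rule-of-thumb percentages above can be turned into a rough range. The following is a minimal sketch, not a Splunk tool: the function name and parameters are hypothetical, and the percentages are the typical figures quoted above rather than measurements of your own data.

```python
def estimate_disk_usage(raw_input_gb, raw_pct=10, index_pct_low=10, index_pct_high=110):
    """Return (low, high) on-disk estimates in GB for a volume of incoming data.

    raw_pct is the typical size of the compressed persisted raw data, and
    index_pct_low / index_pct_high bound the typical size of the index,
    all expressed as a percentage of the incoming data.
    """
    rawdata_gb = raw_input_gb * raw_pct / 100
    index_low_gb = raw_input_gb * index_pct_low / 100
    index_high_gb = raw_input_gb * index_pct_high / 100
    # Total disk use is the persisted raw data plus the index that points to it.
    return rawdata_gb + index_low_gb, rawdata_gb + index_high_gb


if __name__ == "__main__":
    low, high = estimate_disk_usage(100)
    print(f"100 GB of incoming data: roughly {low:.0f} GB to {high:.0f} GB on disk")
```

The range is wide because the index size depends on how many unique terms your data contains, so treat it as a starting point until you have measured a real sample.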
The best way to get an idea of your index size is to install a test copy of Splunk, index a representative sample of your data, and then check the sizes of the resulting directories in defaultdb.
On *nix systems, follow these steps
Once you've indexed your sample:
1. Go to $SPLUNK_HOME/var/lib/splunk/defaultdb/db.
2. Run du -shc hot_v*/rawdata to determine how large the compressed persisted raw data is. This is the persisted data to which the items in the index point. Typically, its size is about 10% of the size of the sample data set you indexed.
3. Run du -ch hot_v* and look at the last total line to see the size of the index.
4. Add the values you get together.
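If you would rather script the measurements in steps 2 and 3, something like the following sketch could be used. It is an illustration only: the function names are hypothetical, and it sums apparent file sizes rather than allocated disk blocks, so its numbers may differ slightly from what du reports.

```python
import sys
from pathlib import Path


def tree_size(root):
    """Sum the apparent sizes, in bytes, of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def measure_hot_buckets(db_dir):
    """Return (rawdata_bytes, bucket_bytes) summed over the hot_v* directories in db_dir."""
    buckets = [p for p in Path(db_dir).glob("hot_v*") if p.is_dir()]
    # Counterpart of step 2: only the rawdata subdirectory of each hot bucket.
    rawdata_bytes = sum(
        tree_size(b / "rawdata") for b in buckets if (b / "rawdata").is_dir()
    )
    # Counterpart of step 3: everything under each hot_v* directory.
    bucket_bytes = sum(tree_size(b) for b in buckets)
    return rawdata_bytes, bucket_bytes


if __name__ == "__main__":
    # Pass the db directory as the first argument, for example
    # $SPLUNK_HOME/var/lib/splunk/defaultdb/db; defaults to the current directory.
    db_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    raw, total = measure_hot_buckets(db_dir)
    print(f"rawdata: {raw} bytes")
    print(f"hot_v* total: {total} bytes")
```

Because it uses only the Python standard library, the same sketch should also run on Windows if Python is available there.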
On Windows systems, follow these steps
1. Download the du utility from Microsoft TechNet.
2. Extract du.exe from the downloaded ZIP file and place it into your %SYSTEMROOT% folder.
Note: You can also place it anywhere in your %PATH%.
3. Open a command prompt.
4. Once there, go to %SPLUNK_HOME%\var\lib\splunk\defaultdb\db.
5. Run del %TEMP%\du.txt & for /d %i in (hot_v*) do du -q -u %i\rawdata | findstr /b "Size:" >> %TEMP%\du.txt.
6. Open the %TEMP%\du.txt file. You will see a Size: n line for each rawdata directory found, where n is the size of that directory.
7. Add these numbers together to find out how large the compressed persisted raw data is.
8. Next, run for /d %i in (hot_v*) do dir /s %i. The summary at the end of the output is the size of the index.
9. Add this number to the total for the persisted raw data.
This is the total size of the index and associated data for the sample you have indexed. You can now use this to extrapolate the size requirements of your Splunk index and rawdata directories over time.
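One simple way to extrapolate is to scale the measured sample linearly. The sketch below assumes that your sample is representative and that disk usage grows in proportion to the volume indexed. The function name, parameters, and example figures are hypothetical.

```python
def extrapolate_storage(sample_input_bytes, sample_on_disk_bytes,
                        daily_input_bytes, retention_days):
    """Scale the on-disk size measured for a sample up to a retention period.

    sample_input_bytes:   size of the sample data set you indexed
    sample_on_disk_bytes: total size of index and rawdata measured for that sample
    daily_input_bytes:    volume of data you expect to index per day
    retention_days:       how many days of data you plan to keep
    """
    if sample_input_bytes <= 0:
        raise ValueError("sample_input_bytes must be positive")
    # Linear scaling: on-disk bytes per input byte, times the total expected input.
    return sample_on_disk_bytes * daily_input_bytes * retention_days / sample_input_bytes


if __name__ == "__main__":
    GB = 1024 ** 3
    # Made-up figures: a 1 GB sample that took 0.45 GB on disk,
    # 5 GB of new data per day, kept for 90 days.
    needed = extrapolate_storage(1 * GB, 0.45 * GB, 5 * GB, 90)
    print(f"Estimated storage: {needed / GB:.1f} GB")
```

Leave some headroom on top of the result, since real data rarely scales exactly linearly.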
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3