Estimate your storage requirements

When you ingest data into Splunk Enterprise, the indexing process creates a number of files on disk. The rawdata file contains the source data as events, stored in compressed form. The index, or TSIDX, files contain terms from the source data that point back to events in the rawdata file. Typically, the rawdata file is about 15% of the size of the pre-indexed data, and the TSIDX files are about 35%. Combined, the rawdata and TSIDX files represent approximately 50% of the pre-indexed data volume.
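The rule of thumb above can be written as a small calculation. This is a minimal Python sketch, assuming the typical 15% rawdata and 35% TSIDX ratios quoted above. The function name `indexed_footprint_gb` is hypothetical, and your own ratios will differ by data source.

```python
# Hypothetical helper: typical ratios from the rule of thumb, not measurements.
RAWDATA_RATIO = 0.15  # rawdata as a fraction of pre-indexed volume
TSIDX_RATIO = 0.35    # TSIDX files as a fraction of pre-indexed volume


def indexed_footprint_gb(raw_gb: float) -> dict:
    """Estimate rawdata, TSIDX, and total on-disk size for raw_gb of ingested data."""
    rawdata = raw_gb * RAWDATA_RATIO
    tsidx = raw_gb * TSIDX_RATIO
    return {"rawdata": rawdata, "tsidx": tsidx, "total": rawdata + tsidx}
```

For 100GB of pre-indexed data, this estimates about 15GB of rawdata and 35GB of TSIDX files, roughly 50GB in total.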

The guidance for allocating disk space is to take your estimated license capacity (data volume per day) and apply a 50% compression estimate. Compression varies by data source, depending on the structure of the data and the fields it contains. Most customers ingest a variety of data sources and see a correspondingly wide range of compression ratios, but 50% remains the aggregate estimate to use when planning storage.
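To see how a mix of sources combines into one aggregate figure, the following hypothetical Python sketch computes a volume-weighted compression ratio. The source names, daily volumes, and per-source ratios are made-up illustrative values, not measurements; replace them with your own.

```python
# Hypothetical helper: each source maps to (daily_gb, ratio), where
# ratio = indexed size on disk / pre-indexed size.
def aggregate_ratio(sources: dict) -> float:
    """Return the volume-weighted compression ratio across all sources."""
    total_raw = sum(gb for gb, _ in sources.values())
    total_indexed = sum(gb * ratio for gb, ratio in sources.values())
    return total_indexed / total_raw


# Illustrative values only.
example_sources = {
    "source_a": (60, 0.40),
    "source_b": (30, 0.55),
    "source_c": (10, 0.80),
}
```

With these illustrative values, `aggregate_ratio(example_sources)` is about 0.485: the high-volume source dominates the weighted result, even though the smaller sources compress less well.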

For example, to keep 30 days of data at an ingest rate of 100GB/day, plan to allocate at least 1.5TB of free space (100GB * 30 / 2). If you have multiple indexers, divide the required free space equally between them. For example, with 2 indexers, each indexer needs at least 750GB of free storage space ((100GB * 30 / 2) / 2). This calculation does not include extra space for OS disk space checks, minimum free-space thresholds set in other software, or any other considerations outside of Splunk Enterprise.
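The worked example can be expressed as a short function. This Python sketch assumes the 50% compression estimate and an equal split across indexers, as described above. `required_free_space_gb` is a hypothetical name, and like the prose example it adds no headroom for OS checks or other software.

```python
def required_free_space_gb(daily_gb: float, retention_days: int,
                           indexers: int = 1, ratio: float = 0.5) -> tuple:
    """Return (total free space, free space per indexer) in GB.

    Hypothetical helper: assumes data is spread evenly across indexers and
    ignores OS and third-party free-space thresholds.
    """
    if indexers < 1:
        raise ValueError("indexers must be at least 1")
    total = daily_gb * retention_days * ratio
    return total, total / indexers
```

For the example above, `required_free_space_gb(100, 30, indexers=2)` returns `(1500.0, 750.0)`: 1.5TB in total, or 750GB per indexer.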

Planning the index storage

Planning for index storage capacity depends on your daily data volume, your data retention settings, the number of indexers, and which Splunk Enterprise features you use. Before you plan, confirm the following:

You have the data volume per day estimate used to calculate your license volume.
You know how long you need to keep your data.
You have an estimate of how many indexers you need.
(Optional) You know which data is most valuable to you, and for how long that data remains valuable.
(Optional) You know that some data has historical value, but might not need to be searched as often or as quickly.
(Optional) You have an audit requirement to keep a copy of some data for a period of time, but you plan to restore the data before searching it.
(Optional) You have verified how well your data compresses. See Use a data sample to calculate compression.
(Optional) You plan to implement an indexer cluster. An indexer cluster requires additional disk space calculations to support data availability. See Storage requirement examples in the Managing Indexers and Clusters of Indexers manual.
(Optional) You plan to implement SmartStore remote storage. See About SmartStore in the Managing Indexers and Clusters of Indexers manual.
(Optional) You plan to implement the Enterprise Security app. See Data model acceleration storage and retention in the Enterprise Security Installation and Upgrade Manual.
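When different data has different retention needs, you can total the estimate index by index. This hypothetical Python sketch sums daily volume times retention times compression for each index, then divides by the number of indexers. The index names, volumes, and retention periods are illustrative assumptions, and the sketch deliberately ignores indexer cluster replication, SmartStore, and Enterprise Security acceleration, which need the separate calculations referenced above.

```python
from dataclasses import dataclass


@dataclass
class IndexPlan:
    """Hypothetical per-index planning record."""
    name: str
    daily_gb: float
    retention_days: int
    ratio: float = 0.5  # default 50% compression estimate


def per_indexer_gb(plans: list, indexers: int) -> float:
    """Sum the retained, compressed volume of every index and split it evenly."""
    total = sum(p.daily_gb * p.retention_days * p.ratio for p in plans)
    return total / indexers


# Illustrative plan: one index kept longer than the other.
example_plans = [
    IndexPlan("security", daily_gb=60, retention_days=90),
    IndexPlan("web", daily_gb=40, retention_days=30),
]
```

With these values, the two indexes need 2,700GB and 600GB respectively, so `per_indexer_gb(example_plans, 3)` returns 1100.0GB per indexer.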

Splunk Enterprise offers configurable storage tiers that allow you to use different storage technologies to support both fast searching and long-term retention. See How data ages in the Managing Indexers and Clusters of Indexers manual.
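As a rough illustration of sizing two tiers, the following Python sketch assumes data stays on fast storage for a fixed number of days and then moves to slower storage for the rest of the retention period. Splunk Enterprise actually moves data between tiers according to index and bucket settings rather than a simple day count, so treat the split as an approximation. `tier_split_gb` and its parameters are hypothetical.

```python
def tier_split_gb(daily_gb: float, fast_days: int, total_days: int,
                  ratio: float = 0.5) -> tuple:
    """Return (fast tier GB, slow tier GB) under a simple age-based split.

    Hypothetical helper: real tier transitions depend on index settings,
    not a fixed number of days.
    """
    if not 0 <= fast_days <= total_days:
        raise ValueError("fast_days must be between 0 and total_days")
    per_day = daily_gb * ratio  # indexed size of one day of data
    return per_day * fast_days, per_day * (total_days - fast_days)
```

For example, 100GB/day with 7 days on fast storage out of 90 days of retention gives about 350GB on the fast tier and 4,150GB on the slower tier.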

Use a data sample to calculate compression

Use sample data and your operating system tools to calculate the compression of a data source.

For *nix systems

On *nix systems, follow these steps:

Select a data source sample and note its size on disk.
Index your data sample using a file monitor or a one-shot input.
On the command line, go to $SPLUNK_HOME/var/lib/splunk/defaultdb/db.
Run du -ch hot_v* and read the total on the last line to see the size of the index.
Compare the sample size on disk to the indexed size.
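If you prefer a script to du, the following Python sketch performs the same comparison: it sums the sizes of the files under the hot_v* buckets in a db directory and divides by the size of the original sample. It assumes the sample is the only data in that index. It sums apparent file sizes, while du reports allocated disk blocks, so the two numbers can differ slightly. The script and function names are hypothetical.

```python
import glob
import os
import sys


def tree_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def hot_buckets_size_bytes(db_dir: str) -> int:
    """Total size of every hot_v* bucket directory, like du -ch hot_v*."""
    buckets = glob.glob(os.path.join(db_dir, "hot_v*"))
    return sum(tree_size_bytes(b) for b in buckets if os.path.isdir(b))


def compression_ratio(sample_path: str, db_dir: str) -> float:
    """Indexed size divided by the original sample size on disk."""
    return hot_buckets_size_bytes(db_dir) / os.path.getsize(sample_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    ratio = compression_ratio(sys.argv[1], sys.argv[2])
    print(f"Indexed size is {ratio:.1%} of the original sample")
```

For example, save it as measure_compression.py (a hypothetical name) and run python3 measure_compression.py /path/to/sample.log $SPLUNK_HOME/var/lib/splunk/defaultdb/db.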

For Windows systems

On Windows systems, follow these steps:

Download the Sysinternals du utility from Microsoft (it was previously distributed through Microsoft TechNet).
Extract du.exe from the downloaded ZIP file and place it into your %SYSTEMROOT% or %WINDIR% folder. You can also place du.exe anywhere in your %PATH%.
Select a data source sample and note its size on disk.
Index your data sample using a file monitor or a one-shot input.
Open a command prompt and go to %SPLUNK_HOME%\var\lib\splunk\defaultdb\db.
Run del %TEMP%\du.txt & for /d %i in (hot_v*) do du -q -u %i\rawdata | findstr /b "Size:" >> %TEMP%\du.txt.
Open the %TEMP%\du.txt file. Each Size: n line shows the size of one rawdata directory.
Add these numbers together to find the total size of the compressed, persisted raw data.
Run for /d %i in (hot_v*) do dir /s %i. The summary of the output is the size of the index.
Add this number to the total persisted raw data size.

This is the total size of the index and associated data for the sample you have indexed. You can now use this to extrapolate the size requirements of your Splunk Enterprise index and rawdata directories over time.
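As a cross-platform alternative to the Windows commands, this Python sketch walks the hot_v* buckets, reports the rawdata directories and the remaining bucket files (TSIDX and other metadata files) separately, and counts each file exactly once. Dividing each figure by the sample's original size lets you compare your data against the typical 15% rawdata and 35% TSIDX figures. The function names are hypothetical, and the sketch assumes the sample is the only data in the index.

```python
import glob
import os


def _tree_size(path: str) -> int:
    """Sum apparent sizes of regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def hot_bucket_breakdown(db_dir: str) -> dict:
    """Split hot_v* bucket contents into rawdata and everything else."""
    rawdata = other = 0
    for bucket in glob.glob(os.path.join(db_dir, "hot_v*")):
        if not os.path.isdir(bucket):
            continue
        raw = _tree_size(os.path.join(bucket, "rawdata"))
        rawdata += raw
        other += _tree_size(bucket) - raw  # bucket files outside rawdata
    return {"rawdata": rawdata, "index": other, "total": rawdata + other}


def ratios(breakdown: dict, sample_bytes: int) -> dict:
    """Express each component as a fraction of the original sample size."""
    return {key: value / sample_bytes for key, value in breakdown.items()}
```

On Windows, you can pass os.path.expandvars(r"%SPLUNK_HOME%\var\lib\splunk\defaultdb\db") as db_dir. The total ratio can then replace the 50% planning estimate when you extrapolate storage over time.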
