
Calculate sizes of dynamic fields
This page is currently a work in progress; expect frequent near-term updates.
This search determines which fields in your events, without any prior knowledge of field names and number of events, consume the most disk space.
Scenario
index=_internal earliest=-15m latest=now | fieldsummary | rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\"," | mvexpand values | eval bytes=len(values) | rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)" | stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName | rename count as NumberOfValuesPerField | eventstats sum(NumberOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes | eval PercentageOfTotalEvents=round(NumberOfValuesPerField/TotalEvents*100,2) | eval PercentageOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2) | eval ConsumedMB=SumOfBytesInField/1024/1024 | eval TotalMB=TotalBytes/1024/1024 | table FieldName NumberOfValuesPerField SumOfBytesInField ConsumedMB PercentageOfTotalBytes PercentageOfTotalEvents | addcoltotals labelfield=FieldName label=Totals | sort - PercentageOfTotalEvents
Walkthough
1. The example begins with a search to retrieve all events in index=_internal
within the last 15 minutes.
index=_internal earliest=-15m latest=now
Note: You can replace this with any search string and timerange.
2. Next, the fieldsummary command creates a summary of all the fields in previously retrieved events.
... | fieldsummary
This looks something like this:
3. The values of each field are extracted with a regex into a multivalue field, values, and then expanded. The length of each value is calculated in bytes.
| rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\"," | mvexpand values | eval bytes=len(values)
4. The values of the field are extracted with another regex, with some exceptions.
| rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
5.
| stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName | rename count as NumberOfValuesPerField
6.
| eventstats sum(NumberOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
7.
| eval PercentageOfTotalEvents=round(NumberOfValuesPerField/TotalEvents*100,2) | eval PercentageOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2) | eval ConsumedMB=SumOfBytesInField/1024/1024 | eval TotalMB=TotalBytes/1024/1024
8.
| table FieldName NumberOfValuesPerField SumOfBytesInField ConsumedMB PercentageOfTotalBytes PercentageOfTotalEvents | addcoltotals labelfield=FieldName label=Totals | sort - PercentageOfTotalEvents
PREVIOUS What's in this chapter? |
This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14
Feedback submitted, thanks!