Calculate sizes of dynamic fields
This search determines which fields in your events consume the most disk space, without any prior knowledge of field names and number of events.
Scenario
index=_internal earliest=-15m latest=now | fieldsummary | rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\"," | mvexpand values | eval bytes=len(values) | rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)" | stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName | rename count as NumOfValuesPerField | eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes | eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2) | eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2) | eval ConsumedMB=SumOfBytesInField/1024/1024 | eval TotalMB=TotalBytes/1024/1024 | table FieldName NumOfValuesPerField SumOfBytesInField ConsumedMB PercentOfTotalBytes PercentOfTotalEvents | addcoltotals labelfield=FieldName label=Totals | sort - PercentOfTotalEvents
The results appear on the Statistics tab and look something like this:
FieldName | NumValuesPerField | SumOfBytesInField | ConsumedMB | PercentOfTotalBytes | PercentOfTotalEvents |
---|---|---|---|---|---|
Totals | 1802 | 45700 | 0.0436 | 99.87 | 100.29 |
cumulative_hits | 100 | 587 | 0.0006 | 1.28 | 5.55 |
eps | 100 | 1862 | 0.0018 | 4.07 | 5.55 |
kb | 100 | 1159 | 0.0011 | 2.54 | 5.55 |
kbps | 100 | 1881 | 0.0018 | 4.12 | 5.55 |
req_time | 100 | 3000 | 0.0029 | 6.56 | 5.55 |
uri | 100 | 10559 | 0.0101 | 23.11 | 5.55 |
uri_query | 100 | 3532 | 0.0034 | 7.73 | 5.55 |
message | 96 | 11633 | 0.0111 | 25.46 | 5.33 |
avg_age | 76 | 280 | 0.0012 | 2.80 | 4.22 |
ev | 62 | 140 | 0.0001 | 0.31 | 3.44 |
average_kbps | 59 | 1071 | 0.0010 | 2.34 | 3.27 |
The totals do not add up to exactly 100% because this search is adding up the values after the values have been rounded. To avoid this, you can move the rounding to the end of the search string.
Walkthrough
Let's walk through each part of the search.
- The example begins with a search to retrieve all events in
index=_internal
within the last 15 minutes.index=_internal earliest=-15m latest=now
Note: You can replace this with any search string and time range. - Next, add the the fieldsummary command to create a summary of all the fields in the previously retrieved events.
| fieldsummary
The results appear on the Statistics tab and look something like this:
field count distinct _count is _exact max mean min numeric _count stdev values abandoned _channels 29 1 1 0.0 0.00 0.0 29 0.00 [{"value":"0","count":29}] active 29 1 1 0.0 0.00 0.0 29 0.00 [{"value":"0","count":29}] active _hist _searches 31 2 1 1.0 0.13 0.0 31 0.34 [{"value":"0","count":27}, {"value":"1","count":4}]
average _kbps 87 59 1 0.3 0.21 0.0 87 0.15 [{"value":"0","count":29}, {"value":"0.31239045073685034","count":1}, {"value":"0.31240549380412547","count":1}, {"value":"0.3124557194522294","count":1}, {"value":"0.3124707607545469","count":1}]
- The values of each field are extracted with a regex into a multivalue field called values, and then expanded. The length of each value is calculated in bytes.
| rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\"," | mvexpand values | eval bytes=len(values)
- The values of the field are extracted with another regex, with some exceptions.
| rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
- The
stats
command is used to perform multiple calculations using stats functions, including the count and the sum of the bytes (SumOfBytesInField). The values function is used to returns the list of all distinct values of thevalues
field as a multivalue entry (Values). The max function calculates the maximum field length in bytes (MaxFieldLengthInBytes). The results are organized by field name.
| stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName | rename count as NumOfValuesPerField
- The
eventstats
command is used to calculate several sums, the number of values in each field (TotalEvents) and the sum of the bytes in each field (Total Bytes).
| eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
- Several
eval
commands are run to calculate the percentage of total events, the percentage of total bytes, the megabytes consumed, and the total megabytes.
| eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2) | eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2) | eval ConsumedMB=SumOfBytesInField/1024/1024 | eval TotalMB=TotalBytes/1024/1024
- The
table
command is used to display on a specific set of fields. Theaddfoltotals
command is used to calculate the total for each column. Thesort
command is used sort the list in descending order by the PercentageOfTotalEvents field.
| table FieldName NumberOfValuesPerField SumOfBytesInField ConsumedMB PercentageOfTotalBytes PercentageOfTotalEvents | addcoltotals labelfield=FieldName label=Totals | sort - PercentageOfTotalEvents
The results appear on the Statistics tab and look something like this:
FieldName | NumValuesPerField | SumOfBytesInField | ConsumedMB | PercentOfTotalBytes | PercentOfTotalEvents |
---|---|---|---|---|---|
Totals | 1802 | 45700 | 0.0436 | 99.87 | 100.29 |
cumulative_hits | 100 | 587 | 0.0006 | 1.28 | 5.55 |
eps | 100 | 1862 | 0.0018 | 4.07 | 5.55 |
kb | 100 | 1159 | 0.0011 | 2.54 | 5.55 |
kbps | 100 | 1881 | 0.0018 | 4.12 | 5.55 |
req_time | 100 | 3000 | 0.0029 | 6.56 | 5.55 |
uri | 100 | 10559 | 0.0101 | 23.11 | 5.55 |
uri_query | 100 | 3532 | 0.0034 | 7.73 | 5.55 |
message | 96 | 11633 | 0.0111 | 25.46 | 5.33 |
avg_age | 76 | 280 | 0.0012 | 2.80 | 4.22 |
ev | 62 | 140 | 0.0001 | 0.31 | 3.44 |
average_kbps | 59 | 1071 | 0.0010 | 2.34 | 3.27 |
About writing custom search commands |
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408
Feedback submitted, thanks!