Splunk Cloud Platform

Search Manual

Calculate sizes of dynamic fields

This search determines which fields in your events consume the most disk space, without any prior knowledge of field names and number of events.

Scenario

index=_internal earliest=-15m latest=now
| fieldsummary 
| rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\","
| mvexpand values 
| eval bytes=len(values)
| rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
| stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName
| rename count as NumOfValuesPerField
| eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
| eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2)
| eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2)
| eval ConsumedMB=SumOfBytesInField/1024/1024
| eval TotalMB=TotalBytes/1024/1024
| table FieldName NumOfValuesPerField SumOfBytesInField ConsumedMB PercentOfTotalBytes PercentOfTotalEvents
| addcoltotals labelfield=FieldName label=Totals
| sort - PercentOfTotalEvents

The results appear on the Statistics tab and look something like this:

FieldName NumValuesPerField SumOfBytesInField ConsumedMB PercentOfTotalBytes PercentOfTotalEvents
Totals 1802 45700 0.0436 99.87 100.29
cumulative_hits 100 587 0.0006 1.28 5.55
eps 100 1862 0.0018 4.07 5.55
kb 100 1159 0.0011 2.54 5.55
kbps 100 1881 0.0018 4.12 5.55
req_time 100 3000 0.0029 6.56 5.55
uri 100 10559 0.0101 23.11 5.55
uri_query 100 3532 0.0034 7.73 5.55
message 96 11633 0.0111 25.46 5.33
avg_age 76 280 0.0012 2.80 4.22
ev 62 140 0.0001 0.31 3.44
average_kbps 59 1071 0.0010 2.34 3.27

The totals do not add up to exactly 100% because this search is adding up the values after the values have been rounded. To avoid this, you can move the rounding to the end of the search string.

Walkthrough

Let's walk through each part of the search.

  1. The example begins with a search to retrieve all events in index=_internal within the last 15 minutes.
    index=_internal earliest=-15m latest=now
    


    Note: You can replace this with any search string and time range.

  2. Next, add the the fieldsummary command to create a summary of all the fields in the previously retrieved events.
    | fieldsummary
    


    The results appear on the Statistics tab and look something like this:

    field count distinct _count is _exact max mean min numeric _count stdev values
    abandoned _channels 29 1 1 0.0 0.00 0.0 29 0.00 [{"value":"0","count":29}]
    active 29 1 1 0.0 0.00 0.0 29 0.00 [{"value":"0","count":29}]
    active _hist _searches 31 2 1 1.0 0.13 0.0 31 0.34 [{"value":"0","count":27},

    {"value":"1","count":4}]

    average _kbps 87 59 1 0.3 0.21 0.0 87 0.15 [{"value":"0","count":29},

    {"value":"0.31239045073685034","count":1}, {"value":"0.31240549380412547","count":1}, {"value":"0.3124557194522294","count":1}, {"value":"0.3124707607545469","count":1}]

  3. The values of each field are extracted with a regex into a multivalue field called values, and then expanded. The length of each value is calculated in bytes.
    | rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\","
    | mvexpand values 
    | eval bytes=len(values)
    
  4. The values of the field are extracted with another regex, with some exceptions.
    | rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
    
  5. The stats command is used to perform multiple calculations using stats functions, including the count and the sum of the bytes (SumOfBytesInField). The values function is used to returns the list of all distinct values of the values field as a multivalue entry (Values). The max function calculates the maximum field length in bytes (MaxFieldLengthInBytes). The results are organized by field name.
    | stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName
    | rename count as NumOfValuesPerField
    
  6. The eventstats command is used to calculate several sums, the number of values in each field (TotalEvents) and the sum of the bytes in each field (Total Bytes).
    | eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
    
  7. Several eval commands are run to calculate the percentage of total events, the percentage of total bytes, the megabytes consumed, and the total megabytes.
    | eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2)
    | eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2)
    | eval ConsumedMB=SumOfBytesInField/1024/1024
    | eval TotalMB=TotalBytes/1024/1024
    
  8. The table command is used to display on a specific set of fields. The addfoltotals command is used to calculate the total for each column. The sort command is used sort the list in descending order by the PercentageOfTotalEvents field.
    | table FieldName NumberOfValuesPerField SumOfBytesInField ConsumedMB PercentageOfTotalBytes PercentageOfTotalEvents
    | addcoltotals labelfield=FieldName label=Totals
    | sort - PercentageOfTotalEvents
    

The results appear on the Statistics tab and look something like this:

FieldName NumValuesPerField SumOfBytesInField ConsumedMB PercentOfTotalBytes PercentOfTotalEvents
Totals 1802 45700 0.0436 99.87 100.29
cumulative_hits 100 587 0.0006 1.28 5.55
eps 100 1862 0.0018 4.07 5.55
kb 100 1159 0.0011 2.54 5.55
kbps 100 1881 0.0018 4.12 5.55
req_time 100 3000 0.0029 6.56 5.55
uri 100 10559 0.0101 23.11 5.55
uri_query 100 3532 0.0034 7.73 5.55
message 96 11633 0.0111 25.46 5.33
avg_age 76 280 0.0012 2.80 4.22
ev 62 140 0.0001 0.31 3.44
average_kbps 59 1071 0.0010 2.34 3.27
Last modified on 23 July, 2020
About writing custom search commands  

This documentation applies to the following versions of Splunk Cloud Platform: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters