Use the stats command and functions
This topic discusses how to use the statistical functions with the transforming commands chart
, timechart
, stats
, eventstats
, and streamstats
.
- For more information about the stat command and syntax, see the "stats" command in the Search Reference.
- For the list of stats functions, see "Statistical and charting functions" in the Search Reference.
About the stats commands and functions
The stats
, streamstats
, and eventstats
commands each enable you to calculate summary statistics on the results of a search or the events retrieved from an index. The stats
command works on the search results as a whole. The streamstats
command calculates statistics for each event at the time the event is seen, in a streaming manner. The eventstats
command calculates statistics on all search results and adds the aggregation inline to each event for which it is relevant. See more about the differences between these commands in the next section.
The chart command
returns your results in a data structure that supports visualization as a chart (such as a column, line, area, and pie chart). You can decide what field is tracked on the x-axis of the chart. The timechart command
returns your results formatted as a time-series chart, where your data is plotted against an x-axis that is always a time field. Read more about visualization features and options in the Visualization Reference of the Data Visualization Manual.
The stats
, chart
, and timechart
commands (and their related commands eventstats
and streamstats
) are designed to work in conjunction with statistical functions. The list of statistical functions lets you count the occurrence of a field and calculate sums, averages, ranges, and so on, of the field values.
For the list of statistical functions and how they're used, see "Statistical and charting functions" in the Search Reference.
Stats, eventstats, and streamstats
The eventstats
and streamstats
commands are variations on the stats
command.
The stats
command works on the search results as a whole and returns only the fields that you specify. For example, the following search returns a table with two columns (and 10 rows).
sourcetype=access_* | head 10 | stats sum(bytes) as ASumOfBytes by clientip
The ASumOfBytes
and clientip
fields are the only fields that exist after the stats command. For example, the following search returns empty cells in the bytes
column because it is not a result field.
sourcetype=access_* | head 10 | stats sum(bytes) as ASumOfBytes by clientip | table bytes, ASumOfBytes, clientip
To see more fields other than ASumOfBytes
and clientip
in the results, you need to include them in the stats command. Also, if you want to perform calculations on any of the original fields in your raw events, you need to do that before the stats command.
The eventstats
command computes the same statistics as the stats
command, but it also aggregates the results to the original raw data. When you run the following search, it returns an events list instead of a results table, because the eventstats command does not change the raw data.
sourcetype=access_* | head 10 | eventstats sum(bytes) as ASumOfBytes by clientip
You can use the table
command to format the results as a table that displays the fields you want. Now, you can also view the values of bytes
(or any of the original fields in your raw events) in your results.
sourcetype=access_* | head 10 | eventstats sum(bytes) as ASumOfBytes by clientip | table bytes, ASumOfBytes, clientip
The streamstats
command also aggregates the calculated statistics to the original raw event, but it does this at the time the event is seen. To demonstrate this, include the _time
field in the earlier search and use streamstats
.
sourcetype=access_* | head 10 | sort _time | streamstats sum(bytes) as ASumOfBytes by clientip | table _time, clientip, bytes, ASumOfBytes
Instead of a total sum for each clientip
(as returned by stats
and eventstats
), this search calculates a sum for each event based on the time that it is seen. The streamstats
command is useful for reporting on events at a known time range.
Examples
Example 1
This example creates a chart of how many new users go online each hour of the day.
... | sort _time | streamstats dc(userid) as dcusers | delta dcusers as deltadcusers | timechart sum(deltadcusers)
The dc
(or distinct_count
) function returns a count of the unique values of userid
and renames the resulting field dcusers
.
If you don't rename the function, for example "dc(userid) as dcusers", the resulting calculation is automatically saved to the function call, such as "dc(userid)".
The delta
command is used to find the difference between the current and previous dcusers
value. Then, the sum of this delta is charted over time.
Example 2
This example calculates the median for a field, then charts the count of events where the field has a value less than the median.
... | eventstats median(bytes) as medbytes | eval snap=if(bytes>=medbytes, bytes, "smaller") | timechart count by snap
Eventstats is used to calculate the median for all the values of bytes from the previous search.
Example 3
This example calculates the standard deviation and variance of calculated fields.
sourcetype=log4j ERROR earliest=-7d@d latest=@d | eval warns=errorGroup+"-"+errorNum | stats count as Date_Warns_Count by date_mday,warns | stats stdev(Date_Warns_Count), var(Date_Warns_Count) by warns
This search returns errors from the last 7 days and creates the new field, warns, from extracted fields errorGroup and errorNum. The stats command is used twice. First, it calculates the daily count of warns for each day. Then, it calculates the standard deviation and variance of that count per warns.
Example 4
You can use the calculated fields as filter parameters for your search.
sourcetype=access_* | eval URILen = len(useragent) | eventstats avg(URILen) as AvgURILen, stdev(URILen) as StdDevURILen| where URILen > AvgURILen+(2*StdDevURILen) | chart count by URILen span=10 cont=true
In this example, eventstats is used to calculate the average and standard deviation of the URI lengths from useragent
. Then, these numbers are used as filters for the retrieved events.
About calculating statistics | Use stats with eval expressions and functions |
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release), 9.3.2408
Feedback submitted, thanks!