Aggregate functions
Aggregate functions summarize the values from each event to create a single, meaningful value. Common aggregate functions include Average, Count, Minimum, Maximum, Standard Deviation, Sum, and Variance.
Most aggregate functions are used with numeric fields. However, there are some functions that you can use with either alphabetic string fields or numeric fields. The function descriptions indicate which functions you can use with alphabetic strings.
For an overview, see statistical and charting functions.
avg(X)
Description
Returns the average of the values of field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
For a list of the related statistical and charting commands that you can use with this function, see Statistical and charting functions.
Basic examples
The following example returns the average (mean) "size" for each distinct "host".
... | stats avg(size) BY host
The following example returns the average "thruput" of each "host" for each 5 minute time span.
... | bin _time span=5m | stats avg(thruput) BY _time host
The following example charts the ratio of the average (mean) "size" to the maximum "delay" for each distinct "host" and "user" pair.
... | chart eval(avg(size)/max(delay)) AS ratio BY host user
The following example displays a timechart of the average of cpu_seconds by processor, rounded to 2 decimal points.
... | timechart eval(round(avg(cpu_seconds),2)) BY processor
Extended example
Chart the average number of events in a transaction, based on transaction duration.
This example uses the sample dataset from the Search Tutorial. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk deployment Then, run this search using the time range, All time. |
Run the following search to create a chart that shows the average number of events in a transaction based on the duration of the transaction.
sourcetype=access_* status=200 action=purchase | transaction clientip maxspan=30m | chart avg(eventcount) by duration span=log2
The transaction
command also creates a new field called eventcount
, which is the number of events in a single transaction.
The transactions are then piped into the chart
command and the avg()
function is used to calculate the average number of events for each duration. Because the duration is in seconds and you expect there to be many values, the search uses the span
argument to bucket the duration into bins of log2 (span=log2
). This produces the following table:
Click the Visualizations tab to format the report as a pie chart:
Each wedge of the pie chart represents the average number of events in the transactions of the corresponding duration. After you create the pie chart, you can hover over each of the sections to see these values.
count(X) or c(X)
Description
Returns the number of occurrences of the field X. To indicate a specific field value to match, format X as eval(field="value"). Processes field values as strings. To use this function, you can specify count(X)
, or the abbreviation c(X)
.
Usage
You can use the count(X)
function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
The following example returns the count of events where status
has the value "404". This example uses an eval expression with the count
function.
...count(eval(status="404")) AS count_status BY sourcetype
The following example separates search results into 10 bins and returns the count of raw events for each bin.
... | bin size bins=10 | stats count(_raw) BY size
The following example generates a sparkline chart to count the events that use the _raw
field.
... sparkline(count)
The following example generates a sparkline chart to count the events that have the user
field.
... sparkline(count(user))
Extended examples
These examples use the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk deployment. |
The following example uses the timechart
command to count the events where action=purchase
.
sourcetype=access_* | timechart count(eval(action="purchase")) by productName usenull=f useother=f
The following example uses the chart
command to determine the number of different page requests, GET and POST, that occurred for each Web server.
sourcetype=access_* | chart count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host
This example uses eval
expressions to specify the different field values for the stats
command to count. The first clause uses the count()
function to count the Web access events that contain the method
field value GET
. Then, it renames the field that represents these results to "GET" (this is what the "AS" is doing). The second clause does the same for POST events. The counts of both types of events are then separated by the Web server, indicated by the host
field, from which they appeared.
This returns the following table.
Click the Visualizations tab to format the report as a column chart. This chart displays the total count of events for each event type, GET or POST, based on the host
value.
distinct_count(X) or dc(X)
Description
Returns the count of distinct values of the field X. This function processes field values as strings. To use this function, you can specify distinct_count(X)
, or the abbreviation dc(X)
.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
The following example removes duplicate results with the same "host" value and return the total count of the remaining results.
... | stats dc(host)
The following example generates sparklines for the distinct count of devices and renames the field, "numdevices".
...sparkline(dc(device)) AS numdevices
The following example counts the distinct sources for each sourcetype, and buckets the count for each five minute spans.
...sparkline(dc(source),5m) BY sourcetype
Extended example
This example uses the sample dataset from the Search Tutorial. Download the data set from this topic in the tutorial and follow the instructions to upload it into the Splunk platform. Then, run this search using the time range, Yesterday. |
This example uses the dc()
, or distinct_count()
, function to count the number of different customers who purchased something from the Buttercup Games online store yesterday. You want to organize the count by the type of product (accessories, t-shirts, and type of games) that customers purchased.
sourcetype=access_* action=purchase | stats dc(clientip) BY categoryId
This example first searches for purchase events (action=purchase
). These results are piped into the stats
command and the dc()
function counts the number of different users who make purchases. The BY
clause is used to break up this number based on the different category of products, the categoryId
.
estdc(X)
Description
Returns the estimated count of the distinct values of the field X. This function processes field values as strings. The string values 1.0 and 1 are considered distinct values and counted separately.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
The following example removes duplicate results with the same "host" value and returns the estimated total count of the remaining results.
... | stats estdc(host)
The following example generates sparklines for the estimated distinct count of devices and renames the field, "numdevices".
...sparkline(dc(device)) AS numdevices
The following example estimates the distinct count for the sources for each sourcetype. The results are displayed for each five minute span in sparkline charts.
...sparkline(dc(source),5m) BY sourcetype
estdc_error(X)
Description
Returns the theoretical error of the estimated count of the distinct values of the field X. The error represents a ratio of the absolute_value(estimate_distinct_count - real_distinct_count)/real_distinct_count
. This function processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
The following example determines the error ratio for the estimated distinct count of the "host" values.
... | stats estdc_error(host)
exactperc<X>(Y)
Description
Returns a percentile value of the numeric field Y. See the perc<X>(Y)
function.
max(X)
Description
Returns the maximum value of the field X. If the values of X are non-numeric, the maximum value is found using lexicographical ordering.
Processes field values as numbers if possible, otherwise processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.
- Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
- Uppercase letters are sorted before lowercase letters.
- Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
Basic examples
This example returns the maximum value of "size".
max(size)
Extended example
These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes in the past 30 days. |
Count the number of earthquakes that occurred for each magnitude range
Just to take a look at the data, calculate the number of earthquakes that occurred in each magnitude range. This data set was comprised of events over a 30-day period.
source=all_month.csv | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"
This search uses span=1
to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".
Calculate aggregate statistics for the magnitudes of earthquakes in an area
Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Then calculate the minimum, maximum, the range (difference between the min and max), and average magnitudes of those recent earthquakes and list them by magnitude type.
source=all_month.csv place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType
Use stats
functions for each of these calculations: count(), max(), min(), range(), and avg()
. This returns the following table:
mean(X)
Description
Returns the arithmetic mean of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
The following example returns the mean of "kbps" values:
...mean(kbps)
Extended example
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and add it as an input to your Splunk instance. |
The following example finds the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.
source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType
The mean
values should be exactly the same as the values calculated using avg()
.
median(X)
Description
Returns the middle-most value of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
If you have an even number of events, by default the median calculation is approximated to the higher of the two values. To receive a more accurate median value with an even number of events, change the perc_method
in the limits.conf
file.
Only users with file system access, such as system administrators, can edit the configuration files. Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make the changes in the local directory.
See How to edit a configuration file in the Admin manual.
In the [stats | sistats]
stanza, change the perc_method
setting to interpolated
.
If you are using Splunk Cloud and want to edit the configuration file, file a Support ticket.
Basic examples
min(X)
Description
Returns the minimum value of the field X. If the values of X are non-numeric, the minimum value is found using lexicographical ordering.
This function processes field values as numbers if possible, otherwise processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.
- Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
- Uppercase letters are sorted before lowercase letters.
- Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
Basic examples
The following example returns the minimum size and maximum size of the HotBucketRoller component in the _internal index.
index=_internal component=HotBucketRoller | stats min(size), max(size)
The following example returns a list of processors and calculates the minimum cpu_seconds and the maximum cpu_seconds.
index=_internal | chart min(cpu_seconds), max(cpu_seconds) BY processor
mode(X)
Description
Returns the most frequent value of the field X.
Processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
perc<X>(Y)
Description
There are three different percentile functions:
- perc<X>(Y) (or the abbreviation
p<X>(Y)
) - upperperc<X>(Y)
- exactperc<X>(Y)
Returns the X-th percentile value of the numeric field Y. Valid values of X are integers from 1 to 99.
Use the perc<X>(Y) function to calculate an approximate threshold, such that of the values in field Y, X percent fall below the threshold.
The perc
and upperperc
functions give approximate values for the integer percentile requested. The approximation algorithm that is used, which is based on dynamic compression of a radix tree, provides a strict bound of the actual value for any percentile. The perc
function returns a single number that represents the lower end of that range. The upperperc
function gives the approximate upper bound. The exactperc
function provides the exact value, but will be very expensive for high cardinality fields. The exactperc
function could consume a large amount of memory in the search head.
Processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
For the list of values Y = {10,9,8,7,6,5,4,3,2,1}
:
perc50(Y)=6
perc95(Y)=10
range(X)
Description
Returns the difference between the max and min values of the field X ONLY IF the values of X are numeric.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
stdev(X)
Description
Returns the sample standard deviation of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
This example returns the standard deviation of wildcarded fields "*delay" which can apply to both, "delay" and "xdelay".
stdev(*delay)
Extended example
These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and add it as an input. |
The following example finds the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.
source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType
stdevp(X)
Description
Returns the population standard deviation of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
Extended example
These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and add it as an input. |
Count the number of earthquakes that occurred for each magnitude range
This data set was comprised of events over a 30-day period.
source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"
This search uses span=1
to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".
Find the mean, standard deviation, and variance of the magnitudes of recent earthquakes in California
source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType
Use stats
functions for each of these calculations: mean(), stdev(), and var()
. This returns the following table.
sum(X)
Description
Returns the sum of the values of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
sum(eval(date_hour * date_minute))
sumsq(X)
Description
Returns the sum of the squares of the values of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
upperperc<X>(Y)
Description
Returns a percentile value of the numeric field Y. See the perc<X>(Y)
function.
var(X)
Description
Returns the sample variance of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
varp(X)
Description
Returns the population variance of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
PREVIOUS Statistical and charting functions |
NEXT Event order functions |
This documentation applies to the following versions of Splunk^{®} Enterprise: 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 6.6.6, 7.0.0, 7.0.1, 7.0.2, 7.0.3
Comments
I’m curious if there’s a more in depth explanation of the interpolated perc_method setting in limits.conf for the median function. I’m not exactly sure what it is going to do to the calculation (ie: will it calculate the average of the middle two numbers when there is an even number of values or does it do something else?). I’m also curious if that setting changes calculations for any other function or evaluation.
Hello Cmerriman
Thank you for your question. The perc_method affects the median function and also all percentile calculations (percXX).
Interpolation means that Splunk computes a percentile based on the 2 closest exact percentiles that are available. Because you might have a bunch of data all bunched up at certain numbers, there isn’t necessarily a single specific number that best represents a particular percentile. In the interpolation method, Splunk chooses the 2 numbers that represent the nearest percentiles higher and lower than the requested percentile, and do a weighted average. For example, if we can say that the 90th percentile is 80 and the 95th percentile is 100, then interpolation means that the 93rd percentile is 92.
It’s basically what Excel does.
I'll look into clarifying this in the limits.conf file. Hope this helps!