Aggregate functions
Aggregate functions summarize the values from each event to create a single, meaningful value. Common aggregate functions include Average, Count, Minimum, Maximum, Standard Deviation, Sum, and Variance.
Most aggregate functions are used with numeric fields. However, there are some functions that you can use with either alphabetic string fields or numeric fields. The function descriptions indicate which functions you can use with alphabetic strings.
For an overview, see statistical and charting functions.
avg(X)
Description
Returns the average of the values of field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
For a list of the related statistical and charting commands that you can use with this function, see Statistical and charting functions.
Basic examples
The following example returns the average (mean) "size" for each distinct "host".
...  stats avg(size) BY host
The following example returns the average "thruput" of each "host" for each 5 minute time span.
...  bin _time span=5m  stats avg(thruput) BY _time host
The following example charts the ratio of the average (mean) "size" to the maximum "delay" for each distinct "host" and "user" pair.
...  chart eval(avg(size)/max(delay)) AS ratio BY host user
The following example displays a timechart of the average of cpu_seconds by processor, rounded to 2 decimal points.
...  timechart eval(round(avg(cpu_seconds),2)) BY processor
Extended example
Chart the average number of events in a transaction, based on transaction duration.
This example uses the sample data from the Search Tutorial. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range All time when you run the search. 

Run the following search to create a chart to show the average number of events in a transaction based on the duration of the transaction.
sourcetype=access_* status=200 action=purchase  transaction clientip maxspan=30m  chart avg(eventcount) by duration span=log2
The
transaction
command adds two fields to the resultsduration
andeventcount
. Theeventcount
field tracks the number of events in a single transaction.In this search, the transactions are piped into the
chart
command. Theavg()
function is used to calculate the average number of events for each duration. Because the duration is in seconds and you expect there to be many values, the search uses thespan
argument to bucket the duration into bins using logarithm with a base of 2.  Use the field format option to enable number formatting.
 Click the Visualization tab and change the display to a pie chart.
Each wedge of the pie chart represents a duration for the event transactions. You can hover over a wedge to see the average values.
count(X) or c(X)
Description
Returns the number of occurrences of the field X. To indicate a specific field value to match, format X as eval(field="value"). Processes field values as strings. To use this function, you can specify count(X)
, or the abbreviation c(X)
.
Usage
You can use the count(X)
function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
The following example returns the count of events where the status
field has the value "404". This example uses an eval expression with the count
function.
... stats count(eval(status="404")) AS count_status BY sourcetype
The following example separates search results into 10 bins and returns the count of raw events for each bin.
...  bin size bins=10  stats count(_raw) BY size
The following example generates a sparkline chart to count the events that use the _raw
field.
... sparkline(count)
The following example generates a sparkline chart to count the events that have the user
field.
... sparkline(count(user))
The following example uses the timechart
command to count the events where the action
field contains the value purchase
.
sourcetype=access_*  timechart count(eval(action="purchase")) BY productName usenull=f useother=f
Extended examples
Count the number of earthquakes that occurred for each magnitude range
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes data from the past 30 days. 
 Run the following search to calculate the number of earthquakes that occurred in each magnitude range. This data set is comprised of events over a 30day period.
source=all_month.csv  chart count AS "Number of Earthquakes" BY mag span=1  rename mag AS "Magnitude Range"
 This search uses
span=1
to define each of the ranges for the magnitude field,mag
.  The rename command is then used to rename the field to "Magnitude Range".
The results appear on the Statistics tab and look something like this:Magnitude Range Number of Earthquakes 10 18 01 2088 12 3005 23 1026 34 194 45 452 54 109 67 11 78 3  This search uses
Count the number of different page requests for each Web server
This example uses the sample data from the Search Tutorial but should work with any format of Apache web access log. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range All time when you run the search. 

Run the following search to use the
chart
command to determine the number of different page requests, GET and POST, that occurred for each Web server.sourcetype=access_*  chart count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host
This example uses
eval
expressions to specify the different field values for thestats
command to count. The first clause uses thecount()
function to count the Web access events that contain themethod
field valueGET
. Then, using the AS keyword, the field that represents these results is renamed GET.The second clause does the same for POST events. The counts of both types of events are then separated by the web server, using the BY clause with the
host
field.The results appear on the Statistics tab and look something like this:
host GET POST www1 8431 5197 www2 8097 4815 www3 8338 4654 
Click the Visualization tab. If necessary, format the results as a column chart. This chart displays the total count of events for each event type, GET or POST, based on the
host
value.
distinct_count(X) or dc(X)
Description
Returns the count of distinct values of the field X. This function processes field values as strings. To use this function, you can specify distinct_count(X)
, or the abbreviation dc(X)
.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
The following example removes duplicate results with the same "host" value and return the total count of the remaining results.
...  stats dc(host)
The following example generates sparklines for the distinct count of devices and renames the field, "numdevices".
...sparkline(dc(device)) AS numdevices
The following example counts the distinct sources for each sourcetype, and buckets the count for each five minute spans.
...sparkline(dc(source),5m) BY sourcetype
Extended example
This example uses the sample data from the Search Tutorial. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range Yesterday when you run the search. 
 Run the following search to count the number of different customers who purchased something from the Buttercup Games online store yesterday. The search organizes the count by the type of product (accessories, tshirts, and type of games) that customers purchased.
sourcetype=access_* action=purchase  stats dc(clientip) BY categoryId
 This example first searches for purchase events,
action=purchase
.  These results are piped into the
stats
command and thedc()
function counts the number of different users who make purchases.  The
BY
clause is used to break up this number based on the different category of products, thecategoryId
.
The results appear on the Statistics tab and look something like this:categoryId dc(clientip) ACCESSORIES 37 ARCADE 58 NULL 8 SHOOTER 31 SIMULATION 34 SPORTS 13 STRATEGY 74 TEE 38  This example first searches for purchase events,
estdc(X)
Description
Returns the estimated count of the distinct values of the field X. This function processes field values as strings. The string values 1.0 and 1 are considered distinct values and counted separately.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
The following example removes duplicate results with the same "host" value and returns the estimated total count of the remaining results.
...  stats estdc(host)
The following example generates sparklines for the estimated distinct count of devices and renames the field, "numdevices".
...sparkline(estdc(device)) AS numdevices
The following example estimates the distinct count for the sources for each sourcetype. The results are displayed for each five minute span in sparkline charts.
...sparkline(estdc(source),5m) BY sourcetype
estdc_error(X)
Description
Returns the theoretical error of the estimated count of the distinct values of the field X. The error represents a ratio of the absolute_value(estimate_distinct_count  real_distinct_count)/real_distinct_count
. This function processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
The following example determines the error ratio for the estimated distinct count of the "host" values.
...  stats estdc_error(host)
exactperc<X>(Y)
Description
Returns a percentile value of the numeric field Y. See the perc<X>(Y)
function.
max(X)
Description
Returns the maximum value of the field X. If the values of X are nonnumeric, the maximum value is found using lexicographical ordering.
Processes field values as numbers if possible, otherwise processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF8 encoding, which is a superset of ASCII.
 Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
 Uppercase letters are sorted before lowercase letters.
 Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
Basic examples
This example returns the maximum value of "size".
...  max(size)
Extended example
Calculate aggregate statistics for the magnitudes of earthquakes in an area
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes data from the past 30 days. 
 Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Use statistical functions to calculate the minimum, maximum, range (the difference between the min and max), and average magnitudes of the recent earthquakes. List the values by magnitude type.
source=all_month.csv place=*California*  stats count, max(mag), min(mag), range(mag), avg(mag) BY magType
The results appear on the Statistics tab and look something like this:
magType count max(mag) min(mag) range(mag) avg(mag) H 123 2.8 0.0 2.8 0.549593 MbLg 1 0 0 0 0.0000000 Md 1565 3.2 0.1 3.1 1.056486 Me 2 2.0 1.6 .04 1.800000 Ml 1202 4.3 0.4 4.7 1.226622 Mw 6 4.9 3.0 1.9 3.650000 ml 10 1.56 0.19 1.37 0.934000
mean(X)
Description
Returns the arithmetic mean of the field X.
The mean
values should be exactly the same as the values calculated using the avg()
function.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
The following example returns the mean of "kbps" values:
...  stats mean(kbps)
Extended example
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes data from the past 30 days. 

Run the following search to find the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.
source=usgs place=*California*  stats count mean(mag), stdev(mag), var(mag) BY magType
The results appear on the Statistics tab and look something like this:
magType count mean(mag) std(mag) var(mag) H 123 0.549593 0.356985 0.127438 MbLg 1 0.000000 0.000000 0.000000 Md 1565 1.056486 0.580042 0.336449 Me 2 1.800000 0.346410 0.120000 Ml 1202 1.226622 0.629664 0.396476 Mw 6 3.650000 0.716240 0.513000 ml 10 0.934000 0.560401 0.314049
median(X)
Description
Returns the middlemost value of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
If you have an even number of events, by default the median calculation is approximated to the higher of the two values. To receive a more accurate median value with an even number of events, change the perc_method
in the limits.conf
file.
 See How to edit a configuration file in the Admin manual.
Only users with file system access, such as system administrators, can edit the configuration files. Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make the changes in the local directory.
 In the
[stats  sistats]
stanza, change theperc_method
setting tointerpolated
.
If you are using Splunk Cloud and want to edit the configuration file, file a Support ticket.
Basic examples
min(X)
Description
Returns the minimum value of the field X. If the values of X are nonnumeric, the minimum value is found using lexicographical ordering.
This function processes field values as numbers if possible, otherwise processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF8 encoding, which is a superset of ASCII.
 Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
 Uppercase letters are sorted before lowercase letters.
 Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
Basic examples
The following example returns the minimum size and maximum size of the HotBucketRoller component in the _internal index.
index=_internal component=HotBucketRoller  stats min(size), max(size)
The following example returns a list of processors and calculates the minimum cpu_seconds and the maximum cpu_seconds.
index=_internal  chart min(cpu_seconds), max(cpu_seconds) BY processor
Extended example
See the Extended example for the max()
function. That example includes the min()
function.
mode(X)
Description
Returns the most frequent value of the field X.
Processes field values as strings.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Basic examples
The mode
returns the most frequent value. Consider the following data:
firstname  surname  age 

Claudia  Garcia  32 
David  Mayer  45 
Alex  Garcia  29 
Wei  Zhang  45 
Javier  Garcia  37 
When you search for the mode in the age
field, the value 45
is returned.
... stats mode(age)
You can also use mode with fields that contain string values. When you search for the mode in the surname
field, the value Garcia
is returned.
... stats mode(surname)
perc<X>(Y)
Description
There are three different percentile functions:
 perc<X>(Y) (or the abbreviation
p<X>(Y)
)  upperperc<X>(Y)
 exactperc<X>(Y)
Return the Xth percentile value of the numeric field Y. Valid values of X are floating point numbers between 0 and 100, such as 99.95.
The percentile functions process field values as strings.
Use the perc<X>(Y) function to calculate an approximate threshold, such that of the values in field Y, X percent fall below the threshold.
The perc
and upperperc
functions give approximate values for the integer percentile requested. The approximation algorithm that is used, which is based on dynamic compression of a radix tree, provides a strict bound of the actual value for any percentile.
 The
perc
function returns a single number that represents the lower end of that range.  The
upperperc
function gives the approximate upper bound.  The
exactperc
function provides the exact value, but will be very expensive for high cardinality fields. Theexactperc
function could consume a large amount of memory in the search head.
Usage
You can use this function with the chart
, stats
, and timechart
commands.
Differences between Splunk and Excel percentile algorithms
If there are less than 1000 distinct values, the Splunk percentile functions use the nearest rank algorithm. See http://en.wikipedia.org/wiki/Percentile#Nearest_rank. Excel uses the NIST interpolated algorithm, which basically means you can get a value for a percentile that does not exist in the actual data, which is not possible for the nearest rank approach. You can specify that the Excel method should be used by changing the settings in the [stats] stanza in the limits.conf
file. Change the perc_method
setting to interpolated
instead of nearestrank
.
If there are more than 1000 distinct values for the field, the percentiles are approximated using a custom radixtree digestbased algorithm. This algorithm is much faster and uses much less memory, a constant amount, than an exact computation, which uses memory in linear relation to the number of distinct values. By default this approach limits the approximation error to < 1% of rank error. That means if you ask for 95th percentile, the number you get back is between the 94th and 96th percentile.
You always get the exact percentiles even for more than 1000 distinct values by using the exactperc
function compared to the perc
.
Basic examples
Consider this list of values Y = {10,9,8,7,6,5,4,3,2,1}
.
The following example returns 5.5.
... stats perc50(Y)
The following example returns 9.55.
... stats perc95(Y)
range(X)
Description
Returns the difference between the max and min values of the field X. The values of field X must be numeric.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic example
This example uses events that list the numeric sales for each product and quarter, for example:
products  quarter  sales  quota 

ProductA  QTR1  1200  1000 
ProductB  QTR1  1400  1550 
ProductC  QTR1  1650  1275 
ProductA  QTR2  1425  1300 
ProductB  QTR2  1175  1425 
ProductC  QTR2  1550  1450 
ProductA  QTR3  1300  1400 
ProductB  QTR3  1250  1125 
ProductC  QTR3  1375  1475 
ProductA  QTR4  1550  1300 
ProductB  QTR4  1700  1225 
ProductC  QTR4  1625  1350 
It is easiest to understand the range
if you also determine the min
and max
values. To determine the range of sales by product, run this search:
source="addtotalsData.csv"  chart sum(sales) min(sales) max(sales) range(sales) BY products
The results appear on the Statistics tab and look something like this:
quarter  sum(sales)  min(sales)  max(sales)  range(sales) 

QTR1  4250  1200  1650  450 
QTR2  4150  1175  1550  375 
QTR3  3925  1250  1375  125 
QTR4  4875  1550  1700  150 
The range(sales) is the max(sales) minus the min(sales).
Extended example
See the Extended example for the max()
function. That example includes the range()
function.
stdev(X)
Description
Returns the sample standard deviation of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
This example returns the standard deviation of wildcarded fields "*delay" which can apply to both, "delay" and "xdelay".
...  stats stdev(*delay)
Extended example
This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes data from the past 30 days. 

Run the following search to find the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.
source=usgs place=*California*  stats count mean(mag), stdev(mag), var(mag) BY magType
The results appear on the Statistics tab and look something like this:
magType count mean(mag) std(mag) var(mag) H 123 0.549593 0.356985 0.127438 MbLg 1 0.000000 0.000000 0.000000 Md 1565 1.056486 0.580042 0.336449 Me 2 1.800000 0.346410 0.120000 Ml 1202 1.226622 0.629664 0.396476 Mw 6 3.650000 0.716240 0.513000 ml 10 0.934000 0.560401 0.314049
stdevp(X)
Description
Returns the population standard deviation of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
Extended example
sum(X)
Description
Returns the sum of the values of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
sum(eval(date_hour * date_minute))
sumsq(X)
Description
Returns the sum of the squares of the values of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
upperperc<X>(Y)
Description
Returns a percentile value of the numeric field Y. See the perc<X>(Y)
function.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
var(X)
Description
Returns the sample variance of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
Extended example
See the Extended example for the mean()
function. That example includes the var()
function.
varp(X)
Description
Returns the population variance of the field X.
Usage
You can use this function with the chart
, stats
, and timechart
commands, and also with sparkline()
charts.
Basic examples
PREVIOUS Statistical and charting functions 
NEXT Event order functions 
This documentation applies to the following versions of Splunk^{®} Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 8.0.0, 8.0.1, 8.0.2
Feedback submitted, thanks!