Aggregate functions

Aggregate functions summarize the values from each event to create a single, meaningful value. Common aggregate functions include Average, Count, Minimum, Maximum, Standard Deviation, Sum, and Variance.

Most aggregate functions are used with numeric fields. However, there are some functions that you can use with either alphabetic string fields or numeric fields. The function descriptions indicate which functions you can use with alphabetic strings.

For an overview, see statistical and charting functions.

avg(X)

Description

Returns the average of the values of field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

For a list of the related statistical and charting commands that you can use with this function, see Statistical and charting functions.

Basic examples

The following example returns the average (mean) "size" for each distinct "host".

... | stats avg(size) BY host

The following example returns the average "thruput" of each "host" for each 5 minute time span.

... | bin _time span=5m | stats avg(thruput) BY _time host

The following example charts the ratio of the average (mean) "size" to the maximum "delay" for each distinct "host" and "user" pair.

... | chart eval(avg(size)/max(delay)) AS ratio BY host user

The following example displays a timechart of the average of cpu_seconds by processor, rounded to 2 decimal points.

... | timechart eval(round(avg(cpu_seconds),2)) BY processor

Extended example

Chart the average number of events in a transaction, based on transaction duration.

This example uses the sample dataset from the Search Tutorial. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk deployment Then, run this search using the time range, All time.

Run the following search to create a chart that shows the average number of events in a transaction based on the duration of the transaction.

sourcetype=access_* status=200 action=purchase | transaction clientip maxspan=30m | chart avg(eventcount) by duration span=log2

The transaction command also creates a new field called eventcount, which is the number of events in a single transaction.

The transactions are then piped into the chart command and the avg() function is used to calculate the average number of events for each duration. Because the duration is in seconds and you expect there to be many values, the search uses the span argument to bucket the duration into bins of log2 (span=log2). This produces the following table:

Click the Visualizations tab to format the report as a pie chart:

Each wedge of the pie chart represents the average number of events in the transactions of the corresponding duration. After you create the pie chart, you can hover over each of the sections to see these values.

count(X) or c(X)

Description

Returns the number of occurrences of the field X. To indicate a specific field value to match, format X as eval(field="value"). Processes field values as strings. To use this function, you can specify count(X), or the abbreviation c(X).

Usage

You can use the count(X) function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

The following example returns the count of events where status has the value "404". This example uses an eval expression with the count function.

...count(eval(status="404")) AS count_status BY sourcetype

The following example separates search results into 10 bins and returns the count of raw events for each bin.

... | bin size bins=10 | stats count(_raw) BY size

The following example generates a sparkline chart to count the events that use the _raw field.

... sparkline(count)

The following example generates a sparkline chart to count the events that have the user field.

... sparkline(count(user))

Extended examples

These examples use the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk deployment.

The following example uses the timechart command to count the events where action=purchase.

sourcetype=access_* | timechart count(eval(action="purchase")) by productName usenull=f useother=f

The following example uses the chart command to determine the number of different page requests, GET and POST, that occurred for each Web server.

sourcetype=access_* | chart count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host

This example uses eval expressions to specify the different field values for the stats command to count. The first clause uses the count() function to count the Web access events that contain the method field value GET. Then, it renames the field that represents these results to "GET" (this is what the "AS" is doing). The second clause does the same for POST events. The counts of both types of events are then separated by the Web server, indicated by the host field, from which they appeared.

This returns the following table.

Click the Visualizations tab to format the report as a column chart. This chart displays the total count of events for each event type, GET or POST, based on the host value.

distinct_count(X) or dc(X)

Description

Returns the count of distinct values of the field X. This function processes field values as strings. To use this function, you can specify distinct_count(X), or the abbreviation dc(X).

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

The following example removes duplicate results with the same "host" value and return the total count of the remaining results.

... | stats dc(host)

The following example generates sparklines for the distinct count of devices and renames the field, "numdevices".

...sparkline(dc(device)) AS numdevices

The following example counts the distinct sources for each sourcetype, and buckets the count for each five minute spans.

...sparkline(dc(source),5m) BY sourcetype

Extended example

This example uses the sample dataset from the Search Tutorial. Download the data set from this topic in the tutorial and follow the instructions to upload it into the Splunk platform. Then, run this search using the time range, Yesterday.

This example uses the dc(), or distinct_count(), function to count the number of different customers who purchased something from the Buttercup Games online store yesterday. You want to organize the count by the type of product (accessories, t-shirts, and type of games) that customers purchased.

sourcetype=access_* action=purchase | stats dc(clientip) BY categoryId

This example first searches for purchase events (action=purchase). These results are piped into the stats command and the dc() function counts the number of different users who make purchases. The BY clause is used to break up this number based on the different category of products, the categoryId.

estdc(X)

Description

Returns the estimated count of the distinct values of the field X. This function processes field values as strings. The string values 1.0 and 1 are considered distinct values and counted separately.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

The following example removes duplicate results with the same "host" value and returns the estimated total count of the remaining results.

... | stats estdc(host)

The following example generates sparklines for the estimated distinct count of devices and renames the field, "numdevices".

...sparkline(dc(device)) AS numdevices

The following example estimates the distinct count for the sources for each sourcetype. The results are displayed for each five minute span in sparkline charts.

...sparkline(dc(source),5m) BY sourcetype

estdc_error(X)

Description

Returns the theoretical error of the estimated count of the distinct values of the field X. The error represents a ratio of the absolute_value(estimate_distinct_count - real_distinct_count)/real_distinct_count. This function processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

The following example determines the error ratio for the estimated distinct count of the "host" values.

... | stats estdc_error(host)

exactperc<X>(Y)

Description

Returns a percentile value of the numeric field Y. See the perc<X>(Y) function.

max(X)

Description

Returns the maximum value of the field X. If the values of X are non-numeric, the maximum value is found using lexicographical ordering.

Processes field values as numbers if possible, otherwise processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.

Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
Uppercase letters are sorted before lowercase letters.
Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.

Basic examples

This example returns the maximum value of "size".

max(size)

Extended example

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes in the past 30 days.

Count the number of earthquakes that occurred for each magnitude range

Just to take a look at the data, calculate the number of earthquakes that occurred in each magnitude range. This data set was comprised of events over a 30-day period.

source=all_month.csv | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"

This search uses span=1 to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".

Calculate aggregate statistics for the magnitudes of earthquakes in an area

Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Then calculate the minimum, maximum, the range (difference between the min and max), and average magnitudes of those recent earthquakes and list them by magnitude type.

source=all_month.csv place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType

Use stats functions for each of these calculations: count(), max(), min(), range(), and avg(). This returns the following table:

mean(X)

Description

Returns the arithmetic mean of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

The following example returns the mean of "kbps" values:

...mean(kbps)

Extended example

This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input to your Splunk instance.

The following example finds the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType

The mean values should be exactly the same as the values calculated using avg().

median(X)

Description

Returns the middle-most value of the field X.

Usage

You can use this function with the chart, stats, and timechart commands.

If you have an even number of events, by default the median calculation is approximated to the higher of the two values. To receive a more accurate median value with an even number of events, change the perc_method in the limits.conf file.

Only users with file system access, such as system administrators, can edit the configuration files. Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make the changes in the local directory.

See How to edit a configuration file in the Admin manual.

In the [stats | sistats] stanza, change the perc_method setting to interpolated.

If you are using Splunk Cloud and want to edit the configuration file, file a Support ticket.

Basic examples

min(X)

Description

Returns the minimum value of the field X. If the values of X are non-numeric, the minimum value is found using lexicographical ordering.

This function processes field values as numbers if possible, otherwise processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.

Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
Uppercase letters are sorted before lowercase letters.
Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.

Basic examples

The following example returns the minimum size and maximum size of the HotBucketRoller component in the _internal index.

index=_internal component=HotBucketRoller | stats min(size), max(size)

The following example returns a list of processors and calculates the minimum cpu_seconds and the maximum cpu_seconds.

index=_internal | chart min(cpu_seconds), max(cpu_seconds) BY processor

mode(X)

Description

Returns the most frequent value of the field X.

Processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

perc<X>(Y)

Description

There are three different percentile functions:

perc<X>(Y) (or the abbreviation p<X>(Y))
upperperc<X>(Y)
exactperc<X>(Y)

Returns the X-th percentile value of the numeric field Y. Valid values of X are integers from 1 to 99.

Use the perc<X>(Y) function to calculate an approximate threshold, such that of the values in field Y, X percent fall below the threshold.

The perc and upperperc functions give approximate values for the integer percentile requested. The approximation algorithm that is used, which is based on dynamic compression of a radix tree, provides a strict bound of the actual value for any percentile. The perc function returns a single number that represents the lower end of that range. The upperperc function gives the approximate upper bound. The exactperc function provides the exact value, but will be very expensive for high cardinality fields. The exactperc function could consume a large amount of memory in the search head.

Processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

For the list of values Y = {10,9,8,7,6,5,4,3,2,1}:

perc50(Y)=6

perc95(Y)=10

range(X)

Description

Returns the difference between the max and min values of the field X ONLY IF the values of X are numeric.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

stdev(X)

Description

Returns the sample standard deviation of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

This example returns the standard deviation of wildcarded fields "*delay" which can apply to both, "delay" and "xdelay".

stdev(*delay)

Extended example

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

The following example finds the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType

stdevp(X)

Description

Returns the population standard deviation of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

Extended example

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

Count the number of earthquakes that occurred for each magnitude range

This data set was comprised of events over a 30-day period.

source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"

This search uses span=1 to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".

Find the mean, standard deviation, and variance of the magnitudes of recent earthquakes in California

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType

Use stats functions for each of these calculations: mean(), stdev(), and var(). This returns the following table.

sum(X)

Description

Returns the sum of the values of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

sum(eval(date_hour * date_minute))

sumsq(X)

Description

Returns the sum of the squares of the values of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

upperperc<X>(Y)

Description

Returns a percentile value of the numeric field Y. See the perc<X>(Y) function.

var(X)

Description

Returns the sample variance of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

varp(X)

Description

Returns the population variance of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Search Reference

Related Answers

Aggregate functions

avg(X)

Description

Usage

Basic examples

Extended example

count(X) or c(X)

Description

Usage

Basic examples

Extended examples

distinct_count(X) or dc(X)

Description

Usage

Basic examples

Extended example

estdc(X)

Description

Usage

Basic examples

estdc_error(X)

Description

Usage

Basic examples

exactperc<X>(Y)

Description

max(X)

Description

Usage

Basic examples

Extended example

mean(X)

Description

Usage

Basic examples

Extended example

median(X)

Description

Usage

Basic examples

min(X)

Description

Usage

Basic examples

mode(X)

Description

Usage

Basic examples

perc<X>(Y)

Description

Usage

Basic examples

range(X)

Description

Usage

Basic examples

stdev(X)

Description

Usage

Basic examples

Extended example

stdevp(X)

Description

Usage

Basic examples

Extended example

sum(X)

Description

Usage

Basic examples

sumsq(X)

Description