Splunk® Enterprise

Search Reference

Download manual as PDF

Download topic as PDF

Aggregate functions

Aggregate functions summarize the values from each event to create a single, meaningful value. Common aggregate functions include Average, Count, Minimum, Maximum, Standard Deviation, Sum, and Variance.

Most aggregate functions are used with numeric fields. However, there are some functions that you can use with either alphabetic string fields or numeric fields. The function descriptions indicate which functions you can use with alphabetic strings.

See Statistical and charting functions.


avg(X)

Description

Returns the average of the values of field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

For a list of the related statistical and charting commands that you can use with this function, see Statistical and charting functions.

Basic examples

The following example returns the average (mean) "size" for each distinct "host".

... | stats avg(size) BY host


The following example returns the average "thruput" of each "host" for each 5 minute time span.

... | bin _time span=5m | stats avg(thruput) BY _time host


The following example charts the ratio of the average (mean) "size" to the maximum "delay" for each distinct "host" and "user" pair.

... | chart eval(avg(size)/max(delay)) AS ratio BY host user


The following example displays a timechart of the average of cpu_seconds by processor, rounded to 2 decimal points.

... | timechart eval(round(avg(cpu_seconds),2)) BY processor


Extended example

Chart the average number of events in a transaction, based on transaction duration.

This example uses the sample dataset from the Search Tutorial. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk deployment Then, run this search using the time range, All time.

Create a chart to show the average number of events in a transaction based on the duration of the transaction.

sourcetype=access_* status=200 action=purchase | transaction clientip maxspan=30m | chart avg(eventcount) by duration span=log2

The transaction command also creates a new field called eventcount, which is the number of events in a single transaction.

The transactions are then piped into the chart command and the avg() function is used to calculate the average number of events for each duration. Because the duration is in seconds and you expect there to be many values, the search uses the span argument to bucket the duration into bins of log2 (span=log2). This produces the following table:

Searchref chart ex3.1.png

Click the Visualizations tab to format the report as a pie chart:

Searchref chart ex3.2.png

Each wedge of the pie chart represents the average number of events in the transactions of the corresponding duration. After you create the pie chart, you can hover over each of the sections to see these values.

count(X) or c(X)

Description

Returns the number of occurrences of the field X. To indicate a specific field value to match, format X as eval(field="value"). Processes field values as strings. To use this function, you can specify count(X), or the abbreviation c(X).

Usage

You can use the count(X) function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

The following example returns the count of events where status has the value "404". This example uses an eval expression with the count function.

...count(eval(status="404")) AS count_status BY sourcetype


The following example separates search results into 10 bins and returns the count of raw events for each bin.

... | bin size bins=10 | stats count(_raw) BY size


The following example generates a sparkline chart to count the events that use the _raw field.

... sparkline(count)


The following example generates a sparkline chart to count the events that have the user field.

... sparkline(count(user))


Extended examples

These examples use the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk deployment.

The following example uses the timechart command to count the events where action=purchase.

sourcetype=access_* | timechart count(eval(action="purchase")) by productName usenull=f useother=f


The following example uses the chart command to determine the number of different page requests, GET and POST, that occurred for each Web server.

sourcetype=access_* | chart count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host


This example uses eval expressions to specify the different field values for the stats command to count. The first clause uses the count() function to count the Web access events that contain the method field value GET. Then, it renames the field that represents these results to "GET" (this is what the "AS" is doing). The second clause does the same for POST events. The counts of both types of events are then separated by the Web server, indicated by the host field, from which they appeared.

This returns the following table.

6.5.0 ChartExamples1.png


Click the Visualizations tab to format the report as a column chart. This chart displays the total count of events for each event type, GET or POST, based on the host value.

6.5.0 ChartExamples2.png

distinct_count(X) or dc(X)

Description

Returns the count of distinct values of the field X. This function processes field values as strings. To use this function, you can specify distinct_count(X), or the abbreviation dc(X).

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

The following example removes duplicate results with the same "host" value and return the total count of the remaining results.

... | stats dc(host)


The following example generates sparklines for the distinct count of devices and renames the field, "numdevices".

...sparkline(dc(device)) AS numdevices


The following example counts the distinct sources for each sourcetype, and buckets the count for each five minute spans.

...sparkline(dc(source),5m) BY sourcetype

Extended example

This example uses the sample dataset from the Search Tutorial. Download the data set from this topic in the tutorial and follow the instructions to upload it into the Splunk platform. Then, run this search using the time range, Yesterday.

This example uses the dc(), or distinct_count(), function to count the number of different customers who purchased something from the Buttercup Games online store yesterday. You want to organize the count by the type of product (accessories, t-shirts, and type of games) that customers purchased.

sourcetype=access_* action=purchase | stats dc(clientip) BY categoryId

This example first searches for purchase events (action=purchase). These results are piped into the stats command and the dc() function counts the number of different users who make purchases. The BY clause is used to break up this number based on the different category of products, the categoryId.

6.5.0 DistinctCount1.png


estdc(X)

Description

Returns the estimated count of the distinct values of the field X. This function processes field values as strings. The string values 1.0 and 1 are considered distinct values and counted separately.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

The following example removes duplicate results with the same "host" value and returns the estimated total count of the remaining results.

... | stats estdc(host)


The following example generates sparklines for the estimated distinct count of devices and renames the field, "numdevices".

...sparkline(dc(device)) AS numdevices


The following example estimates the distinct count for the sources for each sourcetype. The results are displayed for each five minute span in sparkline charts.

...sparkline(dc(source),5m) BY sourcetype


estdc_error(X)

Description

Returns the theoretical error of the estimated count of the distinct values of the field X. The error represents a ratio of the absolute_value(estimate_distinct_count - real_distinct_count)/real_distinct_count. This function processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

The following example determines the error ratio for the estimated distinct count of the "host" values.

... | stats estdc_error(host)


max(X)

Description

Returns the maximum value of the field X. If the values of X are non-numeric, the maximum value is found using lexicographical ordering.

Processes field values as numbers if possible, otherwise processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.

  • Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
  • Uppercase letters are sorted before lowercase letters.
  • Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.

Basic examples

This example returns the maximum value of "size".

max(size)

Extended example

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and upload the file to your Splunk instance. This example uses the All Earthquakes in the past 30 days.

Count the number of earthquakes that occurred for each magnitude range

Just to take a look at the data, calculate the number of earthquakes that occurred in each magnitude range. This data set was comprised of events over a 30-day period.

source=all_month.csv | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"

Searchref stats ex3.01.png

This search uses span=1 to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".

Calculate aggregate statistics for the magnitudes of earthquakes in an area

Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Then calculate the minimum, maximum, the range (difference between the min and max), and average magnitudes of those recent earthquakes and list them by magnitude type.

source=all_month.csv place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType

Use stats functions for each of these calculations: count(), max(), min(), range(), and avg(). This returns the following table:

Searchref stats usgsex2.1.png


mean(X)

Description

Returns the arithmetic mean of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

The following example returns the mean of "kbps" values:

...mean(kbps)

Extended example

This search uses recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input to your Splunk instance.

The following example finds the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType


Searchref stats usgsex2.2.png


The mean values should be exactly the same as the values calculated using avg().


median(X)

Description

Returns the middle-most value of the field X.

Usage

You can use this function with the chart, stats, and timechart commands.

If you have an even number of events, by default the median calculation is approximated to the higher of the two values. To receive a more accurate median value with an even number of events, change the perc_method in the limits.conf file.

Only users with file system access, such as system administrators, can edit the configuration files. Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make the changes in the local directory.

See How to edit a configuration file in the Admin manual.


In the [stats | sistats] stanza, change the perc_method setting to interpolated.

If you are using Splunk Cloud and want to edit the configuration file, file a Support ticket.

Basic examples

min(X)

Description

Returns the minimum value of the field X. If the values of X are non-numeric, the minimum value is found using lexicographical ordering.

This function processes field values as numbers if possible, otherwise processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.

  • Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
  • Uppercase letters are sorted before lowercase letters.
  • Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.

Basic examples

The following example returns the minimum size and maximum size of the HotBucketRoller component in the _internal index.

index=_internal component=HotBucketRoller | stats min(size), max(size)


The following example returns a list of processors and calculates the minimum cpu_seconds and the maximum cpu_seconds.

index=_internal | chart min(cpu_seconds), max(cpu_seconds) BY processor


mode(X)

Description

Returns the most frequent value of the field X.

Processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

percentile <X>(Y)

Description

There are three different percentile functions:

  • p<X>(Y) | perc<X>(Y)
  • upperperc<X>(Y)
  • exactperc<X>(Y)


Returns the X-th percentile value of the numeric field Y, where X is an integer between 1 and 99. The percentile X-th function sorts the values of Y in an increasing order. Then, if you consider that 0% is the lowest and 100% the highest, the functions picks the value that corresponds to the position of the X% value.

The functions perc (or the abbreviation p) and upperperc give approximate values for the integer percentile requested. The approximation algorithm that is used, which is based on dynamic compression of a radix tree, provides a strict bound of the actual value for any percentile. The function perc (or p) return a single number that represents the lower end of that range. The function upperperc gives the approximate upper bound. The exactperc function provides the exact value, but will be very expensive for high cardinality fields. The exactperc function could consume a large amount of memory in the search head.

Processes field values as strings.

Usage

You can use this function with the chart, stats, and timechart commands.

Basic examples

For the list of values Y = {10,9,8,7,6,5,4,3,2,1}:

perc50(Y)=6

perc95(Y)=10

range(X)

Description

Returns the difference between the max and min values of the field X ONLY IF the values of X are numeric.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

stdev(X)

Description

Returns the sample standard deviation of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

This example returns the standard deviation of wildcarded fields "*delay" which can apply to both, "delay" and "xdelay".

stdev(*delay)

Extended example

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

The following example finds the mean, standard deviation, and variance of the magnitudes of recent quakes by magnitude type.

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType


Searchref stats usgsex2.2.png


stdevp(X)

Description

Returns the population standard deviation of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

Extended example

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

Count the number of earthquakes that occurred for each magnitude range

This data set was comprised of events over a 30-day period.

source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"

Searchref stats ex3.01.png

This search uses span=1 to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".

Find the mean, standard deviation, and variance of the magnitudes of recent earthquakes in California

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType

Use stats functions for each of these calculations: mean(), stdev(), and var(). This returns the following table.

Searchref stats usgsex2.2.png


sum(X)

Description

Returns the sum of the values of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

sum(eval(date_hour * date_minute))


sumsq(X)

Description

Returns the sum of the squares of the values of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

var(X)

Description

Returns the sample variance of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

varp(X)

Description

Returns the population variance of the field X.

Usage

You can use this function with the chart, stats, and timechart commands, and also with sparkline() charts.

Basic examples

PREVIOUS
Statistical and charting functions
  NEXT
Event order functions

This documentation applies to the following versions of Splunk® Enterprise: 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 7.0.0, 7.0.1


Comments

Hello Cmerriman
Thank you for your question. The perc_method affects the median function and also all percentile calculations (percXX).

Interpolation means that Splunk computes a percentile based on the 2 closest exact percentiles that are available. Because you might have a bunch of data all bunched up at certain numbers, there isn’t necessarily a single specific number that best represents a particular percentile. In the interpolation method, Splunk chooses the 2 numbers that represent the nearest percentiles higher and lower than the requested percentile, and do a weighted average. For example, if we can say that the 90th percentile is 80 and the 95th percentile is 100, then interpolation means that the 93rd percentile is 92.

It’s basically what Excel does.

I'll look into clarifying this in the limits.conf file. Hope this helps!

Lstewart splunk, Splunker
October 17, 2017

I’m curious if there’s a more in depth explanation of the interpolated perc_method setting in limits.conf for the median function. I’m not exactly sure what it is going to do to the calculation (ie: will it calculate the average of the middle two numbers when there is an even number of values or does it do something else?). I’m also curious if that setting changes calculations for any other function or evaluation.

Cmerriman
October 11, 2017

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters