Splunk® Enterprise

Search Reference

Download manual as PDF

Download topic as PDF

stats

Description

Calculates aggregate statistics over the results set, such as average, count, and sum. This is similar to SQL aggregation. If stats is used without a by clause only one row is returned, which is the aggregation over the entire incoming result set. If you use a by clause one row is returned for each distinct value specified in the by clause.

Difference between stats and eval commands

The stats command calculates statistics based on fields in your events. The eval command creates new fields in your events by using existing fields and an arbitrary expression.

An image that shows two tables and an example of the stats command in between the tables. The top table shows 2 columns: Time and Event. There are two rows in the table that show sample events. There are timestamps in the Time column. The Event column shows the beginning of the events. The first row shows a GET with an item added to a cart.  The second row shows a POST.

Syntax

Simple: stats (stats-function(field) [AS field])... [BY field-list]

Complete: stats [partitions=<num>] [allnum=<bool>] [delim=<string>] ( <stats-agg-term>... | <sparkline-agg-term>... ) [<by-clause>]

Required arguments

stats-agg-term
Syntax: <stats-function>(<evaled-field> | <wc-field>) [AS <wc-field>]
Description: A statistical aggregation function. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in field names. For more information on eval expressions, see Types of eval expressions in the Search Manual.
sparkline-agg-term
Syntax: <sparkline-agg> [AS <wc-field>]
Description: A sparkline aggregation function. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in the field name.

Optional arguments

allnum
syntax: allnum=<bool>
Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical.
Default: false
delim
Syntax: delim=<string>
Description: Specifies how the values in the list() or values() aggregation are delimited.
Default: a single space
by-clause
Syntax: BY <field-list>
Description: The name of one or more fields to group by. You cannot use a wildcard character to specify multiple fields with similar names. You must specify each field separately.
partitions
Syntax: partitions=<num>
Description: If specified, partitions the input data based on the split-by fields for multithreaded reduce.
Default: 1

Stats function options

stats-function
Syntax: avg() | c() | count() | dc() | distinct_count() | earliest() | estdc() | estdc_error() | exactperc<int>() | first() | last() | latest() | list() | max() | median() | min() | mode() | p<int>() | perc<int>() | range() | stdev() | stdevp() | sum() | sumsq() | upperperc<int>() | values() | var() | varp()
Description: Functions used with the stats command. Each time you invoke the stats command, you can use more than one function. However, you can only use one by clause.

For a list of stats functions with descriptions and examples, see Statistical and charting functions.

Sparkline function options

Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. Read more about how to "Add sparklines to your search results" in the Search Manual.

sparkline-agg
Syntax: sparkline (count(<wc-field>), <span-length>) | sparkline (<sparkline-func>(<wc-field>), <span-length>)
Description: A sparkline specifier, which takes the first argument of a aggregation function on a field and an optional timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregator is permitted. You can use wildcard characters in the field name. See the Usage section.
sparkline-func
Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range()
Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bin.

Usage

Eval expressions with statistical functions

When you use the stats command, you must specify either a statistical function or a sparkline function. When you use a statistical function, you can use an eval expression as part of the statistical function. For example:

index=* | stats count(eval(status="404")) AS count_status BY sourcetype

Functions and memory usage

Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count function requires far more memory than the count function. The values and list functions also can consume a lot of memory.

If you are using the distinct_count function without a split-by field or with a low-cardinality split-by by field, consider replacing the distinct_count function with the the estdc function (estimated distinct count). The estdc function might result in significantly lower memory usage and run times.

Event order functions

Using the first and last functions when searching based on time does not produce accurate results.

  • To locate the first value based on time order, use the earliest function, instead of the first function.
  • To locate the last value based on time order, use the latest function, instead of the last function.


For example, consider the following search.

index=test sourcetype=testDb | eventstats first(LastPass) as LastPass, last(_time) as mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats first(startTime) AS startTime, first(status) AS status, first(histID) AS currentHistId, last(histID) AS lastPassHistId BY testCaseId

Replace the first and last functions when you use the stats and eventstats commands for ordering events based on time. The following search shows the function changes.

index=test sourcetype=testDb | eventstats latest(LastPass) AS LastPass, earliest(_time) AS mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats latest(startTime) AS startTime, latest(status) AS status, latest(histID) AS currentHistId, earliest(histID) AS lastPassHistId BY testCaseId

Wildcards in BY clauses

The stats command does not support wildcard characters in field values in BY clauses.

For example, you cannot specify | stats count BY source*.

Renaming fields

You cannot rename one field with multiple names. For example if you have field A, you cannot rename A as B, A as C. The following example is not valid.

... | stats first(host) AS site, first(host) AS report

Basic Examples

1. Return the average transfer rate for each host

sourcetype=access* | stats avg(kbps) BY host

2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"

Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". The "top" command returns a count and percent value for each "referer_domain".

sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total

3. Calculate the average time for each hour for similar fields using wildcard characters

Return the average, for each hour, of any unique field that ends with the string "lay". For example, delay, xdelay, relay, etc.

... | stats avg(*lay) BY date_hour

4. Remove duplicates in the result set and return the total count for the unique results

Remove duplicates of results with the same "host" value and return the total count of the remaining results.

... | stats dc(host)

Extended Examples

1. Count the number of events by HTTP status and host

This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from Get the tutorial data into Splunk and follow the instructions. Then, run this search using the time range, Other > Yesterday.

Count the number of events for a combination of HTTP status code values and host:

sourcetype=access_* | chart count BY status, host

This creates the following table: Searchref stats ex1.1.png

2. Use eval expressions to count the different types of requests against each Web server

This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from Get the tutorial data into Splunk and follow the instructions. Then, run this search using the time range, Other > Yesterday.

Count the number of different types of requests made against each Web server.

sourcetype=access_* | stats count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host

This example uses eval expressions to specify field values for the stats command to count. The search is only interested in two page request methods, GET or POST. The first clause counts the Web access events that contain the method=GET field value and call the result "GET". The second clause counts method=POST events. Then the by clause, by host, separates the counts for each request by the host value that they correspond to.

This returns the following table:

Searchref stats ex1.png

3. Calculate a wide range of statistics by a specific field

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

3a. Count the number of earthquakes that occurred for each magnitude range. (This data set was comprised of events over a 30-day period.)

source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"

Searchref stats ex3.01.png

This search used span=1 to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".

3b. Search for earthquakes in and around California and count the number of quakes that were recorded. Then, calculate the minimum, maximum, the range (difference between the min and max), and average magnitudes of those recent quakes and list them by magnitude type.

source=usgs place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType

Use stats functions for each of these calculations: count(), max(), min(), range(), and avg(). This returns the following table:

Searchref stats usgsex2.1.png


3c. Additionally, you can find the mean, standard deviation, and variance of the magnitudes of those recent quakes.

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType

Use stats functions for each of these calculations: mean(), stdev(), and var(). This returns the following table:

Searchref stats usgsex2.2.png


The mean values should be exactly the same as the values calculated using avg().

4. In a table display items sold by ID, type, and name and calculate the revenue for each product

This example uses the sample dataset from the Search Tutorial and a field lookup to add more information to the event data.
  • Download the data set from Add data tutorial and follow the instructions to load the tutorial data.
  • Download the CSV file from Use field lookups tutorial and follow the instructions to set up the lookup definition to add price and productName to the events.

After you configure the field lookup, you can run this search using the time range, All time.

Create a table that displays the items sold at the Buttercup Games online store by their ID, type, and name. Also, calculate the revenue for each product.

sourcetype=access_* status=200 action=purchase | stats values(categoryId) AS Type, values(productName) AS Name, sum(price) AS "Revenue" by productId | rename productId AS "Product ID" | eval Revenue="$ ".tostring(Revenue,"commas")

This example uses the values() function to display the corresponding categoryId and productName values for each productId. Then, it uses the sum() function to calculate a running total of the values of the price field.

Also, this example renames the various fields, for better display. For the stats functions, the renames are done inline with an "AS" clause. The rename command is used to change the name of the product_id field, since the syntax does not let you rename a split-by field.

Finally, the results are piped into an eval expression to reformat the Revenue field values so that they read as currency, with a dollar sign and commas.

This returns the following table:

Searchref stats ex4.png

5. Determine how much email comes from each domain

This example uses generated email data (sourcetype=cisco_esa). You should be able to run this example on any email data by replacing the sourcetype=cisco_esa with your data's sourcetype value and the mailfrom field with your data's email address field name (for example, it might be To, From, or Cc).

Find out how much of your organization's email comes from com/net/org or other top level domains.

sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@") | eval from_domain=mvindex(accountname,-1) | stats count(eval(match(from_domain, "[^\n\r\s]+\.com"))) AS ".com", count(eval(match(from_domain, "[^\n\r\s]+\.net"))) AS ".net", count(eval(match(from_domain, "[^\n\r\s]+\.org"))) AS ".org", count(eval(NOT match(from_domain, "[^\n\r\s]+\.(com|net|org)"))) AS "other"

The first half of this search uses eval to break up the email address in the mailfrom field and define the from_domain as the portion of the mailfrom field after the @ symbol.

The results are then piped into the stats command. The count() function is used to count the results of the eval expression. Here, eval uses the match() function to compare the from_domain to a regular expression that looks for the different suffixes in the domain. If the value of from_domain matches the regular expression, the count is updated for each suffix, .com, .net, and .org. Other domain suffixes are counted as other.

This produces the following results table:

StatsEx5 resultsTable.png

6. Search Web access logs for the total number of hits from the top 10 referring domains

This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the tutorial and follow the instructions to upload it to your Splunk deployment. Then, run this search using the time range, Other > Yesterday.

Search Web access logs, and return the total number of hits from the top 10 referring domains. (The "top" command returns a count and percent value for each referer.)

sourcetype=access_* | top limit=10 referer | stats sum(count) AS total

This search uses the top command to find the ten most common referer domains, which are values of the referer field. (You might also see this as referer_domain.) The results of top are then piped into the stats command. This example uses the sum() function to add the number of times each referer accesses the website. This summation is then saved into a field, total. This produces the single numeric value.

Searchref stats ex6.png

See also

eventstats, rare, sistats, streamstats, top

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the stats command.

PREVIOUS
spath
  NEXT
strcat

This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.6.0, 6.6.1


Comments

Can someone add an explanation for what the argument "dedup_splitvals" does? This is seen all the time in the job inspector, especially when using datamodels. There appears to be no mention of it in documentation or the forums.

Rjthibod
February 18, 2016

Can you provide an example usage of delim with a stats values command?

Sf user 199
May 12, 2015

Sparklines are discussed in greater detail in this topic: http://docs.splunk.com/Documentation/Splunk/6.2.2/Search/Addsparklinestosearchresults

Sophy
March 23, 2015

An example on sparkline would definitely help. I have not seen sparkline using range function anywhere in the splunk docs. Definition for range function is also not available.

Immortalraghavan
July 2, 2014

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters