stats

Description

Calculates aggregate statistics, such as average, count, and sum, over the results set. This is similar to SQL aggregation. If the stats command is used without a BY clause, only one row is returned, which is the aggregation over the entire incoming result set. If a BY clause is used, one row is returned for each distinct value specified in the BY clause.

The stats command can be used for several SQL-like operations. If you are familiar with SQL but new to SPL, see Splunk SPL for SQL users.

Difference between stats and eval commands

The stats command calculates statistics based on fields in your events. The eval command creates new fields in your events by using existing fields and an arbitrary expression.

An image that shows two tables and an example of the stats command in between the tables. The top table shows 2 columns: Time and Event. There are two rows in the table that show sample events. There are timestamps in the Time column. The Event column shows the beginning of the events. The first row shows a GET with an item added to a cart. The second row shows a POST.

Syntax

Simple: stats (stats-function(field) [AS field])... [BY field-list]

Complete: stats [partitions=<num>] [allnum=<bool>] [delim=<string>] ( <stats-agg-term>... | <sparkline-agg-term>... ) [<by-clause>]

Required arguments

stats-agg-term: Syntax: <stats-func>(<evaled-field> | <wc-field>) [AS <wc-field>]; Description: A statistical aggregation function. See Stats function options. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in field names. For more information on eval expressions, see Types of eval expressions in the Search Manual.

sparkline-agg-term: Syntax: <sparkline-agg> [AS <wc-field>]; Description: A sparkline aggregation function. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in the field name.

Optional arguments

allnum: syntax: allnum=<bool>; Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical.; Default: false

delim: Syntax: delim=<string>; Description: Specifies how the values in the list() or values() aggregation are delimited.; Default: a single space

by-clause: Syntax: BY <field-list>; Description: The name of one or more fields to group by. You cannot use a wildcard character to specify multiple fields with similar names. You must specify each field separately.

partitions: Syntax: partitions=<num>; Description: If specified, partitions the input data based on the split-by fields for multithreaded reduce.; Default: 1

Stats function options

stats-func: Syntax: The syntax depends on the function that you use. Refer to the table below.; Description: Statistical and charting functions that you can use with the stats command. Each time you invoke the stats command, you can use one or more functions. However, you can only use one BY clause. See Usage.

The following table lists the supported functions by type of function. Use the links in the table to see descriptions and examples for each function. For an overview about using functions with commands, see Statistical and charting functions.

Type of function	Supported functions and syntax
Aggregate functions	`avg()` `count()` `distinct_count()` `estdc()` `estdc_error()`	`exactperc<int>()` `max()` `median()` `min()` `mode()`	`perc<int>()` `range()` `stdev()` `stdevp()`	`sum()` `sumsq()` `upperperc<int>()` `var()` `varp()`
Event order functions	`earliest()`	`first()`	`last()`	`latest()`
Multivalue stats and chart functions	`list(X)`	`values(X)`

Sparkline function options

Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. Read more about how to "Add sparklines to your search results" in the Search Manual.

sparkline-agg: Syntax: sparkline (count(<wc-field>), <span-length>) | sparkline (<sparkline-func>(<wc-field>), <span-length>); Description: A sparkline specifier, which takes the first argument of a aggregation function on a field and an optional timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregator is permitted. You can use wildcard characters in the field name. See the Usage section.

sparkline-func: Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range(); Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bin.

Usage

The stats command is a transforming command. See Command types.

Eval expressions with statistical functions

When you use the stats command, you must specify either a statistical function or a sparkline function. When you use a statistical function, you can use an eval expression as part of the statistical function. For example:

index=* | stats count(eval(status="404")) AS count_status BY sourcetype

Functions and memory usage

Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count function requires far more memory than the count function. The values and list functions also can consume a lot of memory.

If you are using the distinct_count function without a split-by field or with a low-cardinality split-by by field, consider replacing the distinct_count function with the the estdc function (estimated distinct count). The estdc function might result in significantly lower memory usage and run times.

Memory and maximum results

In the limits.conf file, the maxresultrows setting in the [searchresults] stanza specifies the maximum number of results to return. The default value is 50,000. Increasing this limit can result in more memory usage.

The max_mem_usage_mb setting in the [default] stanza is used to limit how much memory the stats command uses to keep track of information. If the stats command reaches this limit, the command stops adding the requested fields to the search results. You can increase the limit, contingent on the available system memory.

If you are using Splunk Cloud and want to change either of these limits, file a Support ticket.

Event order functions

Using the first and last functions when searching based on time does not produce accurate results.

To locate the first value based on time order, use the earliest function, instead of the first function.
To locate the last value based on time order, use the latest function, instead of the last function.

For example, consider the following search.

index=test sourcetype=testDb | eventstats first(LastPass) as LastPass, last(_time) as mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats first(startTime) AS startTime, first(status) AS status, first(histID) AS currentHistId, last(histID) AS lastPassHistId BY testCaseId

Replace the first and last functions when you use the stats and eventstats commands for ordering events based on time. The following search shows the function changes.

index=test sourcetype=testDb | eventstats latest(LastPass) AS LastPass, earliest(_time) AS mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats latest(startTime) AS startTime, latest(status) AS status, latest(histID) AS currentHistId, earliest(histID) AS lastPassHistId BY testCaseId

Wildcards in BY clauses

The stats command does not support wildcard characters in field values in BY clauses.

For example, you cannot specify | stats count BY source*.

Renaming fields

You cannot rename one field with multiple names. For example if you have field A, you cannot rename A as B, A as C. The following example is not valid.

... | stats first(host) AS site, first(host) AS report

Basic Examples

1. Return the average transfer rate for each host

sourcetype=access* | stats avg(kbps) BY host

2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"

Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". The "top" command returns a count and percent value for each "referer_domain".

sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total

3. Calculate the average time for each hour for similar fields using wildcard characters

Return the average, for each hour, of any unique field that ends with the string "lay". For example, delay, xdelay, relay, etc.

... | stats avg(*lay) BY date_hour

4. Remove duplicates in the result set and return the total count for the unique results

Remove duplicates of results with the same "host" value and return the total count of the remaining results.

... | stats dc(host)

Extended Examples

1. Count the number of events by HTTP status and host

This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from Get the tutorial data into Splunk and follow the instructions. Then, run this search using the time range, Other > Yesterday.

Count the number of events for a combination of HTTP status code values and host:

sourcetype=access_* | chart count BY status, host

This creates the following table:

2. Use eval expressions to count the different types of requests against each Web server

This example uses the sample dataset from the Search Tutorial but should work with any format of Apache web access log. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk instance. Use the time range All time when you run this search.

Run the following search to use the stats command to determine the number of different page requests, GET and POST, that occurred against each Web server.

sourcetype=access_* | stats count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host

This example uses eval expressions to specify the different field values for the stats command to count. The first clause uses the count() function to count the Web access events that contain the method field value GET. Then, using the AS keyword, the field that represents these results is renamed GET.

The second clause does the same for POST events. The counts of both types of events are then separated by the web server, using the BY clause with the host field.

The results appear on the Statistics tab and should be similar to the results shown in the following table.

host	GET	POST
www1	8431	5197
www2	8097	4815
www3	8338	4654

3. Calculate a wide range of statistics by a specific field

These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.

You can download a current CSV file from the USGS Earthquake Feeds and add it as an input.

3a. Count the number of earthquakes that occurred for each magnitude range. (This data set was comprised of events over a 30-day period.)

source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"

This search used span=1 to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".

3b. Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Use statisical functions to calculate the minimum, maximum, range (the difference between the min and max), and average magnitudes of the recent earthquakes. List the values by magnitude type.

source=usgs place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType

The results appear on the Statistics tab and should be similar to the results shown in the following table.

magType	count	max(mag)	min(mag)	range(mag)	avg(mag)
H	123	2.8	0.0	2.8	0.549593
MbLg	1	0	0	0	0.0000000
Md	1565	3.2	0.1	3.1	1.056486
Me	2	2.0	1.6	.04	1.800000
Ml	1202	4.3	-0.4	4.7	1.226622
Mw	6	4.9	3.0	1.9	3.650000
ml	10	1.56	0.19	1.37	0.934000

3c. Additionally, you can find the mean, standard deviation, and variance of the magnitudes of those recent earthquakes. Organize the results by magnitude type.

source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType

The mean values should be exactly the same as the values calculated using avg().

The results appear on the Statistics tab and should be similar to the results shown in the following table.

magType	count	mean(mag)	std(mag)	var(mag)
H	123	0.549593	0.356985	0.127438
MbLg	1	0.000000	0.000000	0.000000
Md	1565	1.056486	0.580042	0.336449
Me	2	1.800000	0.346410	0.120000
Ml	1202	1.226622	0.629664	0.396476
Mw	6	3.650000	0.716240	0.513000
ml	10	0.934000	0.560401	0.314049

4. In a table display items sold by ID, type, and name and calculate the revenue for each product

This example uses the sample dataset from the Search Tutorial and a field lookup to add more information to the event data.

Download the data set from Add data tutorial and follow the instructions to load the tutorial data.
Download the CSV file from Use field lookups tutorial and follow the instructions to set up the lookup definition to add price and productName to the events.

After you configure the field lookup, you can run this search using the time range, All time.

Create a table that displays the items sold at the Buttercup Games online store by their ID, type, and name. Also, calculate the revenue for each product.

sourcetype=access_* status=200 action=purchase | stats values(categoryId) AS Type, values(productName) AS "Product Name", sum(price) AS "Revenue" by productId | rename productId AS "Product ID" | eval Revenue="$ ".tostring(Revenue,"commas")

This example uses the values() function to display the corresponding categoryId and productName values for each productId. Then, it uses the sum() function to calculate a running total of the values of the price field.

Also, this example renames the various fields, for better display. For the stats functions, the renames are done inline with an "AS" clause. The rename command is used to change the name of the product_id field, since the syntax does not let you rename a split-by field.

Finally, the results are piped into an eval expression to reformat the Revenue field values so that they read as currency, with a dollar sign and commas.

This returns the following table:

5. Determine how much email comes from each domain

This example uses generated email data (sourcetype=cisco_esa). You should be able to run this example on any email data by replacing the sourcetype=cisco_esa with your data's sourcetype value and the mailfrom field with your data's email address field name (for example, it might be To, From, or Cc).

Find out how much of your organization's email comes from com/net/org or other top level domains.

sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@") | eval from_domain=mvindex(accountname,-1) | stats count(eval(match(from_domain, "[^\n\r\s]+\.com"))) AS ".com", count(eval(match(from_domain, "[^\n\r\s]+\.net"))) AS ".net", count(eval(match(from_domain, "[^\n\r\s]+\.org"))) AS ".org", count(eval(NOT match(from_domain, "[^\n\r\s]+\.(com|net|org)"))) AS "other"

The first half of this search uses eval to break up the email address in the mailfrom field and define the from_domain as the portion of the mailfrom field after the @ symbol.

The results are then piped into the stats command. The count() function is used to count the results of the eval expression. Here, eval uses the match() function to compare the from_domain to a regular expression that looks for the different suffixes in the domain. If the value of from_domain matches the regular expression, the count is updated for each suffix, .com, .net, and .org. Other domain suffixes are counted as other.

This produces the following results table:

6. Search Web access logs for the total number of hits from the top 10 referring domains

This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the tutorial and follow the instructions to upload it to your Splunk deployment. Then, run this search using the time range, Other > Yesterday.

Search Web access logs, and return the total number of hits from the top 10 referring domains. (The "top" command returns a count and percent value for each referer.)

sourcetype=access_* | top limit=10 referer | stats sum(count) AS total

This search uses the top command to find the ten most common referer domains, which are values of the referer field. (You might also see this as referer_domain.) The results of top are then piped into the stats command. This example uses the sum() function to add the number of times each referer accesses the website. This summation is then saved into a field, total. This produces the single numeric value.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the stats command.

Related answers from Splunk Community

stats

Description

Difference between stats and eval commands

Syntax

Required arguments

Optional arguments

Stats function options

Sparkline function options

Usage

Eval expressions with statistical functions

Functions and memory usage

Memory and maximum results

Event order functions

Wildcards in BY clauses

Renaming fields

Basic Examples

1. Return the average transfer rate for each host

2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"

3. Calculate the average time for each hour for similar fields using wildcard characters

4. Remove duplicates in the result set and return the total count for the unique results

Extended Examples

1. Count the number of events by HTTP status and host

2. Use eval expressions to count the different types of requests against each Web server

3. Calculate a wide range of statistics by a specific field

4. In a table display items sold by ID, type, and name and calculate the revenue for each product

5. Determine how much email comes from each domain

6. Search Web access logs for the total number of hits from the top 10 referring domains

See also

Answers

Comments

stats

Was this topic useful?