stats
Description
Calculates aggregate statistics, such as average, count, and sum, over the results set. This is similar to SQL aggregation.
If the stats
command is used without a BY
clause, only one row is returned, which is the aggregation over the entire incoming result set. If a BY
clause is used, one row is returned for each distinct value specified in the BY
clause.
The stats
command can be used for several SQL-like operations. If you are familiar with SQL but new to SPL, see Splunk SPL for SQL users.
Difference between stats and eval commands
The stats
command calculates statistics based on fields in your events. The eval
command creates new fields in your events by using existing fields and an arbitrary expression.
Syntax
Simple: stats (stats-function(field) [AS field])... [BY field-list]
Complete: stats [partitions=<num>] [allnum=<bool>] [delim=<string>] ( <stats-agg-term>... | <sparkline-agg-term>... ) [<by-clause>]
Required arguments
- stats-agg-term
- Syntax: <stats-func>(<evaled-field> | <wc-field>) [AS <wc-field>]
- Description: A statistical aggregation function. See Stats function options. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in field names. For more information on eval expressions, see Types of eval expressions in the Search Manual.
- sparkline-agg-term
- Syntax: <sparkline-agg> [AS <wc-field>]
- Description: A sparkline aggregation function. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in the field name.
Optional arguments
- allnum
- syntax: allnum=<bool>
- Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical.
- Default: false
- delim
- Syntax: delim=<string>
- Description: Specifies how the values in the list() or values() aggregation are delimited.
- Default: a single space
- by-clause
- Syntax: BY <field-list>
- Description: The name of one or more fields to group by. You cannot use a wildcard character to specify multiple fields with similar names. You must specify each field separately.
- partitions
- Syntax: partitions=<num>
- Description: If specified, partitions the input data based on the split-by fields for multithreaded reduce.
- Default: 1
Stats function options
- stats-func
- Syntax: The syntax depends on the function that you use. Refer to the table below.
- Description: Statistical and charting functions that you can use with the
stats
command. Each time you invoke thestats
command, you can use one or more functions. However, you can only use oneBY
clause. See Usage.
- The following table lists the supported functions by type of function. Use the links in the table to see descriptions and examples for each function. For an overview about using functions with commands, see Statistical and charting functions.
Type of function Supported functions and syntax Aggregate functions avg()
count()
distinct_count()
estdc()
estdc_error()
exactperc<int>()
max()
median()
min()
mode()
perc<int>()
range()
stdev()
stdevp()
sum()
sumsq()
upperperc<int>()
var()
varp()
Event order functions earliest()
first()
last()
latest()
Multivalue stats and chart functions list(X)
values(X)
Sparkline function options
Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. Read more about how to "Add sparklines to your search results" in the Search Manual.
- sparkline-agg
- Syntax: sparkline (count(<wc-field>), <span-length>) | sparkline (<sparkline-func>(<wc-field>), <span-length>)
- Description: A sparkline specifier, which takes the first argument of a aggregation function on a field and an optional timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregator is permitted. You can use wildcard characters in the field name. See the Usage section.
- sparkline-func
- Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range()
- Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bin.
Usage
The stats
command is a transforming command. See Command types.
Eval expressions with statistical functions
When you use the stats
command, you must specify either a statistical function or a sparkline function. When you use a statistical function, you can use an eval expression as part of the statistical function. For example:
index=* | stats count(eval(status="404")) AS count_status BY sourcetype
Functions and memory usage
Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count
function requires far more memory than the count
function. The values
and list
functions also can consume a lot of memory.
If you are using the distinct_count
function without a split-by field or with a low-cardinality split-by by field, consider replacing the distinct_count
function with the the estdc
function (estimated distinct count). The estdc
function might result in significantly lower memory usage and run times.
Memory and maximum results
In the limits.conf
file, the maxresultrows
setting in the [searchresults]
stanza specifies the maximum number of results to return. The default value is 50,000. Increasing this limit can result in more memory usage.
The max_mem_usage_mb
setting in the [default]
stanza is used to limit how much memory the stats
command uses to keep track of information. If the stats
command reaches this limit, the command stops adding the requested fields to the search results. You can increase the limit, contingent on the available system memory.
If you are using Splunk Cloud and want to change either of these limits, file a Support ticket.
Event order functions
Using the first
and last
functions when searching based on time does not produce accurate results.
- To locate the first value based on time order, use the
earliest
function, instead of thefirst
function. - To locate the last value based on time order, use the
latest
function, instead of thelast
function.
For example, consider the following search.
index=test sourcetype=testDb
| eventstats first(LastPass) as LastPass, last(_time) as mostRecentTestTime
BY testCaseId
| where startTime==LastPass OR _time==mostRecentTestTime
| stats first(startTime) AS startTime, first(status) AS status,
first(histID) AS currentHistId, last(histID) AS lastPassHistId BY testCaseId
Replace the first
and last
functions when you use the stats
and eventstats
commands for ordering events based on time. The following search shows the function changes.
index=test sourcetype=testDb
| eventstats latest(LastPass) AS LastPass, earliest(_time) AS mostRecentTestTime
BY testCaseId
| where startTime==LastPass OR _time==mostRecentTestTime
| stats latest(startTime) AS startTime, latest(status) AS status,
latest(histID) AS currentHistId, earliest(histID) AS lastPassHistId BY testCaseId
Wildcards in BY clauses
The stats
command does not support wildcard characters in field values in BY clauses.
For example, you cannot specify | stats count BY source*
.
Renaming fields
You cannot rename one field with multiple names. For example if you have field A, you cannot rename A as B, A as C. The following example is not valid.
... | stats first(host) AS site, first(host) AS report
Basic Examples
1. Return the average transfer rate for each host
sourcetype=access* | stats avg(kbps) BY host
2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"
Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". The "top" command returns a count and percent value for each "referer_domain".
sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total
3. Calculate the average time for each hour for similar fields using wildcard characters
Return the average, for each hour, of any unique field that ends with the string "lay". For example, delay, xdelay, relay, etc.
... | stats avg(*lay) BY date_hour
4. Remove duplicates in the result set and return the total count for the unique results
Remove duplicates of results with the same "host" value and return the total count of the remaining results.
... | stats dc(host)
Extended Examples
1. Count the number of events by HTTP status and host
This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from Get the tutorial data into Splunk and follow the instructions. Then, run this search using the time range, Other > Yesterday. |
Count the number of events for a combination of HTTP status code values and host:
sourcetype=access_* | chart count BY status, host
This creates the following table:
2. Use eval expressions to count the different types of requests against each Web server
This example uses the sample dataset from the Search Tutorial but should work with any format of Apache web access log. Download the data set from this topic in the Search Tutorial and follow the instructions to upload it to your Splunk instance. Use the time range All time when you run this search. |
Run the following search to use the stats
command to determine the number of different page requests, GET and POST, that occurred against each Web server.
sourcetype=access_* | stats count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host
This example uses eval
expressions to specify the different field values for the stats
command to count. The first clause uses the count()
function to count the Web access events that contain the method
field value GET
. Then, using the AS keyword, the field that represents these results is renamed GET.
The second clause does the same for POST events. The counts of both types of events are then separated by the web server, using the BY clause with the host
field.
The results appear on the Statistics tab and should be similar to the results shown in the following table.
host | GET | POST |
---|---|---|
www1 | 8431 | 5197 |
www2 | 8097 | 4815 |
www3 | 8338 | 4654 |
3. Calculate a wide range of statistics by a specific field
These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and add it as an input. |
3a. Count the number of earthquakes that occurred for each magnitude range. (This data set was comprised of events over a 30-day period.)
source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"
This search used span=1
to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".
3b. Search for earthquakes in and around California. Calculate the number of earthquakes that were recorded. Use statisical functions to calculate the minimum, maximum, range (the difference between the min and max), and average magnitudes of the recent earthquakes. List the values by magnitude type.
source=usgs place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType
The results appear on the Statistics tab and should be similar to the results shown in the following table.
magType | count | max(mag) | min(mag) | range(mag) | avg(mag) |
---|---|---|---|---|---|
H | 123 | 2.8 | 0.0 | 2.8 | 0.549593 |
MbLg | 1 | 0 | 0 | 0 | 0.0000000 |
Md | 1565 | 3.2 | 0.1 | 3.1 | 1.056486 |
Me | 2 | 2.0 | 1.6 | .04 | 1.800000 |
Ml | 1202 | 4.3 | -0.4 | 4.7 | 1.226622 |
Mw | 6 | 4.9 | 3.0 | 1.9 | 3.650000 |
ml | 10 | 1.56 | 0.19 | 1.37 | 0.934000 |
3c. Additionally, you can find the mean, standard deviation, and variance of the magnitudes of those recent earthquakes. Organize the results by magnitude type.
source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType
The mean
values should be exactly the same as the values calculated using avg()
.
The results appear on the Statistics tab and should be similar to the results shown in the following table.
magType | count | mean(mag) | std(mag) | var(mag) |
---|---|---|---|---|
H | 123 | 0.549593 | 0.356985 | 0.127438 |
MbLg | 1 | 0.000000 | 0.000000 | 0.000000 |
Md | 1565 | 1.056486 | 0.580042 | 0.336449 |
Me | 2 | 1.800000 | 0.346410 | 0.120000 |
Ml | 1202 | 1.226622 | 0.629664 | 0.396476 |
Mw | 6 | 3.650000 | 0.716240 | 0.513000 |
ml | 10 | 0.934000 | 0.560401 | 0.314049 |
4. In a table display items sold by ID, type, and name and calculate the revenue for each product
This example uses the sample dataset from the Search Tutorial and a field lookup to add more information to the event data.
After you configure the field lookup, you can run this search using the time range, All time. |
Create a table that displays the items sold at the Buttercup Games online store by their ID, type, and name. Also, calculate the revenue for each product.
sourcetype=access_* status=200 action=purchase
| stats values(categoryId) AS Type, values(productName) AS "Product Name", sum(price)
AS "Revenue" by productId
| rename productId AS "Product ID"
| eval Revenue="$ ".tostring(Revenue,"commas")
This example uses the values()
function to display the corresponding categoryId
and productName
values for each productId
. Then, it uses the sum()
function to calculate a running total of the values of the price
field.
Also, this example renames the various fields, for better display. For the stats
functions, the renames are done inline with an "AS" clause. The rename
command is used to change the name of the product_id
field, since the syntax does not let you rename a split-by field.
Finally, the results are piped into an eval
expression to reformat the Revenue
field values so that they read as currency, with a dollar sign and commas.
This returns the following table:
5. Determine how much email comes from each domain
This example uses generated email data (sourcetype=cisco_esa ). You should be able to run this example on any email data by replacing the sourcetype=cisco_esa with your data's sourcetype value and the mailfrom field with your data's email address field name (for example, it might be To, From, or Cc ).
|
Find out how much of your organization's email comes from com/net/org or other top level domains.
sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@")
| eval from_domain=mvindex(accountname,-1)
| stats count(eval(match(from_domain, "[^\n\r\s]+\.com"))) AS ".com",
count(eval(match(from_domain, "[^\n\r\s]+\.net"))) AS ".net",
count(eval(match(from_domain, "[^\n\r\s]+\.org"))) AS ".org",
count(eval(NOT match(from_domain, "[^\n\r\s]+\.(com|net|org)"))) AS "other"
The first half of this search uses eval to break up the email address in the mailfrom
field and define the from_domain
as the portion of the mailfrom
field after the @
symbol.
The results are then piped into the stats
command. The count()
function is used to count the results of the eval
expression. Here, eval
uses the match()
function to compare the from_domain
to a regular expression that looks for the different suffixes in the domain. If the value of from_domain
matches the regular expression, the count
is updated for each suffix, .com, .net, and .org
. Other domain suffixes are counted as other
.
This produces the following results table:
6. Search Web access logs for the total number of hits from the top 10 referring domains
This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the tutorial and follow the instructions to upload it to your Splunk deployment. Then, run this search using the time range, Other > Yesterday. |
Search Web access logs, and return the total number of hits from the top 10 referring domains. (The "top" command returns a count and percent value for each referer
.)
sourcetype=access_* | top limit=10 referer | stats sum(count) AS total
This search uses the top
command to find the ten most common referer domains, which are values of the referer
field. (You might also see this as referer_domain
.) The results of top
are then piped into the stats
command. This example uses the sum()
function to add the number of times each referer
accesses the website. This summation is then saved into a field, total
. This produces the single numeric value.
See also
- Commands
- eventstats, rare, sistats, streamstats, top
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the stats command.
spath | strcat |
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13
Feedback submitted, thanks!