
stats
Description
Calculates aggregate statistics over the results set, such as average, count, and sum. This is similar to SQL aggregation.
If stats
is used without a by
clause only one row is returned, which is the aggregation over the entire incoming result set. If you use a by
clause one row is returned for each distinct value specified in the by
clause.
The stats
command can be used for several SQL-like operations. If you are familiar with SQL but new to SPL, see Splunk SPL for SQL users.
Difference between stats and eval commands
The stats
command calculates statistics based on fields in your events. The eval
command creates new fields in your events by using existing fields and an arbitrary expression.
Syntax
Simple: stats (stats-func<field> [AS <field>])... [BY <field-list>]
Complete: stats [partitions=<num>] [allnum=<bool>] [delim=<string>] ( <stats-agg-term>... | <sparkline-agg-term>... ) [<by-clause>]
Required arguments
- stats-agg-term
- Syntax: <stats-func>(<evaled-field> | <wc-field>) [AS <wc-field>]
- Description: A statistical aggregation function. See Stats function options. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in field names. For more information on eval expressions, see Types of eval expressions in the Search Manual.
- sparkline-agg-term
- Syntax: <sparkline-agg> [AS <wc-field>]
- Description: A sparkline aggregation function. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in the field name.
Optional arguments
- allnum
- syntax: allnum=<bool>
- Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical.
- Default: false
- delim
- Syntax: delim=<string>
- Description: Specifies how the values in the list() or values() aggregation are delimited.
- Default: a single space
- by-clause
- Syntax: BY <field-list>
- Description: The name of one or more fields to group by. You cannot use a wildcard character to specify multiple fields with similar names. You must specify each field separately.
- partitions
- Syntax: partitions=<num>
- Description: If specified, partitions the input data based on the split-by fields for multithreaded reduce.
- Default: 1
Stats function options
- stats-func
- Syntax: The syntax depends on the function you use. Refer to the table below.
- Description: Statistical functions that you can use with the
chart
command. Each time you invoke thechart
command, you can use one or more functions. However, you can only use oneBY
clause. See Usage.
- The following table lists the supported functions by type of function. For descriptions and examples, see Statistical and charting functions.
Type of function Supported functions and syntax Aggregate functions avg()
count()
distinct_count()
estdc()
estdc_error()
max()
median()
min()
mode()
perc<int>
range()
stdev()
stdevp()
sum()
sumsq()
var()
varp()
Event order functions earliest()
first()
last()
latest()
Multivalue stats and chart functions list(X)
values(X)
Sparkline function options
Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. Read more about how to "Add sparklines to your search results" in the Search Manual.
- sparkline-agg
- Syntax: sparkline (count(<wc-field>), <span-length>) | sparkline (<sparkline-func>(<wc-field>), <span-length>)
- Description: A sparkline specifier, which takes the first argument of a aggregation function on a field and an optional timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregator is permitted. You can use wildcard characters in the field name. See the Usage section.
- sparkline-func
- Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range()
- Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bin.
Usage
Eval expressions with statistical functions
When you use the stats
command, you must specify either a statistical function or a sparkline function. When you use a statistical function, you can use an eval expression as part of the statistical function. For example:
index=* | stats count(eval(status="404")) AS count_status BY sourcetype
Functions and memory usage
Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count
function requires far more memory than the count
function. The values
and list
functions also can consume a lot of memory.
If you are using the distinct_count
function without a split-by field or with a low-cardinality split-by by field, consider replacing the distinct_count
function with the the estdc
function (estimated distinct count). The estdc
function might result in significantly lower memory usage and run times.
Memory and maximum results
In the limits.conf
file, the maxresultrows
setting in the [searchresults]
stanza specifies the maximum number of results to return. The default value is 50,000. Increasing this limit can result in more memory usage.
The max_mem_usage_mb
setting in the [default]
stanza is used to limit how much memory the stats
command uses to keep track of information. If the stats
command reaches this limit, the command stops adding the requested fields to the search results. You can increase the limit, contingent on the available system memory.
If you are using Splunk Cloud and want to change either of these limits, file a Support ticket.
Event order functions
Using the first
and last
functions when searching based on time does not produce accurate results.
- To locate the first value based on time order, use the
earliest
function, instead of thefirst
function. - To locate the last value based on time order, use the
latest
function, instead of thelast
function.
For example, consider the following search.
index=test sourcetype=testDb
| eventstats first(LastPass) as LastPass, last(_time) as mostRecentTestTime
BY testCaseId
| where startTime==LastPass OR _time==mostRecentTestTime
| stats first(startTime) AS startTime, first(status) AS status,
first(histID) AS currentHistId, last(histID) AS lastPassHistId BY testCaseId
Replace the first
and last
functions when you use the stats
and eventstats
commands for ordering events based on time. The following search shows the function changes.
index=test sourcetype=testDb
| eventstats latest(LastPass) AS LastPass, earliest(_time) AS mostRecentTestTime
BY testCaseId
| where startTime==LastPass OR _time==mostRecentTestTime
| stats latest(startTime) AS startTime, latest(status) AS status,
latest(histID) AS currentHistId, earliest(histID) AS lastPassHistId BY testCaseId
Wildcards in BY clauses
The stats
command does not support wildcard characters in field values in BY clauses.
For example, you cannot specify | stats count BY source*
.
Renaming fields
You cannot rename one field with multiple names. For example if you have field A, you cannot rename A as B, A as C. The following example is not valid.
... | stats first(host) AS site, first(host) AS report
Basic Examples
1. Return the average transfer rate for each host
sourcetype=access* | stats avg(kbps) BY host
2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"
Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". The "top" command returns a count and percent value for each "referer_domain".
sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total
3. Calculate the average time for each hour for similar fields using wildcard characters
Return the average, for each hour, of any unique field that ends with the string "lay". For example, delay, xdelay, relay, etc.
... | stats avg(*lay) BY date_hour
4. Remove duplicates in the result set and return the total count for the unique results
Remove duplicates of results with the same "host" value and return the total count of the remaining results.
... | stats dc(host)
Extended Examples
1. Count the number of events by HTTP status and host
This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from Get the tutorial data into Splunk and follow the instructions. Then, run this search using the time range, Other > Yesterday. |
Count the number of events for a combination of HTTP status code values and host:
sourcetype=access_* | chart count BY status, host
This creates the following table:
2. Use eval expressions to count the different types of requests against each Web server
This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from Get the tutorial data into Splunk and follow the instructions. Then, run this search using the time range, Other > Yesterday. |
Count the number of different types of requests made against each Web server.
sourcetype=access_* | stats count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST BY host
This example uses eval
expressions to specify field values for the stats
command to count. The search is only interested in two page request methods, GET or POST. The first clause counts the Web access events that contain the method=GET
field value and call the result "GET". The second clause counts method=POST
events. Then the by clause, by host
, separates the counts for each request by the host
value that they correspond to.
This returns the following table:
3. Calculate a wide range of statistics by a specific field
These searches use recent earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains magnitude (mag), coordinates (latitude, longitude), region (place), etc., for each earthquake recorded.
You can download a current CSV file from the USGS Earthquake Feeds and add it as an input. |
3a. Count the number of earthquakes that occurred for each magnitude range. (This data set was comprised of events over a 30-day period.)
source=usgs | chart count AS "Number of Earthquakes" BY mag span=1 | rename mag AS "Magnitude Range"
This search used span=1
to define each of the ranges for the magnitude field, mag. The rename command is then used to rename the field to "Magnitude Range".
3b. Search for earthquakes in and around California and count the number of quakes that were recorded. Then, calculate the minimum, maximum, the range (difference between the min and max), and average magnitudes of those recent quakes and list them by magnitude type.
source=usgs place=*California* | stats count, max(mag), min(mag), range(mag), avg(mag) BY magType
Use stats
functions for each of these calculations: count(), max(), min(), range(), and avg()
. This returns the following table:
3c. Additionally, you can find the mean, standard deviation, and variance of the magnitudes of those recent quakes.
source=usgs place=*California* | stats count mean(mag), stdev(mag), var(mag) BY magType
Use stats
functions for each of these calculations: mean(), stdev(), and var()
. This returns the following table:
The mean
values should be exactly the same as the values calculated using avg()
.
4. In a table display items sold by ID, type, and name and calculate the revenue for each product
This example uses the sample dataset from the Search Tutorial and a field lookup to add more information to the event data.
After you configure the field lookup, you can run this search using the time range, All time. |
Create a table that displays the items sold at the Buttercup Games online store by their ID, type, and name. Also, calculate the revenue for each product.
sourcetype=access_* status=200 action=purchase
| stats values(categoryId) AS Type, values(productName) AS Name, sum(price)
AS "Revenue" by productId
| rename productId AS "Product ID"
| eval Revenue="$ ".tostring(Revenue,"commas")
This example uses the values()
function to display the corresponding categoryId
and productName
values for each productId
. Then, it uses the sum()
function to calculate a running total of the values of the price
field.
Also, this example renames the various fields, for better display. For the stats
functions, the renames are done inline with an "AS" clause. The rename
command is used to change the name of the product_id
field, since the syntax does not let you rename a split-by field.
Finally, the results are piped into an eval
expression to reformat the Revenue
field values so that they read as currency, with a dollar sign and commas.
This returns the following table:
5. Determine how much email comes from each domain
This example uses generated email data (sourcetype=cisco_esa ). You should be able to run this example on any email data by replacing the sourcetype=cisco_esa with your data's sourcetype value and the mailfrom field with your data's email address field name (for example, it might be To, From, or Cc ).
|
Find out how much of your organization's email comes from com/net/org or other top level domains.
sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@")
| eval from_domain=mvindex(accountname,-1)
| stats count(eval(match(from_domain, "[^\n\r\s]+\.com"))) AS ".com",
count(eval(match(from_domain, "[^\n\r\s]+\.net"))) AS ".net",
count(eval(match(from_domain, "[^\n\r\s]+\.org"))) AS ".org",
count(eval(NOT match(from_domain, "[^\n\r\s]+\.(com|net|org)"))) AS "other"
The first half of this search uses eval to break up the email address in the mailfrom
field and define the from_domain
as the portion of the mailfrom
field after the @
symbol.
The results are then piped into the stats
command. The count()
function is used to count the results of the eval
expression. Here, eval
uses the match()
function to compare the from_domain
to a regular expression that looks for the different suffixes in the domain. If the value of from_domain
matches the regular expression, the count
is updated for each suffix, .com, .net, and .org
. Other domain suffixes are counted as other
.
This produces the following results table:
6. Search Web access logs for the total number of hits from the top 10 referring domains
This example uses the sample dataset from the Search Tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the tutorial and follow the instructions to upload it to your Splunk deployment. Then, run this search using the time range, Other > Yesterday. |
Search Web access logs, and return the total number of hits from the top 10 referring domains. (The "top" command returns a count and percent value for each referer
.)
sourcetype=access_* | top limit=10 referer | stats sum(count) AS total
This search uses the top
command to find the ten most common referer domains, which are values of the referer
field. (You might also see this as referer_domain
.) The results of top
are then piped into the stats
command. This example uses the sum()
function to add the number of times each referer
accesses the website. This summation is then saved into a field, total
. This produces the single numeric value.
See also
- Commands
- eventstats, rare, sistats, streamstats, top
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the stats command.
PREVIOUS spath |
NEXT strcat |
This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.1.14, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.2.14, 6.2.15, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10
Comments
Can you provide an example usage of delim with a stats values command?
Sparklines are discussed in greater detail in this topic: http://docs.splunk.com/Documentation/Splunk/6.2.2/Search/Addsparklinestosearchresults
An example on sparkline would definitely help. I have not seen sparkline using range function anywhere in the splunk docs. Definition for range function is also not available.
Can someone add an explanation for what the argument "dedup_splitvals" does? This is seen all the time in the job inspector, especially when using datamodels. There appears to be no mention of it in documentation or the forums.