Search Reference

stats

Synopsis

Provides statistics, grouped optionally by field.

Syntax

Simple: stats (stats-function(field) [as field])+ [by field-list]

Complete: stats [allnum=<bool>] [delim=<string>] ( <stats-agg-term> | <sparkline-agg-term> ) [<by clause>]

Required arguments

stats-agg-term
Syntax: <stats-func>( <evaled-field> | <wc-field> ) [AS <wc-field>]
Description: A statistical specifier, optionally renamed to a new field name. The specifier can be an aggregation function applied to a field or set of fields, or an aggregation function applied to an arbitrary eval expression.
sparkline-agg-term
Syntax: <sparkline-agg> [AS <wc-field>]
Description: A sparkline specifier, optionally renamed to a new field.

Optional arguments

allnum
Syntax: allnum=<bool>
Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical. Default: false.
delim
Syntax: delim=<string>
Description: Used to specify how the values in the list() or values() aggregation are delimited. Default: a single space. (An example using allnum and delim appears after this list.)
by clause
Syntax: by <field-list>
Description: The name of one or more fields to group by.
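
As a sketch, a search combining these options might look like the following; the sourcetype and field names here are illustrative, not prescribed:

sourcetype=access_* | stats allnum=true delim="," list(status) AS statuses, avg(bytes) AS avg_bytes by host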

Stats function options

stats-function
Syntax: avg() | c() | count() | dc() | distinct_count() | earliest() | estdc() | estdc_error() | first() | last() | latest() | list() | max() | median() | min() | mode() | p<int>() | perc<int>() | range() | stdev() | stdevp() | sum() | sumsq() | values() | var() | varp()
Description: Functions used with the stats command. Each time you invoke the stats command, you can use more than one function; however, you can only use one by clause. For a list of stats functions with descriptions and examples, see "Functions for stats, chart, and timechart".
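
For example, a single stats invocation can combine several of these functions with one by clause; the field names here are illustrative:

sourcetype=access_* | stats count, dc(clientip) AS unique_visitors, avg(bytes) AS avg_bytes by host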

Sparkline function options

Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. Read more about how to "Add sparklines to your search results" in the User Manual.

sparkline-agg
Syntax: sparkline (count(<wc-field>), <span-length>) | sparkline (<sparkline-func>(<wc-field>), <span-length>)
Description: A sparkline specifier, which takes as its first argument an aggregation function applied to a field, and optionally a timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregation function is permitted.
sparkline-func
Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range()
Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bucket.
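
For example, this sketch (the sourcetype is illustrative) draws a per-host sparkline of event counts in 5-minute buckets next to the total count for each host:

sourcetype=access_* | stats sparkline(count,5m), count by host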

Description

Calculates aggregate statistics over the dataset, similar to SQL aggregation. If called without a by clause, one row is produced, representing the aggregation over the entire incoming result set. If called with a by clause, one row is produced for each distinct value of the by clause fields.
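
To illustrate the difference (the sourcetype is illustrative), the first search below returns a single row with the overall event count, while the second returns one row per distinct value of status:

sourcetype=access_* | stats count

sourcetype=access_* | stats count by status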

Examples

Example 1

This example uses the sample dataset from the tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the tutorial and follow the instructions to upload it to Splunk. Then, run this search using the time range, Other > Yesterday.

Count the number of different types of requests made against each Web server.

sourcetype=access_* | stats count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST by host

This example uses eval expressions to specify field values for the stats command to count. The search is only interested in two page request methods, GET or POST. The first clause tells Splunk to count the Web access events that contain the method=GET field value and call the result "GET". The second clause does the same for method=POST events. Then the by clause, by host, separates the counts for each request by the host value that they correspond to.
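
As an aside, a rough equivalent without eval expressions groups by the method field itself; note that this sketch produces one row per host and method combination, rather than one row per host with GET and POST columns:

sourcetype=access_* (method=GET OR method=POST) | stats count by host, method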

This returns the following table:

[Screenshot: results table showing GET and POST counts for each host.]


Note: You can use the stats, chart, and timechart commands to perform the same statistical calculations on your data. The stats command returns a table of results. The chart command returns the same table of results, but you can use the Report Builder to format this table as a chart. If you want to chart your results over a time range, use the timechart command. You can also see variations of this example with the chart and timechart commands.
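
For reference, the chart and timechart variations of this search would look roughly as follows; both assume the same dataset:

sourcetype=access_* | chart count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST by host

sourcetype=access_* | timechart count(eval(method="GET")) AS GET, count(eval(method="POST")) AS POST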


Example 2

This example uses recent (September 23-29, 2010) earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains the source network (Src), ID (Eqid), version, date, location, magnitude, depth (km) and number of reporting stations (NST) for each earthquake over the last 7 days.

Download the text file, M 1+ earthquakes, past 7 days, save it as a CSV file, and upload it to Splunk. Splunk should extract the fields automatically.

Search for earthquakes in and around California and count the number of quakes that were recorded. Then, calculate the minimum, maximum, the range (difference between the min and max), and average magnitudes of those recent quakes.

source=eqs7day-M1.csv Region=*California | stats count, max(Magnitude), min(Magnitude), range(Magnitude), avg(Magnitude) by Region

Use stats functions for each of these calculations: count(), max(), min(), range(), and avg(). This returns the following table:

[Screenshot: results table showing the count, max, min, range, and avg of Magnitude for each Region.]


There were 870 events for this data set. From these results, you can see that approximately 350 of those recorded earthquakes occurred in and around California.

Example 3

This example uses recent (September 23-29, 2010) earthquake data downloaded from the USGS Earthquakes website. The data is a comma separated ASCII text file that contains the source network (Src), ID (Eqid), version, date, location, magnitude, depth (km) and number of reporting stations (NST) for each earthquake over the last 7 days.

Download the text file, M 1+ earthquakes, past 7 days, save it as a CSV file, and upload it to Splunk. Splunk should extract the fields automatically.

Search for earthquakes in and around California and calculate the mean, standard deviation, and variance of the magnitudes of those recent quakes.

source=eqs7day-M1.csv Region=*California | stats mean(Magnitude), stdev(Magnitude), var(Magnitude) by Region

Use stats functions for each of these calculations: mean(), stdev(), and var(). This returns the following table:

[Screenshot: results table showing the mean, stdev, and var of Magnitude for each Region.]


The mean values should be exactly the same as the values calculated using avg() in Example 2.


Example 4

This example uses the sample dataset from the tutorial and a field lookup to add more information to the event data.

The original data set includes a product_id field that is the catalog number for the items sold at the Flower & Gift shop. The field lookup adds three new fields to your events: product_name, which is a descriptive name for the item; product_type, which is a category for the item; and price, which is the cost of the item.

After you configure the field lookup, you can run this search using the time range, All time.

Create a table that displays the items sold at the Flower & Gift shop by their ID, type, and name. Also, calculate the revenue for each product.

sourcetype=access_* action=purchase | stats values(product_type) AS Type, values(product_name) AS Name, sum(price) AS "Revenue" by product_id | rename product_id AS "Product ID" | eval Revenue="$ ".tostring(Revenue,"commas")

This example uses the values() function to display the corresponding product_type and product_name values for each product_id. Then, it uses the sum() function to total the values of the price field for each product.

Also, this example renames the various fields, for better display. For the stats functions, the renames are done inline with an "AS" clause. The rename command is used to change the name of the product_id field, since the syntax does not let you rename a split-by field.

Finally, the results are piped into an eval expression to reformat the Revenue field values so that they read as currency, with a dollar sign and commas.

This returns the following table:

[Screenshot: results table showing the Type, Name, and Revenue for each Product ID.]


It looks like the top 3 purchases over the course of the week were the Beloved's Embrace Bouquet, the Tea & Spa Gift Set, and the Fragrant Jasmine Plant.


Example 5

This example uses generated email data (sourcetype=cisco_esa). You should be able to run this example on any email data by replacing the sourcetype=cisco_esa with your data's sourcetype value and the mailfrom field with your data's email address field name (for example, it might be To, From, or Cc).

Find out how much of your organization's email comes from .com, .net, .org, or other top-level domains.

sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@") | eval from_domain=mvindex(accountname,-1) | stats count(eval(match(from_domain, "[^\n\r\s]+\.com"))) AS ".com", count(eval(match(from_domain, "[^\n\r\s]+\.net"))) AS ".net", count(eval(match(from_domain, "[^\n\r\s]+\.org"))) AS ".org", count(eval(NOT match(from_domain, "[^\n\r\s]+\.(com|net|org)"))) AS "other"

The first half of this search uses eval to break up the email address in the mailfrom field and define the from_domain as the portion of the mailfrom field after the @ symbol.

The results are then piped into the stats command. The count() function is used to count the results of the eval expression. Here, eval uses the match() function to compare the from_domain to a regular expression that looks for the different suffixes in the domain. If the value of from_domain matches the regular expression, the count is updated for each suffix, .com, .net, and .org. Other domain suffixes are counted as other.

This produces the following results table:

[Screenshot: results table showing counts for .com, .net, .org, and other.]


Example 6

This example uses the sample dataset from the tutorial but should work with any format of Apache Web access log. Download the data set from this topic in the tutorial and follow the instructions to upload it to Splunk. Then, run this search using the time range, Other > Yesterday.

Search Web access logs, and return the total number of hits from the top 10 referring domains. (The "top" command returns a count and percent value for each referer.)

sourcetype=access_* | top limit=10 referer | stats sum(count) AS total

This search uses the top command to find the ten most common referer domains, which are values of the referer field. (You might also see this as referer_domain.) The results of top are then piped into the stats command. This example uses the sum() function to add the number of times each referer accesses the website. This summation is then saved into a field, total. This produces the single numeric value:

[Screenshot: the single total value.]


More examples

Example 1: Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". (The "top" command returns a count and percent value for each "referer_domain".)

sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total

Example 2: Return the average, for each hour, of any unique field that ends with the string "lay" (for example, delay, xdelay, relay).

... | stats avg(*lay) BY date_hour

Example 3: Remove duplicates of results with the same "host" value and return the total count of the remaining results.

... | stats dc(host)

Example 4: Return the average transfer rate for each host.

sourcetype=access* | stats avg(kbps) by host

See also

eventstats, rare, sistats, streamstats, top

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the stats command.

This documentation applies to the following versions of Splunk: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, and 4.3.7.

