Splunk® Enterprise

Search Reference

Splunk Enterprise version 8.0 is no longer supported as of October 22, 2021. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.
This documentation does not apply to the most recent version of Splunk® Enterprise. For documentation on the most recent version, go to the latest release.

streamstats

Description

Adds cumulative summary statistics to all search results in a streaming manner. The streamstats command calculates statistics for each event at the time the event is seen. For example, you can calculate the running total for a particular field. The total is calculated by using the values in the specified field for every event that has been processed, up to the current event.
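To illustrate "up to the current event", the following Python sketch models what a running total such as streamstats sum(bytes) attaches to each event. It is a toy model of the behavior described above. It assumes the events arrive in a fixed order and every event has a numeric value. It is not Splunk's implementation, and the function name running_sum is hypothetical.

```python
def running_sum(values):
    """Toy model of 'streamstats sum(field)' (not Splunk's implementation).

    Each event receives the sum of the field over every event processed
    so far, including the event itself (the default current=true).
    """
    total = 0
    results = []
    for value in values:
        total += value          # accumulate this event's value
        results.append(total)   # attach the running total to this event
    return results


print(running_sum([200, 50, 125]))  # [200, 250, 375]
```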

Syntax

The required syntax is streamstats followed by at least one <stats-agg-term>. All other arguments are optional.

streamstats
[reset_on_change=<bool>]
[reset_before="("<eval-expression>")"]
[reset_after="("<eval-expression>")"]
[current=<bool>]
[window=<int>]
[time_window=<span-length>]
[global=<bool>]
[allnum=<bool>]
<stats-agg-term>...
[<by-clause>]

Required arguments

stats-agg-term
Syntax: <stats-func>( <evaled-field> | <wc-field> ) [AS <wc-field>]
Description: A statistical aggregation function. See Stats function options. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wildcard characters in field names. For more information on eval expressions, see Types of eval expressions in the Search Manual.

Optional arguments

allnum
Syntax: allnum=<boolean>
Description: If true, computes numerical statistics on each field only if all of the values in that field are numerical.
Default: false
by-clause
Syntax: BY <field-list>
Description: The name of one or more fields to group by.
current
Syntax: current=<boolean>
Description: If true, the search includes the given, or current, event in the summary calculations. If false, the search excludes the current event and calculates the statistics from the events that precede it.
Default: true
global
Syntax: global=<boolean>
Description: Used only when the window argument is set. Defines whether to use a single window, global=true, or to use separate windows based on the by clause. If global=false and window is set to a non-zero value, a separate window is used for each group of values of the field specified in the by clause.
Default: true
reset_after
Syntax: reset_after="("<eval-expression>")"
Description: After the streamstats calculations are produced for an event, reset_after specifies that all of the accumulated statistics are reset if the eval-expression returns true. The eval-expression must evaluate to true or false. The eval-expression can reference fields that are returned by the streamstats command. When the reset_after argument is combined with the window argument, the window is also reset when the accumulated statistics are reset.
Default: false
reset_before
Syntax: reset_before="("<eval-expression>")"
Description: Before the streamstats calculations are produced for an event, reset_before specifies that all of the accumulated statistics are reset when the eval-expression returns true. The eval-expression must evaluate to true or false. When the reset_before argument is combined with the window argument, the window is also reset when the accumulated statistics are reset.
Default: false
reset_on_change
Syntax: reset_on_change=<bool>
Description: Specifies that all of the accumulated statistics are reset when the group by fields change. The reset is as if no previous events have been seen. Only events that have all of the group by fields can trigger a reset. Events that have only some of the group by fields are ignored. When the reset_on_change argument is combined with the window argument, the window is also reset when the accumulated statistics are reset. See the Usage section.
Default: false
time_window
Syntax: time_window=<span-length>
Description: Specifies the window size for the streamstats calculations, based on time. The time_window argument is limited by the range of values in the _time field in the events. To use the time_window argument, the events must be sorted in either ascending or descending time order. You can use the window argument with the time_window argument to specify the maximum number of events in a window. For the <span-length>, to specify five minutes, use time_window=5m. To specify 2 days, use time_window=2d.
Default: None. However, the value of the max_stream_window attribute in the limits.conf file applies. The default value is 10000 events.
window
Syntax: window=<integer>
Description: Specifies the number of events to use when computing the statistics.
Default: 0, which means that all previous and current events are used.

Stats function options

stats-func
Syntax: The syntax depends on the function that you use. See Usage.
Description: Statistical and charting functions that you can use with the streamstats command. Each time you invoke the streamstats command, you can use one or more functions. However, you can only use one BY clause.

Usage

The streamstats command is a centralized streaming command. See Command types.

The streamstats command is similar to the eventstats command except that it uses events before the current event to compute the aggregate statistics that are applied to each event. If you want to include the current event in the statistical calculations, use current=true, which is the default.

The streamstats command is also similar to the stats command in that streamstats calculates summary statistics on search results. Unlike stats, which works on the group of results as a whole, streamstats calculates statistics for each event at the time the event is seen.

Supported functions

You can use a wide range of functions with the streamstats command. For general information about using functions, see Statistical and charting functions.

Statistical functions that are not applied to specific fields

With the exception of the count function, when you pair the streamstats command with functions that are not applied to specific fields or to eval expressions that resolve into fields, the search head processes the function as if it were applied to a wildcard for all fields. In other words, when you have | streamstats avg in a search, it returns results for | streamstats avg(*).

However, this "implicit wildcard" syntax is officially deprecated. Make the wildcard explicit: write | streamstats <function>(*) when you want a function to apply to all possible fields.

Escaping string values

If your <eval-expression> contains a value instead of a field name, you must escape the quotation marks around the value.

The following example is a simple way to see this. Start by using the makeresults command to create 3 events. Use the streamstats command to produce a cumulative count of the events. Then use the eval command to create a simple test. If the value of the count field is equal to 2, display yes in the test field. Otherwise display no in the test field.

| makeresults count=3 | streamstats count | eval test=if(count==2,"yes","no")

The results appear something like this:

_time                count  test
2017-01-11 11:32:43  1      no
2017-01-11 11:32:43  2      yes
2017-01-11 11:32:43  3      no

Use the streamstats command to reset the count when the match is true. You must escape the quotation marks around the word yes. The following example shows the complete search.

| makeresults count=3 | streamstats count | eval test=if(count==2,"yes","no") | streamstats count as testCount reset_after="("match(test,\"yes\")")"

Here is another example. You want to look for the value "session is closed" in the description field. Because the value is a string, you must enclose it in quotation marks, and you must then escape those quotation marks.

... | streamstats reset_after="("description==\"session is closed\"")"
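A small model makes the reset arguments easier to reason about. The following Python sketch replays the three-event makeresults search from this section. Plain Python functions stand in for the eval expressions. The sketch assumes that reset_after is checked after the event's statistics, including the new field, are produced, and that reset_before is checked before. The function name streamstats_count is hypothetical, and this is not Splunk's implementation.

```python
def streamstats_count(rows, as_field="count", reset_before=None, reset_after=None):
    """Toy model of 'streamstats count AS <as_field>' with optional reset
    predicates (Python callables standing in for eval expressions)."""
    count = 0
    out = []
    for row in rows:
        if reset_before is not None and reset_before(row):
            count = 0  # reset before this event's statistics are produced
        count += 1
        new_row = dict(row, **{as_field: count})
        out.append(new_row)
        if reset_after is not None and reset_after(new_row):
            count = 0  # reset after; the predicate can see the new field
    return out


# | makeresults count=3 | streamstats count
rows = streamstats_count([{} for _ in range(3)])
# | eval test=if(count==2,"yes","no")
for row in rows:
    row["test"] = "yes" if row["count"] == 2 else "no"

after = streamstats_count(rows, "testCount", reset_after=lambda r: r["test"] == "yes")
before = streamstats_count(rows, "testCount", reset_before=lambda r: r["test"] == "yes")
print([r["testCount"] for r in after])   # [1, 2, 1]
print([r["testCount"] for r in before])  # [1, 1, 2]
```

With reset_after, the second event still shows testCount=2 and the counter restarts on the third event. With reset_before, the counter restarts on the second event itself.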

The reset_on_change argument

You have a dataset with the field "shift" that contains either the value DAY or the value NIGHT. You run this search:

...| streamstats count BY shift reset_on_change=true

If the dataset is:

shift
DAY
DAY
NIGHT
NIGHT
NIGHT
NIGHT
DAY
NIGHT

Running the command with reset_on_change=true produces the following streamstats results:

shift, count
DAY, 1
DAY, 2
NIGHT, 1
NIGHT, 2
NIGHT, 3
NIGHT, 4
DAY, 1
NIGHT, 1
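The following Python sketch reproduces the table above for a single group-by field. It is a toy model that assumes every event has the shift field, so every event can trigger a reset. The function name count_by is hypothetical. Without reset_on_change, the counts for each shift keep accumulating instead.

```python
def count_by(values, reset_on_change=False):
    """Toy model of 'streamstats count BY <field> [reset_on_change=true]'
    for one group-by field that is present on every event."""
    counts = {}
    previous = None
    out = []
    for value in values:
        if reset_on_change and previous is not None and value != previous:
            counts = {}  # group-by value changed: forget all accumulated stats
        counts[value] = counts.get(value, 0) + 1
        out.append(counts[value])
        previous = value
    return out


shifts = ["DAY", "DAY", "NIGHT", "NIGHT", "NIGHT", "NIGHT", "DAY", "NIGHT"]
print(count_by(shifts, reset_on_change=True))  # [1, 2, 1, 2, 3, 4, 1, 1]
print(count_by(shifts))                        # [1, 2, 1, 2, 3, 4, 3, 5]
```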

Memory and maximum results

The streamstats search processor uses two limits.conf settings to determine the maximum number of results that it can store in memory for the purpose of computing statistics.

The maxresultrows setting specifies a top limit for the window argument. This sets the number of result rows that the streamstats command processor can store in memory. The max_mem_usage_mb setting limits how much memory the streamstats command uses to keep track of information.

If the limit for one setting is reached, the streamstats command processor continues to return results until the limit for the other setting is reached. When both limits are reached, the streamstats command processor stops adding the requested fields to the search results.

If you set max_mem_usage_mb=0, the streamstats command processor uses only the maxresultrows setting as its threshold. When the number of results exceeds the maxresultrows setting, the streamstats command processor stops adding the requested fields to the search results.

Prerequisites

Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location. Make changes to the files in the local directory.

If you have Splunk Cloud and want to change these limits, file a Support ticket.

Basic examples

1. Compute the average of a field over the last 5 events

For each event, compute the average of field foo over the last 5 events, including the current event. This is similar to using trendline sma5(foo).

... | streamstats avg(foo) window=5

2. Compute the average of a field, with a by clause, over the last 5 events

For each event, compute the average value of foo over the last 5 events that have the same value of bar as the current event. With global=f, each value of bar has its own window of 5 events.

... | streamstats avg(foo) by bar window=5 global=f
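The following Python sketch is a toy model of how window, current, global, and a BY clause interact for avg(), based on the argument descriptions in this topic. It is not Splunk's implementation. The function name streamstats_avg and the parameter global_window are hypothetical; global_window stands in for global, which is a Python keyword. The sketch assumes that with global=true the last N events are shared across all groups, and the average for a group uses only the events in that shared window with the same group value. The usage lines use a window of 2 instead of 5 to keep the data short.

```python
from collections import defaultdict, deque


def streamstats_avg(events, field, window=0, current=True, by=None, global_window=True):
    """Toy model (not Splunk's implementation) of
    'streamstats avg(field) window=N current=... global=... BY by'."""
    maxlen = window or None  # window=0 means "use every previous event"
    shared = deque(maxlen=maxlen)                          # global=true: one window
    per_group = defaultdict(lambda: deque(maxlen=maxlen))  # global=false: one per group
    results = []
    for ev in events:
        key = ev.get(by) if by else None
        buf = per_group[key] if (by and not global_window) else shared
        if current:
            buf.append(ev)  # current=true: include this event
        values = [e[field] for e in buf if by is None or e.get(by) == key]
        results.append(sum(values) / len(values) if values else None)
        if not current:
            buf.append(ev)  # current=false: only later events see this one
    return results


foo = [{"foo": v} for v in (1, 2, 3, 4)]
print(streamstats_avg(foo, "foo", window=2))  # [1.0, 1.5, 2.5, 3.5]

events = [{"bar": "a", "foo": 1}, {"bar": "b", "foo": 10},
          {"bar": "a", "foo": 3}, {"bar": "a", "foo": 5}]
print(streamstats_avg(events, "foo", window=2, by="bar", global_window=False))  # [1.0, 10.0, 2.0, 4.0]
print(streamstats_avg(events, "foo", window=2, by="bar"))                       # [1.0, 10.0, 3.0, 4.0]
```

The third event shows the difference. With a separate window for each value of bar, the window for a still holds both earlier a events. With a single global window of 2, the first a event has already been pushed out by the b event.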

3. For each event, add a count of the number of events processed

This example adds to each event a count field that represents the number of events seen so far, including that event. For example, it adds 1 for the first event, 2 for the second event, and so on.

... | streamstats count

To exclude the current event from the count, specify:

... | streamstats count current=f

4. Apply a time-based window to streamstats

Assume that the max_stream_window setting in the limits.conf file is the default value of 10000 events.

The following search counts the events, using a time window of five minutes.

... | streamstats count time_window=5m

This search adds a count field to each event.

  • If the events are in descending time order (most recent to oldest), the value in the count field represents the number of events in the next 5 minutes.
  • If the events are in ascending time order (oldest to most recent), the count field represents the number of events in the previous 5 minutes.

If there are more events in the time-based window than the value of the max_stream_window setting, the max_stream_window setting takes precedence. The count never exceeds 10000, even if more than 10,000 events actually occur in a 5-minute period.
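A minimal Python sketch of the time-based window with the max_stream_window cap follows. It assumes the events are already sorted by time and that the _time values are epoch seconds. Whether an event exactly one span away is still inside the window is an assumption of this sketch, and the function name count_time_window is hypothetical.

```python
from collections import deque

FIVE_MINUTES = 5 * 60  # time_window=5m, in seconds


def count_time_window(times, span_seconds, max_stream_window=10000):
    """Toy model of 'streamstats count time_window=<span>' for events that
    are already sorted by time (ascending or descending)."""
    window = deque()
    results = []
    for t in times:
        window.append(t)
        # Drop events that are span_seconds or more away from the current event
        # (the exact boundary handling is an assumption of this sketch).
        while window and abs(t - window[0]) >= span_seconds:
            window.popleft()
        # max_stream_window caps the number of events kept in the window.
        while len(window) > max_stream_window:
            window.popleft()
        results.append(len(window))
    return results


ascending = [0, 60, 120, 400, 430]
print(count_time_window(ascending, FIVE_MINUTES))                  # [1, 2, 3, 2, 2]
print(count_time_window(ascending[::-1], FIVE_MINUTES))            # [1, 2, 2, 2, 3]
print(count_time_window([0, 1, 2, 3], FIVE_MINUTES, max_stream_window=2))  # [1, 2, 2, 2]
```

Reversing the input models descending time order. In that case each count looks ahead in time rather than back, which matches the two bullets above.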

Extended examples

1. Create events for testing

You can use the streamstats command with the makeresults command to create a series of events. This technique is often used for testing search syntax. The eval command gives the events different hours: it multiplies the count by 3600, the number of seconds in an hour, and subtracts the result from the _time value.

| makeresults count=5 | streamstats count | eval _time=_time-(count*3600)

The streamstats command is used to create the count field. The streamstats command calculates a cumulative count for each event, at the time the event is processed.

The results look something like this:

_time                count
2020-01-09 15:35:14  1
2020-01-09 14:35:14  2
2020-01-09 13:35:14  3
2020-01-09 12:35:14  4
2020-01-09 11:35:14  5

Notice that the hours in the timestamp are 1 hour apart.
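The timestamp arithmetic can be checked with a short Python sketch. It assumes, hypothetically, that makeresults ran at 2020-01-09 16:35:14, which is the time that yields the timestamps in the table above. The actual values depend on when you run the search.

```python
from datetime import datetime, timedelta

# Hypothetical moment when makeresults ran; every generated event starts
# with this same _time value.
now = datetime(2020, 1, 9, 16, 35, 14)

rows = []
for count in range(1, 6):                               # streamstats count: 1, 2, ..., 5
    event_time = now - timedelta(seconds=count * 3600)  # eval _time=_time-(count*3600)
    rows.append((event_time.strftime("%Y-%m-%d %H:%M:%S"), count))

for row in rows:
    print(*row)  # first line: 2020-01-09 15:35:14 1
```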

You can create additional fields by using the eval command.

| makeresults count=5 | streamstats count | eval _time=_time-(count*3600) | eval age = case(count=1, 25, count=2, 39, count=3, 31, count=4, 27, count=5, null()) | eval city = case(count=1 OR count=3, "San Francisco", count=2 OR count=4, "Seattle",count=5, "Los Angeles")

  • The eval command is used to create two new fields, age and city. The eval command uses the value in the count field.
  • The case function takes pairs of arguments, such as count=1, 25. The first argument is a Boolean expression. When that expression is TRUE, the corresponding second argument is returned.

The results of the search look like this:

_time                age  city           count
2020-01-09 15:35:14  25   San Francisco  1
2020-01-09 14:35:14  39   Seattle        2
2020-01-09 13:35:14  31   San Francisco  3
2020-01-09 12:35:14  27   Seattle        4
2020-01-09 11:35:14       Los Angeles    5
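The first-match behavior of the case function can be sketched in Python as follows. This is a toy model of the function, not Splunk's implementation. Python's None stands in for null(). Unlike eval, Python evaluates every argument before the call, which does not matter here because the values are constants.

```python
def case(*pairs):
    """Toy model of eval's case(X1,Y1,X2,Y2,...): return the value paired with
    the first true condition, or None (null) when no condition is true."""
    for condition, value in zip(pairs[::2], pairs[1::2]):
        if condition:
            return value
    return None


rows = []
for count in range(1, 6):
    age = case(count == 1, 25, count == 2, 39, count == 3, 31,
               count == 4, 27, count == 5, None)
    city = case(count == 1 or count == 3, "San Francisco",
                count == 2 or count == 4, "Seattle",
                count == 5, "Los Angeles")
    rows.append((age, city, count))

for row in rows:
    print(row)
```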

2. Calculate a snapshot of summary statistics

This example uses the sample data from the Search Tutorial but should work with any format of Apache web access log. To try this example on your own Splunk instance, you must download the sample data and follow the instructions to get the tutorial data into Splunk. Use the time range All time when you run the search.

You want to determine the number of bytes used over a set period of time. The following search uses the first 5 events. Because search results typically display the most recent event first, the sort command is used to sort the 5 events in ascending order to see the oldest event first and the most recent event last. Ascending order enables the streamstats command to calculate statistics over time.

sourcetype=access_combined* | head 5 | sort _time

This image shows the most recent 5 events in the data set. The events are in ascending order by time.


Add the streamstats command to the search to generate a running total of the bytes over the 5 events and organize the results by clientip.

sourcetype=access_combined* | head 5 | sort _time | streamstats sum(bytes) AS ASimpleSumOfBytes BY clientip


When you click on the ASimpleSumOfBytes field in the list of Interesting fields, an information window shows the cumulative sum of the bytes, as shown in this image:

This image shows the ASimpleSumOfBytes field selected in the list of Interesting fields. A popup window shows the cumulative sum of the bytes.

The streamstats command adds the statistics to the original events, which means that all of the original data remains accessible for further calculations.


Add the table command to the search to display only the values in the _time, clientip, bytes, and ASimpleSumOfBytes fields.

sourcetype=access_combined* | head 5 | sort _time | streamstats sum(bytes) as ASimpleSumOfBytes by clientip | table _time, clientip, bytes, ASimpleSumOfBytes

Each event shows the timestamp for the event, the clientip, and the number of bytes used. The ASimpleSumOfBytes field shows a cumulative summary of the bytes for each clientip.

This image shows the Statistics tab with the columns _time, clientip, bytes, ASimpleSumOfBytes.
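A Python sketch of the per-clientip running sum follows. The events and IP addresses are hypothetical and do not come from the tutorial data. The function name running_sum_by is also hypothetical. The model assumes the events are already in ascending time order, as the sort command arranges them in the search.

```python
def running_sum_by(events, field, by, as_field):
    """Toy model of 'streamstats sum(field) AS as_field BY by' (not Splunk's code)."""
    totals = {}  # running total for each value of the by field
    out = []
    for ev in events:
        key = ev[by]
        totals[key] = totals.get(key, 0) + ev[field]
        out.append(dict(ev, **{as_field: totals[key]}))  # original fields are kept
    return out


# Hypothetical events, already sorted by _time (oldest first).
events = [
    {"clientip": "192.0.2.10", "bytes": 100},
    {"clientip": "198.51.100.7", "bytes": 40},
    {"clientip": "192.0.2.10", "bytes": 60},
]
result = running_sum_by(events, "bytes", "clientip", "ASimpleSumOfBytes")
for ev in result:
    print(ev)
```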

3. Calculate the running total of distinct users over time

Each day you track unique users, and you would like to track the cumulative count of distinct users. This example calculates the running total of distinct users over time.

eventtype="download" | bin _time span=1d as day | stats values(clientip) as ips dc(clientip) by day | streamstats dc(ips) as "Cumulative total"

The bin command breaks the time into days. For each day, the stats command lists the distinct users (clientip values) and counts them. The streamstats command finds the running distinct count of users.

This search returns a table that includes: day, ips, dc(clientip), and Cumulative total.
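The running distinct count can be modeled in Python as a set that only grows. The sketch assumes that dc() applied to the multivalue ips field counts each individual value. The daily IP lists and the function name cumulative_distinct are hypothetical.

```python
def cumulative_distinct(daily_ips):
    """Toy model of 'streamstats dc(ips)' over one multivalue ips field per day."""
    seen = set()  # every distinct value seen so far
    out = []
    for ips in daily_ips:
        seen.update(ips)
        out.append(len(seen))
    return out


# Hypothetical 'ips' values for three consecutive days.
days = [["192.0.2.1", "192.0.2.2"], ["192.0.2.2", "192.0.2.3"], ["192.0.2.1"]]
print(cumulative_distinct(days))  # [2, 3, 3]
```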

4. Calculate hourly cumulative totals

This example uses streamstats to produce hourly cumulative totals.

... | timechart span=1h sum(bytes) as SumOfBytes | streamstats global=f sum(*) as accu_total_*

This search returns 3 columns: _time, SumOfBytes, and accu_total_SumOfBytes.

The timechart command buckets the events into spans of 1 hour and sums the bytes in each span. The timechart command also fills NULL values, so that there are no missing values. Then, the streamstats command is used to calculate the accumulated total.


This example uses streamstats to produce hourly cumulative totals for category values.

... | timechart span=1h sum(value) as total by category | streamstats global=f | addtotals | accum Total | rename Total as accu_total

5. Calculate when a DHCP IP lease address changed for a specific MAC address

This example uses streamstats to figure out when a DHCP IP lease address changed for a MAC address, 54:00:00:00:00:00.

source=dhcp MAC=54:00:00:00:00:00 | head 10 | streamstats current=f last(DHCP_IP) as new_dhcp_ip last(_time) as time_of_change by MAC

You can also clean up the presentation to display a table of the DHCP IP address changes and the times they occurred.

source=dhcp MAC=54:00:00:00:00:00 | head 10 | streamstats current=f last(DHCP_IP) as new_dhcp_ip last(_time) as time_of_change by MAC | where DHCP_IP!=new_dhcp_ip | convert ctime(time_of_change) as time_of_change | rename DHCP_IP as old_dhcp_ip | table time_of_change, MAC, old_dhcp_ip, new_dhcp_ip
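The following Python sketch models the logic of the cleaned-up search for hypothetical events listed most recent first, which is the default search order. Because of current=f, each event is compared with the event processed just before it for the same MAC, which is the next more recent lease. The sketch assumes every event has a DHCP_IP value. The function name dhcp_changes and the sample values are hypothetical.

```python
def dhcp_changes(events):
    """Toy model of
    streamstats current=f last(DHCP_IP) as new_dhcp_ip last(_time) as time_of_change by MAC
    | where DHCP_IP!=new_dhcp_ip
    Events must be ordered most recent first."""
    last_by_mac = {}  # previously processed event for each MAC
    changes = []
    for ev in events:
        prev = last_by_mac.get(ev["MAC"])  # current=f: the current event is excluded
        if prev is not None and ev["DHCP_IP"] != prev["DHCP_IP"]:
            changes.append({
                "time_of_change": prev["_time"],
                "MAC": ev["MAC"],
                "old_dhcp_ip": ev["DHCP_IP"],
                "new_dhcp_ip": prev["DHCP_IP"],
            })
        last_by_mac[ev["MAC"]] = ev
    return changes


events = [  # hypothetical leases, most recent first
    {"_time": 300, "MAC": "54:00:00:00:00:00", "DHCP_IP": "192.0.2.30"},
    {"_time": 200, "MAC": "54:00:00:00:00:00", "DHCP_IP": "192.0.2.20"},
    {"_time": 100, "MAC": "54:00:00:00:00:00", "DHCP_IP": "192.0.2.20"},
]
print(dhcp_changes(events))
```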

For more details, refer to the Splunk Blogs post for this example.

See also

Commands
accum
autoregress
delta
fillnull
eventstats
makeresults
trendline
Blogs
Getting started with stats, eventstats and streamstats
Last modified on 08 January, 2021
 

This documentation applies to the following versions of Splunk® Enterprise: 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14

