
streamstats
Description
Adds cumulative summary statistics to all search results in a streaming manner. The streamstats command calculates statistics for each event at the time the event is seen. For example, you can calculate the running total for a particular field. The total is calculated by using the values in the specified field for every event that has been processed, up to the current event.
Syntax
streamstats [reset_on_change=<bool>] [reset_before="("<eval-expression>")"] [reset_after="("<eval-expression>")"] [current=<bool>] [window=<int>] [time_window=<span-length>] [global=<bool>] [allnum=<bool>] <stats-agg-term>... [<by clause>]
Required arguments
- stats-agg-term
- Syntax: <stats-func>( <evaled-field> | <wc-field> ) [AS <wc-field>]
- Description: A statistical aggregation function. See Stats function options. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wildcard characters in field names.
Optional arguments
- allnum
- Syntax: allnum=<boolean>
- Description: If true, computes numerical statistics on each field only if all of the values in that field are numerical.
- Default: false
- by clause
- Syntax: BY <field-list>
- Description: The name of one or more fields to group by.
- current
- Syntax: current=<boolean>
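- Description: If true, the search includes the current event in the summary calculations. If false, the search excludes the current event and uses only the events that precede it. For example, with current=false, last(<field>) returns the field value from the previous event.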
- Default: true
- global
- Syntax: global=<boolean>
- Description: Used only when the window argument is set. Defines whether to use a single window, global=true, or to use separate windows based on the by clause. If global=false and window is set to a non-zero value, a separate window is used for each group of values of the field specified in the by clause.
- Default: true
- reset_after
- Syntax: reset_after="("<eval-expression>")"
- Description: After the streamstats calculations are produced for an event, reset_after specifies that all of the accumulated statistics are reset if the eval-expression returns true. The eval-expression must evaluate to true or false. The eval-expression can reference fields that are returned by the streamstats command. When the reset_after argument is combined with the window argument, the window is also reset when the accumulated statistics are reset.
- Default: false
- reset_before
- Syntax: reset_before="("<eval-expression>")"
- Description: Before the streamstats calculations are produced for an event, reset_before specifies that all of the accumulated statistics are reset when the eval-expression returns true. The eval-expression must evaluate to true or false. When the reset_before argument is combined with the window argument, the window is also reset when the accumulated statistics are reset.
- Default: false
- reset_on_change
- Syntax: reset_on_change=<bool>
- Description: Specifies that all of the accumulated statistics are reset when the group by fields change. The reset is as if no previous events have been seen. Only events that have all of the group by fields can trigger a reset. Events that have only some of the group by fields are ignored. When the reset_on_change argument is combined with the window argument, the window is also reset when the accumulated statistics are reset. See the Usage section.
- Default: false
- time_window
- Syntax: time_window=<span-length>
- Description: Specifies the window size for the streamstats calculations, based on time. The time_window argument is limited by the range of values in the _time field in the events. To use the time_window argument, the events must be sorted in either ascending or descending time order. You can use the window argument with the time_window argument to specify the maximum number of events in a window. For the <span-length>, to specify five minutes, use time_window=5m. To specify 2 days, use time_window=2d.
- Default: None. However, the value of the max_stream_window attribute in the limits.conf file applies. The default value is 10000 events.
- window
- Syntax: window=<integer>
- Description: Specifies the number of events to use when computing the statistics.
- Default: 0, which means that all previous and current events are used.
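The reset_before and reset_after arguments differ only in when the reset happens relative to the current event's calculation. The following Python sketch is a conceptual model, not Splunk's implementation: it tracks a single running count, and the names streamstats_count and session_closed, as well as the sample events, are hypothetical. Passing the predicate the freshly computed row is how this model reflects that a reset_after expression can reference fields returned by streamstats.

```python
from typing import Callable, Optional

Event = dict
Predicate = Callable[[Event], bool]


def streamstats_count(
    events: list,
    reset_before: Optional[Predicate] = None,
    reset_after: Optional[Predicate] = None,
) -> list:
    """Hypothetical model of `streamstats count` with reset_before/reset_after."""
    count = 0
    results = []
    for event in events:
        # reset_before: test the incoming event and reset before counting it.
        if reset_before is not None and reset_before(event):
            count = 0
        count += 1
        row = {**event, "count": count}
        results.append(row)
        # reset_after: test the row after its statistics are produced, so the
        # predicate can read the new "count" field; the next event starts fresh.
        if reset_after is not None and reset_after(row):
            count = 0
    return results


def session_closed(event: Event) -> bool:
    return event["description"] == "session is closed"


events = [{"description": d} for d in
          ["login", "update", "session is closed", "login", "update"]]

print([r["count"] for r in streamstats_count(events, reset_after=session_closed)])
# [1, 2, 3, 1, 2]
print([r["count"] for r in streamstats_count(events, reset_before=session_closed)])
# [1, 2, 1, 2, 3]
```

With reset_after, the closing event is still counted in the old run and the next event starts at 1. With reset_before, the closing event itself starts the new run.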
Stats function options
- stats-func
- Syntax: The syntax depends on the function that you use. Refer to the table below.
- Description: Statistical and charting functions that you can use with the streamstats command. Each time you invoke the streamstats command, you can use one or more functions.
- The following table lists the supported functions by type of function. For descriptions and examples, see Statistical and charting functions.
| Type of function | Supported functions and syntax |
|---|---|
| Aggregate functions | avg(), count(), distinct_count(), estdc(), estdc_error(), max(), median(), min(), mode(), perc<int>, range(), stdev(), stdevp(), sum(), sumsq(), var(), varp() |
| Event order functions | earliest(), first(), last(), latest() |
| Multivalue stats and chart functions | list(X), values(X) |
Usage
The streamstats command is similar to the eventstats command, except that it uses only the events before the current event, rather than all of the events, to compute the aggregate statistics that are applied to each event. If you want to include the current event in the statistical calculations, use current=true, which is the default.
The streamstats command is also similar to the stats command in that streamstats calculates summary statistics on search results. Unlike stats, which works on the group of results as a whole, streamstats calculates statistics for each event at the time the event is seen.
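To make the contrast between the three commands concrete, the following Python sketch models what a sum looks like under each one. It is a conceptual illustration only; the function names and the sample values are hypothetical, and the streamstats model assumes the default current=true.

```python
from itertools import accumulate


def stats_sum(values: list) -> float:
    # stats: one summary row for the whole result set.
    return sum(values)


def eventstats_sum(values: list) -> list:
    # eventstats: the same whole-set total attached to every event.
    total = sum(values)
    return [total] * len(values)


def streamstats_sum(values: list) -> list:
    # streamstats (current=true): a running total up to and including each event.
    return list(accumulate(values))


values = [3, 1, 2]
print(stats_sum(values))        # 6
print(eventstats_sum(values))   # [6, 6, 6]
print(streamstats_sum(values))  # [3, 4, 6]
```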
Escaping string values
If your <eval-expression> contains a value instead of a field name, you must escape the quotation marks around the value. For example, suppose you want to look for the value session is closed in the description field. Because the value is a string, you must enclose it in quotation marks. You then need to escape those quotation marks.
... | streamstats reset_after="("description==\"session is closed\"")" <stats-agg-term>
The reset_on_change argument
You have a dataset with the field "shift" that contains either the value DAY or the value NIGHT. You run this search:
<search>...| streamstats count BY shift reset_on_change=true
If the dataset is:
- shift
- DAY
- DAY
- NIGHT
- NIGHT
- NIGHT
- NIGHT
- DAY
- NIGHT
Running the command with reset_on_change=true produces the following streamstats results:
- shift, count
- DAY, 1
- DAY, 2
- NIGHT, 1
- NIGHT, 2
- NIGHT, 3
- NIGHT, 4
- DAY, 1
- NIGHT, 1
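The following Python sketch models this example so you can compare the results with and without reset_on_change. It is a conceptual model, not Splunk's implementation. The function name count_by is hypothetical, and the sketch assumes that every event has a shift value, so it does not model the rule that events with only some of the group by fields cannot trigger a reset.

```python
def count_by(keys: list, reset_on_change: bool = False) -> list:
    counts: dict = {}
    previous = None
    results = []
    for key in keys:
        if reset_on_change and previous is not None and key != previous:
            # The group-by value changed: drop all accumulated statistics,
            # as if no previous events had been seen.
            counts.clear()
        counts[key] = counts.get(key, 0) + 1
        results.append((key, counts[key]))
        previous = key
    return results


shifts = ["DAY", "DAY", "NIGHT", "NIGHT", "NIGHT", "NIGHT", "DAY", "NIGHT"]
for row in count_by(shifts, reset_on_change=True):
    print(row)
```

With reset_on_change=True the output matches the table above. Without it, each shift keeps its own running count, so the seventh event would be ("DAY", 3) and the last would be ("NIGHT", 5).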
Basic examples
1. Compute the average of a field over the last 5 events
For each event, compute the average of field foo over the last 5 events, including the current event. This is similar to running trendline sma5(foo).
... | streamstats avg(foo) window=5
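A minimal Python model of a count-based window, assuming the default current=true and that every event has a numeric foo value (Splunk's handling of events without the field is not modeled). The function name streamstats_avg is hypothetical. A deque with a maximum length drops the oldest value automatically, which mirrors a sliding window of the last N events; window=0 maps to an unbounded deque, matching "all previous and current events are used".

```python
from collections import deque


def streamstats_avg(values: list, window: int = 0) -> list:
    # window=0 means "no limit": deque(maxlen=None) keeps every value seen so far.
    recent = deque(maxlen=window or None)
    averages = []
    for value in values:
        recent.append(value)  # current=true: include the current event
        averages.append(sum(recent) / len(recent))
    return averages


print(streamstats_avg([1, 2, 3, 4, 5, 6], window=5))
# [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
```

For the first four events, this model averages only the events seen so far, because fewer than 5 are available.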
2. Compute the average of a field, with a by clause, over the last 5 events
For each event, compute the average value of foo over the last 5 events that have the same value of bar as the current event. Setting global=f gives each value of bar its own window of 5 events.
... | streamstats avg(foo) by bar window=5 global=f
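The following Python sketch models global=f, where each value of the BY field keeps its own window. It is a conceptual model only: the function name streamstats_avg_by and the sample events are hypothetical, and the global=true case, where all groups share one window, is not modeled.

```python
from collections import defaultdict, deque


def streamstats_avg_by(events: list, field: str, by: str, window: int) -> list:
    # global=f: each value of the BY field gets its own window of `window` events.
    windows = defaultdict(lambda: deque(maxlen=window))
    averages = []
    for event in events:
        recent = windows[event[by]]
        recent.append(event[field])
        averages.append(sum(recent) / len(recent))
    return averages


events = [
    {"bar": "a", "foo": 1},
    {"bar": "b", "foo": 10},
    {"bar": "a", "foo": 3},
    {"bar": "b", "foo": 20},
    {"bar": "a", "foo": 5},
]
print(streamstats_avg_by(events, "foo", "bar", window=2))
# [1.0, 10.0, 2.0, 15.0, 4.0]
```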
3. For each event, add a count of the number of events processed
This example adds to each event a count field that represents the number of events seen so far, including that event. For example, it adds 1 for the first event, 2 for the second event, and so on.
... | streamstats count
To exclude the current event from the count, specify:
... | streamstats count current=f
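A minimal Python model of the current argument, using len as a stand-in for count. The function name streamstats is hypothetical. This page does not state what value Splunk writes for the first event when current=f, so this model uses None to mean "no earlier events". Treat that detail as an assumption.

```python
from typing import Callable


def streamstats(values: list, agg: Callable, current: bool = True) -> list:
    seen: list = []
    results = []
    for value in values:
        if current:
            # current=true: add this event first, then aggregate.
            seen.append(value)
            results.append(agg(list(seen)))
        else:
            # current=false: aggregate only earlier events, then add this one.
            results.append(agg(list(seen)) if seen else None)
            seen.append(value)
    return results


print(streamstats(["e1", "e2", "e3"], len))                 # [1, 2, 3]
print(streamstats(["e1", "e2", "e3"], len, current=False))  # [None, 1, 2]
```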
4. Apply a time-based window to streamstats
Assume that the max_stream_window attribute in the limits.conf file is set to the default value of 10000 events.
The following search counts the events, using a time window of five minutes.
... | streamstats count time_window=5m
This search adds a count field to each event.
- If the events are in descending time order (most recent to oldest), the value in the count field represents the number of events in the next 5 minutes.
- If the events are in ascending time order (oldest to most recent), the count field represents the number of events in the previous 5 minutes.
If there are more events in the time-based window than the value of the max_stream_window attribute, the max_stream_window attribute takes precedence. The count never exceeds 10,000, even if there are more than 10,000 events in a 5-minute period.
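The following Python sketch models a time-based window with an event cap. It works for ascending or descending time order. It is a conceptual model under stated assumptions: times are epoch seconds, the hypothetical parameter max_events stands in for max_stream_window, and an event stays in the window only if it is less than span_seconds away from the current event. Splunk's exact boundary handling is not stated on this page. The function name count_time_window is also hypothetical.

```python
from collections import deque


def count_time_window(times: list, span_seconds: int, max_events: int = 10000) -> list:
    window = deque()
    counts = []
    for t in times:
        window.append(t)
        # Assumed boundary: drop events at least span_seconds away from this one.
        # abs() lets the same logic serve ascending and descending time order.
        while abs(t - window[0]) >= span_seconds:
            window.popleft()
        # The event cap applies regardless of the time span.
        while len(window) > max_events:
            window.popleft()
        counts.append(len(window))
    return counts


print(count_time_window([0, 60, 120, 400, 700], span_seconds=300))  # [1, 2, 3, 2, 1]
print(count_time_window([0, 1, 2], span_seconds=300, max_events=2))  # [1, 2, 2]
```

In descending order, the earlier events in the stream are the more recent ones, so each count covers the following 5 minutes, as described above.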
Extended examples
5. Calculate the running total of distinct users over time
Each day you track unique users, and you would like to track the cumulative count of distinct users. This example calculates the running total of distinct users over time.
eventtype="download" | bin _time span=1d as day | stats values(clientip) as ips dc(clientip) by day | streamstats dc(ips) as "Cumulative total"
The bin command breaks the time into days. The stats command calculates the distinct users (clientip) and user count per day. The streamstats command finds the running distinct count of users.
This search returns a table that includes: day, ips, dc(clientip), and Cumulative total.
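The running distinct count works because dc(ips) in streamstats counts distinct values across every row seen so far, not just within the current day. The following Python sketch models that with a growing set. The function name running_distinct and the sample data are hypothetical.

```python
def running_distinct(days: list) -> list:
    """days: (day, list of client IPs seen that day), in time order."""
    seen: set = set()
    rows = []
    for day, ips in days:
        seen.update(ips)  # union with every IP seen on earlier days
        # (day, dc(clientip) for this day, Cumulative total)
        rows.append((day, len(set(ips)), len(seen)))
    return rows


days = [
    ("day1", ["10.0.0.1", "10.0.0.2"]),
    ("day2", ["10.0.0.2", "10.0.0.3"]),
    ("day3", ["10.0.0.1"]),
]
for row in running_distinct(days):
    print(row)
```

On day2 only one new user appears, so the cumulative total rises from 2 to 3 while the daily count stays at 2.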
6. Calculate hourly cumulative totals for category values
This example uses streamstats to produce hourly cumulative totals for category values.
... | timechart span=1h sum(value) as total by category | streamstats global=f sum(total) as accu_total
The timechart command buckets the events into spans of 1 hour and sums the values for each category. The timechart command also fills NULL values, so that there are no missing values. Then, the streamstats command is used to calculate the accumulated total.
7. Calculate when a DHCP IP lease address changed for a specific MAC address
This example uses streamstats to figure out when a DHCP IP lease address changed for a MAC address, 54:00:00:00:00:00.
source=dhcp MAC=54:00:00:00:00:00 | head 10 | streamstats current=f last(DHCP_IP) as new_dhcp_ip last(_time) as time_of_change by MAC
You can also clean up the presentation to display a table of the DHCP IP address changes and the times they occurred.
source=dhcp MAC=54:00:00:00:00:00 | head 10 | streamstats current=f last(DHCP_IP) as new_dhcp_ip last(_time) as time_of_change by MAC | where DHCP_IP!=new_dhcp_ip | convert ctime(time_of_change) as time_of_change | rename DHCP_IP as old_dhcp_ip | table time_of_change, MAC, old_dhcp_ip, new_dhcp_ip
For more details, refer to the Splunk Blogs post for this example.
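The following Python sketch models the logic of the second search. With current=f and BY MAC, last(DHCP_IP) and last(_time) take their values from the previous event for the same MAC address. Search results are returned newest first by default, so that previous event is the more recent one, which is why its IP is named new_dhcp_ip. The function name dhcp_changes and the sample events are hypothetical. Time formatting with convert ctime is not modeled.

```python
def dhcp_changes(events: list) -> list:
    """events: dicts with _time, MAC, DHCP_IP, in search order (newest first)."""
    previous_by_mac: dict = {}
    changes = []
    for event in events:
        mac = event["MAC"]
        # current=f ... by MAC: values come from the previous event for this MAC.
        previous = previous_by_mac.get(mac)
        # where DHCP_IP!=new_dhcp_ip: the first event per MAC has no earlier value.
        if previous is not None and event["DHCP_IP"] != previous["DHCP_IP"]:
            # (time_of_change, MAC, old_dhcp_ip, new_dhcp_ip)
            changes.append((previous["_time"], mac, event["DHCP_IP"], previous["DHCP_IP"]))
        previous_by_mac[mac] = event
    return changes


mac = "54:00:00:00:00:00"
events = [
    {"_time": 300, "MAC": mac, "DHCP_IP": "10.0.0.3"},
    {"_time": 200, "MAC": mac, "DHCP_IP": "10.0.0.2"},
    {"_time": 100, "MAC": mac, "DHCP_IP": "10.0.0.2"},
    {"_time": 0, "MAC": mac, "DHCP_IP": "10.0.0.1"},
]
for change in dhcp_changes(events):
    print(change)
```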
See also
accum, autoregress, delta, fillnull, eventstats, trendline
This documentation applies to the following versions of Splunk® Enterprise: 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11