Stats
This topic describes how to use the Stats function in the Splunk Data Stream Processor.
Description
Applies one or more aggregation functions on a stream of events in a specified time window. The events must be grouped by one or more fields. This function returns a single value. Best practices are to limit window sizes to 24 hours or less and have a slide that is no smaller than 1/6th of your window size. For example, for a window size of 1 minute, make your window slide at least 10 seconds. This function accepts a variable number of arguments.
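The two best-practice limits above reduce to simple arithmetic. The following Python sketch is illustrative only: the function name and the use of millisecond integers are assumptions, and this topic does not state that the stats function enforces these limits.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 24 hours expressed in milliseconds


def follows_best_practice(size_ms: int, slide_ms: int) -> bool:
    """Return True when the window is at most 24 hours long and the
    slide is at least one sixth of the window size."""
    # Multiply instead of dividing to avoid fractional milliseconds.
    return size_ms <= DAY_MS and slide_ms * 6 >= size_ms


# 1-minute window with a 10-second slide, then with a 5-second slide.
print(follows_best_practice(60_000, 10_000))
print(follows_best_practice(60_000, 5_000))
```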
View configurations for the stats function by highlighting the function in the UI and clicking View Configurations. In the View Configurations tab, you can check the original fields of the data coming in to the stats function in the left sidebar, edit the function's arguments in the UI form, and see the output fields of the data coming out of the stats function in the right sidebar.
Function Input/Output Schema
- Function Input
- collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
- collection<record<S>>
- This function outputs the same collection of records but with a different schema S.
Syntax
The required syntax is in bold.
- stats
- aggregations(field) [AS field]
- [BY field-name], [span(timestamp, size, slide, grace period)]
Required arguments
- by
- Syntax: collection<expression<any>>
- Description: The field values by which to group events.
- Example in Canvas View: body
- timestamp
- Syntax: timestamp=expression<long>
- Description: The field name where your record's timestamps are located.
- Example in Canvas View: timestamp
- size
- Syntax: size=<long>
- Description: The window length, in milliseconds, to group events. For a list of valid timescales, see the "Timescales" section.
- Example in Canvas View: 60 seconds
- aggregations
- Syntax: aggregations=collection<expression<any>>
- Description: An aggregation function to apply on your events.
Optional arguments
- window
- Syntax: sliding | tumbling
- Description: A windowing method. You do not need to specify this argument in the SPL2 Builder. Instead, the windowing method is inferred from the size and slide fields. If the size and slide fields are equal to each other, then the windowing method is tumbling. If the size is larger than the slide, then the windowing method used is sliding. See the table for more details on the two windowing methods.
- Example in Canvas View: tumbling
Windowing method | Description |
---|---|
tumbling | A tumbling window slices up time into segments based on the length of the provided window size. With the tumbling option, data in one window will not overlap with data in another window. At the start of each window, any aggregations are restarted. The window does not include the right-most edge. For example, starting at Timestamp=1:00PM for a window size W=5 minutes, the windows would be [1:00PM - 1:05PM), [1:05PM - 1:10PM), [1:10PM - 1:15PM), ..., etc. |
sliding | Similar to a tumbling window, a sliding window slices up time into segments based on a provided window size but also uses an additional window slide parameter to control how frequently a sliding window is started. Therefore, sliding windows can be overlapping if the slide is smaller than the window size. For example: Starting at Timestamp=1:00PM for a window size W=5 minutes and window slide S=2 minutes, the windows would be [1:00PM - 1:05PM], [1:02PM - 1:07PM], [1:04PM - 1:09PM], ..., etc. |
- slide
- Syntax: slide=<long>
- Description: The amount of time, in milliseconds, to wait before starting a new window. For a list of valid timescales, see the "Timescales" section.
- Example in Canvas View: 60 seconds.
- grace-period
- Syntax: grace-period=<long>
- Description: The amount of time, in milliseconds, to wait for late-arriving events. Some events may arrive after the latest time window boundary. This setting specifies how long to wait for such late-arriving events for the time window. If specified, this argument affects when the windows close. For example, if you have a window of 1 hour (10:00:00AM - 11:00:00AM) with a grace period of 5 minutes, then the window won't close until the pipeline receives an event with timestamp >= 11:05:00AM. When you assign a grace period, the pipeline tolerates out-of-order events. See the "How are time windows calculated?" section for a more detailed explanation of how time windows are determined and how the grace period argument is used.
- Example in Canvas View: 10 seconds.
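The rule for inferring the windowing method from size and slide, and the window boundaries used in the windowing-method examples, can be sketched in Python as follows. This is an illustration, not the product's implementation: the function names are hypothetical, the date is arbitrary (only the 1:00 PM start time comes from the example), and rejecting a slide larger than the size is an assumption because this topic does not describe that case.

```python
from datetime import datetime, timedelta


def windowing_method(size: timedelta, slide: timedelta) -> str:
    """Infer the windowing method from the size and slide values."""
    if size == slide:
        return "tumbling"
    if size > slide:
        return "sliding"
    # Assumption: a slide larger than the size is not covered by this topic.
    raise ValueError("slide larger than size is not covered by this sketch")


def window_bounds(first_start, size, slide, count):
    """List the first `count` windows as (start, end) pairs."""
    return [
        (first_start + i * slide, first_start + i * slide + size)
        for i in range(count)
    ]


# Hypothetical date; window size W=5 minutes and window slide S=2 minutes.
start = datetime(2021, 7, 26, 13, 0)
for begin, end in window_bounds(start, timedelta(minutes=5), timedelta(minutes=2), 3):
    print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

Passing the same value for size and slide to window_bounds yields the back-to-back, non-overlapping windows of the tumbling method.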
Usage
This section contains additional usage information about the Stats function.
How are time windows calculated?
The Stats function has no concept of wall clock time, and the passage of time is based on the timestamps of incoming records. The Stats function tracks the latest timestamp it received in the stream as the "current" time, and it determines the start and end of windows using this timestamp. Once the difference between the current timestamp and the start timestamp of the current window is greater than the window length, that window is closed and a new window starts.
However, since records may arrive out of order, the grace period argument allows the previous window W to remain "open" for a certain period G after its closing timestamp T. Until the Stats function receives a record with a timestamp greater than or equal to T + G, any incoming records with timestamps less than T are counted towards the previous window W. Once such a record is received, window W is closed and outputs its results. If the Stats function never receives a record with a timestamp greater than or equal to T + G, then the window remains open.
To illustrate this, consider the following example where you have a sliding window with a window size of 1 hour (10:00:00AM - 11:00:00AM) and two events that have timestamps 11:01:00AM and 10:59:00AM respectively. The event with timestamp 10:59:00AM arrives after the event with timestamp 11:01:00AM.
- If the grace-period is set to 0, then the pipeline does not tolerate out of order data. When the event with timestamp 11:01:00AM arrives, the window will close and output results. The event with timestamp 10:59:00AM is dropped.
- If the grace-period is set to 5 minutes, then the pipeline tolerates out of order data. Events with timestamps up to 5 minutes past the window are permitted. When the event with timestamp 11:01:00AM arrives, the window remains open and the event with timestamp 11:01:00AM is added to the next window: 11:00:00AM - 12:00:00PM. The event with timestamp 10:59:00AM then arrives and is added to the window 10:00:00AM - 11:00:00AM. The 10:00:00AM - 11:00:00AM window closes and outputs results when an event with timestamp >= 11:05:00AM arrives.
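The two grace-period cases above can be replayed with a small Python sketch. It is a simplified model under stated assumptions, not the Stats function's implementation: it tracks a single window, treats the latest timestamp seen as the current time, and treats the window as closed once that time is greater than or equal to the window end plus the grace period. The function name and the date are hypothetical.

```python
from datetime import datetime, timedelta


def replay_window(start, end, grace, arrivals):
    """Replay arrivals against one window [start, end) and report which
    in-window events are counted and which are dropped."""
    current = None  # latest timestamp seen so far, used as "current" time
    counted, dropped = [], []
    for ts in arrivals:
        current = ts if current is None else max(current, ts)
        if not (start <= ts < end):
            continue  # belongs to a different window, not tracked here
        if current >= end + grace:
            dropped.append(ts)  # the window has already closed
        else:
            counted.append(ts)  # the window is still open
    return counted, dropped


day = datetime(2021, 7, 26)  # hypothetical date
start, end = day.replace(hour=10), day.replace(hour=11)
# The 11:01:00AM event arrives before the 10:59:00AM event.
arrivals = [day.replace(hour=11, minute=1), day.replace(hour=10, minute=59)]

no_grace = replay_window(start, end, timedelta(0), arrivals)
five_minutes = replay_window(start, end, timedelta(minutes=5), arrivals)
print("grace 0:", len(no_grace[0]), "counted,", len(no_grace[1]), "dropped")
print("grace 5 min:", len(five_minutes[0]), "counted,", len(five_minutes[1]), "dropped")
```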
When previewing data on a stats function, you will only see data once a window has closed.
Windowing methods example
In the following example, we'll take a look at how your records are handled using different windowing methods and function configurations. Assume that you have seven records with the following timestamps entering the stats function in this order.
{"body": "Event 1", "timestamp": Jul 26 9:00 AM}
{"body": "Event 2", "timestamp": Jul 26 9:30 AM}
{"body": "Event 3", "timestamp": Jul 27 9:00 AM}
{"body": "Event 4", "timestamp": Jul 26 1:00 PM}
{"body": "Event 5", "timestamp": Jul 26 1:30 PM}
{"body": "Event 6", "timestamp": Jul 26 10:15 AM}
{"body": "Event 7", "timestamp": Jul 27 10:00 AM}
Using a tumbling window method
Consider a tumbling window with window size of 1 hour and grace period of 30 minutes.
Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.
Event 2: The event is included in the currently open window, Jul 26 9:00 AM - Jul 26 10:00 AM.
Event 3: Timestamp Jul 27 9:00 AM is later than Jul 26 9:00 AM - Jul 26 10:00 AM, so the window is closed. It is also greater than Jul 26 10:00 AM + 30min, so the grace period is also closed. A new window for Jul 27 9:00 AM - Jul 27 10:00 AM is started.
Event 4: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.
Event 5: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.
Event 6: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.
Event 7: A new window Jul 27 10:00 AM - Jul 27 11:00 AM is started. Grace period of the Jul 27 9:00 AM - Jul 27 10:00 AM window will be open until any timestamp greater than Jul 27 10:30 AM is observed.
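The tumbling walk-through above can be reproduced with a short Python replay. This is a sketch under assumptions rather than the product's algorithm: windows are aligned to whole hours, the latest timestamp seen serves as the current time, and an event is ignored when that time has reached its window's end plus the grace period. The year and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def at(day, hour, minute=0):
    # The year is hypothetical; the example gives only month, day, and time.
    return datetime(2021, 7, day, hour, minute)


EVENTS = [
    ("Event 1", at(26, 9)), ("Event 2", at(26, 9, 30)),
    ("Event 3", at(27, 9)), ("Event 4", at(26, 13)),
    ("Event 5", at(26, 13, 30)), ("Event 6", at(26, 10, 15)),
    ("Event 7", at(27, 10)),
]


def replay_tumbling(events, size, grace):
    """Assign events to tumbling windows keyed by window start, or ignore
    them when their window's grace period has already ended."""
    current = None  # latest timestamp seen so far
    windows, ignored = {}, []
    for name, ts in events:
        current = ts if current is None else max(current, ts)
        # Align the window start to a multiple of the window size.
        start = datetime.min + ((ts - datetime.min) // size) * size
        if current >= start + size + grace:
            ignored.append(name)
        else:
            windows.setdefault(start, []).append(name)
    return windows, ignored


windows, ignored = replay_tumbling(EVENTS, timedelta(hours=1), timedelta(minutes=30))
for start in sorted(windows):
    print(start.strftime("%b %d %H:%M"), windows[start])
print("ignored:", ignored)
```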
Using a sliding window method
Consider a sliding window with a window size of 1 hour, window slide of 30 minutes and grace period of 24 hours.
Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.
Event 2: Counted towards Jul 26 9:00 AM - Jul 26 10:00 AM. A new window for Jul 26 9:30 AM - Jul 26 10:30 AM is also started.
Event 3: The timestamp is greater than the two open windows above, but their grace periods remain open until Jul 27 10:00 AM and Jul 27 10:30 AM respectively. This event contributes towards a new window Jul 27 9:00 AM - Jul 27 10:00 AM.
Event 4: Current maximum timestamp is Jul 27 9:00 AM which is less than Jul 26 2:00 PM + 24 hours, so this event counts towards the Jul 26 1:00 PM - Jul 26 2:00 PM window.
Event 5: Counted towards the Jul 26 1:00 PM - Jul 26 2:00 PM window and Jul 26 1:30 PM - Jul 26 2:30 PM. Same reason as event 4.
Event 6: Counted towards the window Jul 26 9:30 AM - Jul 26 10:30 AM, since its grace period is still open.
Event 7: Counted towards a new window Jul 27 10:00 AM - Jul 27 11:00 AM. Grace period of Jul 26 9:00 AM - Jul 26 10:00 AM window is closed.
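The grace-period arithmetic in the sliding walk-through can be checked with a few lines of Python. The sketch covers only the deadline comparisons, not window assignment, and it assumes that a window stops accepting events once the latest timestamp seen is greater than or equal to the window end plus the grace period, which matches the Event 7 step. The year and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

SIZE = timedelta(hours=1)
GRACE = timedelta(hours=24)


def at(day, hour, minute=0):
    # The year is hypothetical; the example gives only month, day, and time.
    return datetime(2021, 7, day, hour, minute)


def grace_deadline(window_start):
    """Timestamp at which the window's grace period ends."""
    return window_start + SIZE + GRACE


def is_open(window_start, current):
    """A window accepts late events while the current time is before its deadline."""
    return current < grace_deadline(window_start)


# Event 4 arrives while the latest timestamp seen is Jul 27 9:00 AM.
event4_accepted = is_open(at(26, 13), current=at(27, 9))

# After Event 7, the latest timestamp seen is Jul 27 10:00 AM.
after_event7 = at(27, 10)
first_window_open = is_open(at(26, 9), after_event7)
second_window_open = is_open(at(26, 9, 30), after_event7)
print(event4_accepted, first_window_open, second_window_open)
```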
Timescales
The following are valid timescales for the size, slide, and grace period arguments.
Time scale | Syntax | Description |
---|---|---|
<sec> | s, sec, secs, second, seconds | Time scale in seconds. |
<min> | m, min, mins, minute, minutes | Time scale in minutes. |
<hr> | h, hr, hrs, hour, hours | Time scale in hours. |
<day> | d, day, days | Time scale in days. |
<subseconds> | ms, cs, ds | Time scale in milliseconds (ms), centiseconds (cs), or deciseconds (ds). |
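Because the size, slide, and grace period arguments are described in milliseconds, it can help to see how a timescale value maps to that unit. The Python sketch below is a hypothetical helper, not part of the product: it assumes a value is written as a whole number followed by one of the timescale spellings listed above, with optional whitespace between them, as in 50s or 60 seconds.

```python
import re

# Milliseconds per unit for every spelling listed in the Timescales table.
UNIT_MS = {}
for names, factor in (
    (("ms",), 1),
    (("cs",), 10),
    (("ds",), 100),
    (("s", "sec", "secs", "second", "seconds"), 1_000),
    (("m", "min", "mins", "minute", "minutes"), 60_000),
    (("h", "hr", "hrs", "hour", "hours"), 3_600_000),
    (("d", "day", "days"), 86_400_000),
):
    for name in names:
        UNIT_MS[name] = factor


def to_milliseconds(text: str) -> int:
    """Convert a value such as '50s' or '60 seconds' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if match is None or match.group(2) not in UNIT_MS:
        raise ValueError(f"unrecognized timescale: {text!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]


print(to_milliseconds("50s"), to_milliseconds("10 seconds"))
```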
SPL2 example
When working in the SPL View, you can write the function by providing the arguments in this exact order.
Count the host values for each status code, using a 50-second window that slides every 10 seconds
... | stats count(host) AS HostPerStatus BY status_code, span(timestamp, 50s, 10s);
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0, 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5