Stats
This topic describes how to use the stats function in the Splunk Data Stream Processor.
Description
Applies one or more aggregation functions on a stream of events in a specified time window. The events must be grouped by one or more fields. This function returns a single value. As a best practice, limit window sizes to 24 hours or less and use a slide that is no smaller than 1/6th of your window size. For example, for a window size of 1 minute, make your window slide at least 10 seconds. This function accepts a variable number of arguments.
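The two best practices above can be checked with simple arithmetic. The following Python sketch is illustrative only and is not part of DSP; the function name check_window_config and the warning messages are hypothetical.

```python
from typing import List

MS_PER_HOUR = 60 * 60 * 1000

def check_window_config(size_ms: int, slide_ms: int) -> List[str]:
    """Return best-practice warnings for a window size and slide, both in milliseconds."""
    warnings = []
    if size_ms > 24 * MS_PER_HOUR:
        warnings.append("window size exceeds 24 hours")
    # Compare 6 * slide with size to avoid fractional arithmetic.
    if 6 * slide_ms < size_ms:
        warnings.append("slide is smaller than 1/6th of the window size")
    return warnings

# 1-minute window with a 10-second slide: meets both recommendations.
print(check_window_config(60_000, 10_000))
# 1-minute window with a 5-second slide: the slide is too small.
print(check_window_config(60_000, 5_000))
```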
View configurations for the stats function by highlighting the function in the Data Pipelines UI and clicking View Configurations. In the View Configurations tab, you can check the original fields of the data coming in to the stats function in the left sidebar, edit the function's arguments in the UI form, and see the output fields of the data coming out of the stats function in the right sidebar.
Function Input/Output Schema
- Function Input
- collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
- collection<record<S>>
- This function outputs the same collection of records but with a different schema S.
Syntax
The required syntax is in bold.
- stats
- aggregations(field) [AS field]
- [BY field-name], [span(timestamp, window, size, slide, grace period)]
Required arguments
- By
- Syntax: collection<expression<any>>
- Description: The field values by which to group events.
- UI Example: body
- Timestamp
- Syntax: timestamp=expression<long>
- Description: The field name where your record's timestamps are located.
- UI Example: timestamp or get("timestamp");
- Size
- Syntax: size=<long>
- Description: The window length, in milliseconds, to group events. For a list of valid timescales, see the "Timescales" section.
- UI Example: 60 seconds
- Aggregations
- Syntax: aggregations=collection<expression<any>>
- Description: An aggregation function to apply on your events.
Optional arguments
- Window
- Syntax: sliding | tumbling
- Description: A windowing method. See the table for more details on the two windowing methods.
- Default: Tumbling
- UI Example: Tumbling
Windowing method | Description |
---|---|
Tumbling | A tumbling window slices up time into segments based on the length of the provided window size. With the tumbling option, data in one window will not overlap with data in another window. At the start of each window, any aggregations are restarted. The window does not include the right-most edge. For example, starting at Timestamp=1:00PM for a window size W=5 minutes, the windows would be [1:00PM - 1:05PM), [1:05PM - 1:10PM), [1:10PM - 1:15PM), ..., etc. |
Sliding | Similar to a tumbling window, a sliding window slices up time into segments based on a provided window size but also uses an additional window slide parameter to control how frequently a sliding window is started. Therefore, sliding windows can be overlapping if the slide is smaller than the window size. For example: Starting at Timestamp=1:00PM for a window size W=5 minutes and window slide S=2 minutes, the windows would be [1:00PM - 1:05PM], [1:02PM - 1:07PM], [1:04PM - 1:09PM], ..., etc. |
- Slide
- Syntax: slide=<long>
- Description: The amount of time, in milliseconds, to wait before starting a new window. For a list of valid timescales, see the "Timescales" section.
- UI Example: 60 seconds.
- Grace Period
- Syntax: grace-period=<long>
- Description: The amount of time, in milliseconds, to wait for late-arriving events. In some cases, you may have some events that arrive after the latest time window boundary. For example, if you have a window size of 1 hour (10:00:00AM - 11:00:00AM), an event with timestamp 10:59:00 might come in 2 minutes later at 11:01:00. This setting allows you to specify an amount of time to wait for any late-arriving events for the time window. For a list of valid timescales, see the "Timescales" section.
- UI Example: 10 seconds.
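To make the window, size, and slide arguments concrete, the following Python sketch computes which windows cover a given timestamp. It is a simplified model, not DSP's implementation. It assumes that window starts are aligned to multiples of the size (tumbling) or slide (sliding) counted from a chosen origin, and that every window excludes its right-most edge. The function names are hypothetical.

```python
MINUTE = 60_000  # milliseconds

def tumbling_window(ts, size):
    """Return the single [start, end) window that contains ts."""
    start = ts - ts % size
    return (start, start + size)

def sliding_windows(ts, size, slide):
    """Return every [start, end) window that contains ts, earliest first."""
    windows = []
    start = ts - ts % slide          # latest window start at or before ts
    while start > ts - size:         # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

# Timestamps are offsets from 1:00PM, matching the examples in the table.
print(tumbling_window(7 * MINUTE, 5 * MINUTE))
print(sliding_windows(4 * MINUTE, 5 * MINUTE, 2 * MINUTE))
```

With a 5-minute size and a 2-minute slide, a record stamped 1:04PM falls into the three overlapping windows that start at 1:00PM, 1:02PM, and 1:04PM.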
Usage
This section contains additional usage information about the Stats function.
How are time windows calculated?
The stats function has no concept of wall clock time, and the passage of time is based on the timestamps of incoming records. The stats function tracks the latest timestamp it received in the stream as the "current" time, and it determines the start and end of windows using this timestamp. Once the difference between the current timestamp and the start timestamp of the current window is greater than the window length, that window is closed and a new window starts.
However, since records may arrive out of order, the grace period argument allows the previous window W to remain "open" for a certain period G after its closing timestamp T. Until the stats function receives a record with a timestamp greater than T + G, any incoming records with a timestamp less than T are counted towards the previous window W. Once a record with a timestamp greater than T + G is received, the window is closed and a new window is opened. If the stats function never receives a record with a timestamp greater than T + G, then the window will remain open.
When previewing data on a stats function, you will only see data once a window has closed.
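The following Python sketch models the two ideas in this section: the "current" time is the latest timestamp seen so far, and a window stays open until a record later than T + G arrives. It is a simplified illustration, not DSP's implementation. The names EventClock and window_is_open and the 5-minute grace period are hypothetical.

```python
def minutes(hour, minute=0):
    """Minutes since midnight, used here as a simple event timestamp."""
    return hour * 60 + minute

class EventClock:
    """Tracks the latest timestamp seen, which serves as the 'current' time."""
    def __init__(self):
        self.current = None

    def observe(self, timestamp):
        if self.current is None or timestamp > self.current:
            self.current = timestamp
        return self.current

def window_is_open(window_end, grace, current):
    """Window W stays open until a record later than T + G has been seen."""
    return current <= window_end + grace

clock = EventClock()
T, G = minutes(11), 5                       # window 10:00-11:00, assumed 5-minute grace
clock.observe(minutes(11, 1))               # a record stamped 11:01 advances the clock
print(window_is_open(T, G, clock.current))  # True: a late 10:59 record still counts
clock.observe(minutes(10, 59))              # an older record does not move the clock back
clock.observe(minutes(11, 6))               # first record later than T + G
print(window_is_open(T, G, clock.current))  # False: the window is closed
```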
Windowing methods example
In the following example, we'll take a look at how your records appear in DSP using different windowing methods and function configurations. Assume that you have seven records with the following timestamps entering the stats function in this order.
{"body": "Event 1", "timestamp": Jul 26 9:00 AM}
{"body": "Event 2", "timestamp": Jul 26 9:30 AM}
{"body": "Event 3", "timestamp": Jul 27 9:00 AM}
{"body": "Event 4", "timestamp": Jul 26 1:00 PM}
{"body": "Event 5", "timestamp": Jul 26 1:30 PM}
{"body": "Event 6", "timestamp": Jul 26 10:15 AM}
{"body": "Event 7", "timestamp": Jul 27 10:00 AM}
Using a tumbling window method
Consider a tumbling window with window size of 1 hour and grace period of 30 minutes.
Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.
Event 2: The event is included in the currently open window, Jul 26 9:00 AM - Jul 26 10:00 AM.
Event 3: Timestamp Jul 27 9:00 AM is later than Jul 26 9:00 AM - Jul 26 10:00 AM, so the window is closed. It is also greater than Jul 26 10:00 AM + 30min, so the grace period is also closed. A new window for Jul 27 9:00 AM - Jul 27 10:00 AM is started.
Event 4: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.
Event 5: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.
Event 6: Ignored, falls outside of current window Jul 27 9:00 AM - Jul 27 10:00 AM and there are no windows with open grace periods.
Event 7: A new window Jul 27 10:00 AM - Jul 27 11:00 AM is started. Grace period of the Jul 27 9:00 AM - Jul 27 10:00 AM window will be open until any timestamp greater than Jul 27 10:30 AM is observed.
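The tumbling walk-through above can be reproduced with a short simulation. The following Python sketch is a simplified model, not DSP's implementation. It assumes that window starts are aligned to multiples of the window size counted from midnight, and it uses an arbitrary year because the example records do not include one. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

YEAR = 2020  # arbitrary: the example timestamps do not include a year

def ts(day, hour, minute=0):
    return datetime(YEAR, 7, day, hour, minute)

EVENTS = [
    ("Event 1", ts(26, 9)), ("Event 2", ts(26, 9, 30)), ("Event 3", ts(27, 9)),
    ("Event 4", ts(26, 13)), ("Event 5", ts(26, 13, 30)),
    ("Event 6", ts(26, 10, 15)), ("Event 7", ts(27, 10)),
]

def simulate_tumbling(events, size, grace):
    """Return ({window_start: [event names]}, [ignored event names])."""
    windows, ignored = {}, []
    current = None  # latest timestamp seen so far
    for name, t in events:
        current = t if current is None else max(current, t)
        # Assumed alignment: window starts are multiples of size since midnight.
        midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((t - midnight) // size) * size
        # The record counts if its window has not passed its end plus grace.
        if current <= start + size + grace:
            windows.setdefault(start, []).append(name)
        else:
            ignored.append(name)
    return windows, ignored

windows, ignored = simulate_tumbling(EVENTS, timedelta(hours=1), timedelta(minutes=30))
for start, names in windows.items():
    print(start, names)
print("ignored:", ignored)
```

Under these assumptions, Events 1 and 2 share the Jul 26 9:00 AM window, Events 3 and 7 each open a new window, and Events 4, 5, and 6 are ignored.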
Using a sliding window method
Consider a sliding window with a window size of 1 hour, window slide of 30 minutes and grace period of 24 hours.
Event 1: A window for Jul 26 9:00 AM - Jul 26 10:00 AM is started.
Event 2: Counted towards Jul 26 9:00 AM - Jul 26 10:00 AM. A new window for Jul 26 9:30 AM - Jul 26 10:30 AM is also started.
Event 3: The timestamp is greater than the two open windows above, but their grace periods remain open until Jul 27 10:00 AM and Jul 27 10:30 AM respectively. This event contributes towards a new window Jul 27 9:00 AM - Jul 27 10:00 AM.
Event 4: Current maximum timestamp is Jul 27 9:00 AM which is less than Jul 26 2:00 PM + 24 hours, so this event counts towards the Jul 26 1:00 PM - Jul 26 2:00 PM window.
Event 5: Counted towards the Jul 26 1:00 PM - Jul 26 2:00 PM window and Jul 26 1:30 PM - Jul 26 2:30 PM. Same reason as event 4.
Event 6: Counted towards the window Jul 26 9:30 AM - Jul 26 10:30 AM, since its grace period is still open.
Event 7: Counted towards a new window Jul 27 10:00 AM - Jul 27 11:00 AM. Grace period of Jul 26 9:00 AM - Jul 26 10:00 AM window is closed.
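The grace-period arithmetic in the sliding walk-through can be checked with a few lines of Python. This sketch only verifies the deadlines quoted for Events 3, 4, and 6. It does not model how DSP assigns records to sliding windows. The year is arbitrary and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

YEAR = 2020  # arbitrary: the example timestamps do not include a year
GRACE = timedelta(hours=24)

def at(day, hour, minute=0):
    return datetime(YEAR, 7, day, hour, minute)

def grace_deadline(window_end, grace=GRACE):
    """Latest 'current' timestamp at which the window still accepts records."""
    return window_end + grace

def counts_towards(window_end, current_max, grace=GRACE):
    """True while the latest timestamp seen has not passed the grace deadline."""
    return current_max <= grace_deadline(window_end, grace)

current_max = at(27, 9)  # latest timestamp seen after Event 3

# Event 3: deadlines of the two windows opened by Events 1 and 2.
print(grace_deadline(at(26, 10)))       # 2020-07-27 10:00:00
print(grace_deadline(at(26, 10, 30)))   # 2020-07-27 10:30:00

# Event 4: window ending Jul 26 2:00 PM is still within its grace period.
print(counts_towards(at(26, 14), current_max))      # True
# Event 6: window ending Jul 26 10:30 AM is still within its grace period.
print(counts_towards(at(26, 10, 30), current_max))  # True
```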
Timescales
The following are valid timescales for the size, slide, and grace period arguments.
Time scale | Syntax | Description |
---|---|---|
<sec> | s, sec, secs, second, seconds | Time scale in seconds. |
<min> | m, min, mins, minute, minutes | Time scale in minutes. |
<hr> | h, hr, hrs, hour, hours | Time scale in hours. |
<day> | d, day, days | Time scale in days. |
<month> | mon, month, months | Time scale in months. |
<subseconds> | ms, cs, ds | Time scale in milliseconds (ms), centiseconds (cs), or deciseconds (ds). |
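Because the size, slide, and grace period arguments are described in milliseconds, it can help to see how a value such as 60 seconds or 50s converts. The following Python sketch is illustrative only and is not DSP's parser. It assumes the input is a whole number followed by one of the unit names in the table, and it leaves out the month units because this topic does not define how long a month is. The function name to_millis is hypothetical.

```python
import re

# Milliseconds per unit, using the unit names from the Timescales table.
UNIT_MS = {}
for names, factor in [
    (("ms",), 1),
    (("cs",), 10),
    (("ds",), 100),
    (("s", "sec", "secs", "second", "seconds"), 1_000),
    (("m", "min", "mins", "minute", "minutes"), 60_000),
    (("h", "hr", "hrs", "hour", "hours"), 3_600_000),
    (("d", "day", "days"), 86_400_000),
]:
    for name in names:
        UNIT_MS[name] = factor

def to_millis(text):
    """Convert a string such as '60 seconds' or '50s' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if not match or match.group(2) not in UNIT_MS:
        raise ValueError(f"unsupported timescale: {text!r}")
    return int(match.group(1)) * UNIT_MS[match.group(2)]

print(to_millis("60 seconds"))  # 60000
print(to_millis("50s"))         # 50000
print(to_millis("1 hour"))      # 3600000
```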
SPL2 example
Return the count of hosts for each status code
... | stats count(host) AS HostPerStatus BY status_code, span(timestamp, 50s, 10s);
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0