Stream aggregation methods

Splunk Stream lets you apply aggregation to network data at capture time, on the collection endpoint, before the data is sent to indexers. You can use aggregation to enrich your data with statistics that provide additional insight into activity on your network.

When you apply aggregation to a stream, only the aggregated data is sent to indexers. Using aggregation can thus help you decrease both storage requirements and license usage.

Stream aggregate functions

Splunk Stream supports a subset of the aggregate functions provided by the SPL (Splunk Processing Language) stats command to calculate statistics based on fields in your network event data. You can apply aggregate functions to your data when you configure a stream in the Configure Streams UI.

Splunk Stream supports these aggregate functions:

sum
sum squared
max
min
mean
median
mode
sample standard deviation
population standard deviation
sample variance
population variance
distinct count
distinct values

For more information on aggregate functions, see Statistical and charting functions in the Splunk Enterprise Search Reference.

How aggregates work

You apply aggregate functions to stream events over a user-defined time interval. When Stream calculates the selected aggregates, it groups events into aggregation buckets, with one bucket allocated for each unique value of the "Key" field (or unique combination of values if there are multiple "Key" fields). At the end of the time interval, the app emits an object that represents each bucket.

For example, to gain more insight into the amount of inbound HTTP traffic, you might select src_ip as a Key field and apply aggregate functions such as max, mean, and standard deviation to the bytes_in field of an http stream over a 60-second time interval.

Stream calculates these aggregates for the bytes_in field for each unique value of src_ip that appears in the http stream over the specified time interval. Search results then contain one aggregated event per src_ip value, with the results in fields such as max(bytes_in) and mean(bytes_in).
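The bucketing described above can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not Stream's implementation: the sample events, the aggregate() helper, and the choice of max and mean are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw events seen on an http stream during one time interval.
events = [
    {"src_ip": "10.0.0.1", "bytes_in": 100},
    {"src_ip": "10.0.0.1", "bytes_in": 300},
    {"src_ip": "10.0.0.2", "bytes_in": 50},
]

def aggregate(events, key_fields, target):
    """Group events into one bucket per unique key, then summarize each bucket."""
    buckets = defaultdict(list)
    for event in events:
        # A bucket is identified by the combination of all Key field values.
        key = tuple(event[k] for k in key_fields)
        buckets[key].append(event[target])

    results = []
    for key, observed in buckets.items():
        row = dict(zip(key_fields, key))
        row["count"] = len(observed)  # number of raw events in the bucket
        row[f"max({target})"] = max(observed)
        row[f"mean({target})"] = sum(observed) / len(observed)
        results.append(row)
    return results

# One aggregated object is produced per unique src_ip at the end of the interval.
results = aggregate(events, ["src_ip"], "bytes_in")
for row in results:
    print(row)
```

Passing more than one name in key_fields gives one bucket per unique combination of values, which matches the multiple "Key" field case.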

Aggregated field syntax

Aggregated fields in Splunk Stream version 6.6.0 and later have the following syntax:

function(field_name)

This is a change from version 6.5.x and earlier, where an aggregated field kept the original field name (such as bytes_in) even though it actually contained the sum aggregate. To access the latest field aggregation capabilities in Splunk Stream, upgrade to Splunk Stream version 7.0.0. See Upgrade to Splunk Stream 7.0.0 in the Splunk Stream Installation and Configuration Manual.
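If you post-process aggregated events outside of Splunk, the function(field_name) pattern is easy to take apart. The following is a minimal sketch; the split_aggregated_field() helper and its regular expression are hypothetical and assume that function and field names contain only letters, digits, and underscores.

```python
import re

# Matches names of the form function(field_name), for example sum(bytes_in).
AGG_FIELD = re.compile(r"^(?P<function>\w+)\((?P<field>\w+)\)$")

def split_aggregated_field(name):
    """Return (function, field) for an aggregated field name, or None otherwise."""
    match = AGG_FIELD.match(name)
    if match is None:
        return None  # a plain field name such as bytes_in
    return match.group("function"), match.group("field")

print(split_aggregated_field("sum(bytes_in)"))
print(split_aggregated_field("bytes_in"))
```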

To upgrade aggregated streams from earlier versions of the app to the new syntax in 6.6.0, Splunk Stream provides a migration script that runs automatically when you upgrade to version 6.6.0.

About the count field

Each aggregated event has a single count field that reflects the total number of raw events aggregated into it. For example, an aggregated event that displays count: 73 summarizes 73 raw events.
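Because count records how many raw events each aggregated event stands for, it can be used to recombine buckets. The sketch below uses made-up aggregated events and assumes the sum aggregate was also selected for bytes_in; it recovers the total number of raw events and an overall mean.

```python
# Hypothetical aggregated events, one per src_ip bucket.
aggregated = [
    {"src_ip": "10.0.0.1", "count": 73, "sum(bytes_in)": 14600},
    {"src_ip": "10.0.0.2", "count": 27, "sum(bytes_in)": 5400},
]

# Total raw events represented by all buckets.
total_raw_events = sum(event["count"] for event in aggregated)

# Overall mean across buckets: total bytes divided by total raw events.
overall_mean = sum(event["sum(bytes_in)"] for event in aggregated) / total_raw_events

print(total_raw_events, overall_mean)
```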

About the values aggregate

The values aggregate function produces a list (JSON array) of distinct values of the target field, even if the list contains a single entry. The values in the array are sorted in alphabetical order for text fields and in ascending order for numeric fields.

For example, you might apply the values aggregate to the time_taken field in an http stream to get a list of values for the number of microseconds it took to complete each flow event over the selected time interval. Search results for the values(time_taken) aggregate then contain an array of the distinct time_taken values observed in each bucket.
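The behavior of the values aggregate can be approximated as "deduplicate, then sort". The following sketch uses a hypothetical values_aggregate() helper and invented time_taken observations. Note that Python sorts strings by code point, which may differ from the ordering Stream applies to text fields with mixed case.

```python
def values_aggregate(observations):
    """Return the distinct observations as a sorted list, even for a single value."""
    return sorted(set(observations))

# Hypothetical time_taken values (microseconds) from one bucket.
time_taken = [310, 120, 310, 95]

print(values_aggregate(time_taken))  # distinct values in ascending order
print(values_aggregate([42]))        # still a list when there is one entry
```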

About the sum of squares aggregate

In version 6.5.x and earlier, every aggregated field X had a corresponding field, psrsvd_ss_X, that contained the sum of squares of X. This field did not appear in the stream configuration but was generated automatically. As of version 6.6.0, the corresponding field is called sumsq(X), and you can select it for generation in the same way as any other aggregation method. (See Configure Streams UI.)
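A sum of squares is useful because, together with the count and the sum, it is enough to derive variance and standard deviation without the raw events. The sketch below shows that standard identity. The variance_from_sums() helper and the sample numbers are assumptions for illustration and do not describe how Stream computes its own variance aggregates.

```python
import math

def variance_from_sums(count, total, total_sq, sample=False):
    """Derive variance from count, sum(X), and sumsq(X) alone."""
    if count < (2 if sample else 1):
        raise ValueError("not enough events to compute a variance")
    mean = total / count
    # Sum of squared deviations from the mean: sumsq(X) - n * mean^2.
    squared_dev = max(total_sq - count * mean * mean, 0.0)
    return squared_dev / (count - 1 if sample else count)

# Hypothetical bucket: the values 2, 4, 4, 4, 5, 5, 7, 9 give
# count = 8, sum(X) = 40, and sumsq(X) = 232.
population_variance = variance_from_sums(8, 40, 232)
population_stdev = math.sqrt(population_variance)
sample_variance = variance_from_sums(8, 40, 232, sample=True)

print(population_variance, population_stdev, sample_variance)
```

Because count, sum(X), and sumsq(X) are all additive, the same calculation works after adding those three fields together across several buckets or time intervals.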
