Splunk® Data Stream Processor

Use the Data Stream Processor

DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Summarize records with the stats function

Use the Stats function to perform one or more aggregation calculations on your streaming data. For each aggregation calculation that you want to perform, specify the aggregation functions, the subset of data to perform the calculation on (fields to group by), the timestamp field for windowing, and the output fields for the results.

After the window length has passed, the Stats function outputs records that contain the user-defined output fields, the fields to group by, and the window in which the aggregations occurred. The Stats function drops all other fields from the record's schema.

The Stats function has no concept of wall clock time. The passage of time is based on the timestamps of incoming records. The Stats function tracks the latest timestamp it has received in the stream as the "current" time, and it determines the start and end of windows using this timestamp. Once the difference between the current timestamp and the start timestamp of the current window is greater than the window length, that window is closed and a new window starts. However, because records can arrive out of order, the grace period argument allows the previous window W to remain "open" for a certain period G after its closing timestamp T. Until the function receives a record with a timestamp C where C > T + G, any incoming records with a timestamp less than T are counted towards the previous window W. See the Stats usage section for more information.
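The windowing and grace period behavior described above can be sketched in Python. This is a minimal simulation, not the Stats implementation. It assumes numeric timestamps in seconds, windows aligned to multiples of the window length, and that a record arriving after its window has closed is dropped. The function name tumbling_counts and its return shape are hypothetical.

```python
from collections import defaultdict


def tumbling_counts(timestamps, window_length, grace_period=0):
    """Count records per event-time window, closing windows by record time only."""
    open_windows = defaultdict(int)  # window start -> number of records
    current_time = None              # latest timestamp seen so far ("current" time)
    emitted = []                     # (window_start, window_end, count) in emit order
    dropped = []                     # records that arrived after their window closed

    for ts in timestamps:
        # Time only moves forward when a record with a later timestamp arrives.
        current_time = ts if current_time is None else max(current_time, ts)

        # Assumption: windows are aligned to multiples of the window length.
        start = ts - (ts % window_length)
        end = start + window_length  # closing timestamp T

        if current_time > end + grace_period:
            # C > T + G has already been observed, so this window is closed.
            dropped.append(ts)
        else:
            open_windows[start] += 1

        # Close every window whose end plus grace period is behind current time.
        for s in sorted(open_windows):
            if current_time > s + window_length + grace_period:
                emitted.append((s, s + window_length, open_windows.pop(s)))

    return emitted, dropped, dict(open_windows)


stream = [5, 30, 62, 58, 75, 59, 130]
print(tumbling_counts(stream, window_length=60, grace_period=10))
print(tumbling_counts(stream, window_length=60, grace_period=0))
```

With a grace period of 10, the out-of-order record at 58 still lands in the first window because no record later than 70 has arrived yet, while the record at 59 arrives after the record at 75 and is dropped. With no grace period, both 58 and 59 are dropped because the record at 62 has already closed the first window.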

List of aggregation functions

You can use the following aggregation functions within the Stats streaming function:

  • average: Calculates the average in a time window.
  • count: Counts the number of non-null values in a time window.
  • max: Returns the greatest value in a time window.
  • min: Returns the lowest value in a time window.
  • sum: Returns the sum of values in a time window.
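As a rough illustration of what these five aggregations compute over the values collected in one time window, consider the following Python sketch. Only count is documented above as counting non-null values. Skipping null values in the other aggregations, and returning None for an empty window, are assumptions made for this example. The function name aggregate_window is hypothetical.

```python
def aggregate_window(values):
    """Apply the five aggregations to the values of a single time window."""
    # Assumption: null (None) values are ignored by every aggregation.
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0, "sum": None, "average": None, "min": None, "max": None}

    total = sum(present)
    return {
        "count": len(present),             # number of non-null values
        "sum": total,                      # sum of values
        "average": total / len(present),   # arithmetic mean
        "min": min(present),               # lowest value
        "max": max(present),               # greatest value
    }


print(aggregate_window([4, None, 10, 1]))
```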

If you have the Streaming ML Plugin installed, you also have the following aggregation functions available:

  • estdc: Calculates an approximate distinct count value for any field.
  • perc: Calculates the approximate q-th percentile value of a numeric field.

Count the number of non-null sources per host in a 60-second time window

Suppose you want to count the number of times a source appears in a given time window per host. This example does the following:

  • Uses the count aggregation function to count the number of non-null sources and outputs the result to num_events_with_source_field.
  • Groups the results by host.
  • Executes the aggregations in a time window of 60 seconds based on the timestamp of your record.

Steps

  1. From the Canvas view of your pipeline, click on the + icon and add the Stats function to your pipeline.
  2. In the Stats function, add a new Group By.
    1. In Field/Expression, type host.
    2. Click OK.
  3. In the Timestamp field, type timestamp.
  4. In the Window length field, type 60 and select seconds from the drop-down list.
  5. Configure the Stats function to count the number of non-null source values.
    1. Click the New Aggregations drop-down list, and select count.
    2. Type source in Field/Expression, and num_events_with_source_field in Output Field.
  6. Click Validate.
  7. Click Start Preview, and then click the Stats function to verify that your data is being aggregated. This example uses a time window of 60 seconds, so your preview data for Stats shows up after 60 seconds have passed between the timestamps of your records.

Suppose your data stream contains the following data:

AggregateExample.png

Following this example, the Stats function produces the following output:

AggregateExample2.png
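The aggregation in this example can also be approximated outside of the product with a short Python sketch. The sample records, the host names, and the window_start field are hypothetical and are not taken from the screenshots above. The sketch uses timestamps in seconds, ignores the grace period, and processes the records as a batch instead of as a stream.

```python
from collections import defaultdict


def count_sources_per_host(records, window_seconds=60):
    """Count non-null source values per host in each 60-second window."""
    counts = defaultdict(int)  # (window_start, host) -> non-null source count
    for record in records:
        ts = record["timestamp"]
        window_start = ts - (ts % window_seconds)
        key = (window_start, record["host"])
        # Only non-null sources are counted. A group whose sources are all
        # null still gets a row with 0 (an assumption of this sketch).
        counts[key] += 1 if record.get("source") is not None else 0

    # Keep only the group-by field, the output field, and the window,
    # mirroring how Stats drops all other fields from the schema.
    return [
        {"window_start": w, "host": h, "num_events_with_source_field": n}
        for (w, h), n in sorted(counts.items())
    ]


# Hypothetical sample records for illustration only.
records = [
    {"timestamp": 0, "host": "web-1", "source": "/var/log/a.log"},
    {"timestamp": 15, "host": "web-1", "source": None},
    {"timestamp": 20, "host": "web-2", "source": "/var/log/b.log"},
    {"timestamp": 45, "host": "web-1", "source": "/var/log/a.log"},
    {"timestamp": 70, "host": "web-2", "source": "/var/log/b.log"},
]

for row in count_sources_per_host(records):
    print(row)
```

In this sample, web-1 has three records in the first window but only two with a non-null source, so its count is 2. The record at timestamp 70 falls into the second window and is counted separately.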

See also

Functions
Stats
average
count
max
min
sum
estdc
perc
Last modified on 15 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

