Splunk® Data Stream Processor

Function Reference

DSP 1.2.1 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end-of-support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Adaptive Thresholding (beta)

The Adaptive Thresholding function dynamically generates threshold values based on observed data in a stream. It supports two approaches: Gaussian, which is the default, and Distribution-free (quantile). The only difference between the two approaches is the implicit assumption about the underlying data distribution.

Users can specify a rolling window (e.g., 1 hour, 1 day, 1 week) on which to compute adaptive threshold values. For more information on how to configure the rolling window, see the optional arguments "window" and "timestamp".

Function output includes three fields for the Gaussian approach, and two fields for the distribution-free (quantile) approach:

Approach                     Fields
Gaussian approach            (1) estimated mean, (2) estimated standard deviation, (3) predicted label to classify outliers
Distribution-free approach   (1) estimated quantile, (2) predicted label to classify outliers

This function requires an "input" field in the incoming data stream. This field does not appear in the Streaming ML user interface because it is not configurable. For more information, see the Required arguments section.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

adaptive_threshold
algorithm="quantile"
entity="key"
value="input"
window=-1L;

Required arguments

input
Syntax: double
Description: The default input column, containing the values in which Adaptive Thresholding detects anomalies and outliers. This argument does not appear in the Streaming ML user interface because it is not configurable. If the data set contains a column titled "input", it is used by default. To use a different column, set the optional "value" argument.

Optional arguments

algorithm
Syntax: string
Description: Anomaly detection algorithm. Default is gaussian.
Example: "quantile"
entity
Syntax: string
Description: The entity column for per-entity Adaptive Thresholding. If unset, all records are treated as belonging to a single entity.
Example: "key"
timestamp
Syntax: long
Description: The timestamp associated with each value. This argument is required if you use a moving window, and optional if you do not (that is, when "window" is -1 or not set).

threshold
Syntax: double
Description: For both the Gaussian and the Distribution-free (quantile) approaches, the threshold is a value between 0 and 1. Lower threshold values cause the algorithm to tag fewer data points as outliers. Higher threshold values cause the algorithm to tag more data points as outliers. Default value is 0.01 if not specified.
value
Syntax: double
Description: Set the "value" argument to apply Adaptive Thresholding to a field other than "input". When set, the "value" argument overrides "input".
window
Syntax: long
Description: The time window (in milliseconds from epoch) to train on. Defaults to -1, which means that no moving window is used.
Example: -1L
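
The interaction between "entity", "timestamp", and "window" can be illustrated outside of DSP with a small Python sketch. This is not the DSP implementation. It assumes that the window is a duration in milliseconds, that timestamps are epoch milliseconds arriving in order for each entity, and that -1 means "keep everything". The class name RollingWindows and its methods are hypothetical.

```python
from collections import defaultdict, deque


class RollingWindows:
    """Per-entity buffers that keep only values inside a time window.

    Hypothetical helper for illustration only, not part of DSP.
    """

    def __init__(self, window_ms=-1):
        # Assumption: -1 disables the moving window, so nothing is evicted.
        self.window_ms = window_ms
        # entity -> deque of (timestamp_ms, value), oldest first
        self.buffers = defaultdict(deque)

    def add(self, entity, timestamp_ms, value):
        """Store a value and return the values currently in the window."""
        buf = self.buffers[entity]
        buf.append((timestamp_ms, value))
        if self.window_ms >= 0:
            # Evict records that are a full window (or more) older than
            # the newest record for this entity.
            cutoff = timestamp_ms - self.window_ms
            while buf and buf[0][0] <= cutoff:
                buf.popleft()
        return [v for _, v in buf]
```

Each entity keeps its own buffer, so a burst of values from one host does not shift the baseline of another. Without a moving window, no timestamp comparison is needed, which matches "timestamp" being optional in that case.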

Usage

For each data point observed, Adaptive Thresholding outputs a predicted label (a binary classification of outliers) and the estimated quantile or Gaussian output. The distribution-free approach (quantile) produces the q-th quantile of current data points. The distribution-based approach (Gaussian) produces the mean and variance of current data points. Both approaches generate predicted labels to classify outliers.
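
As a rough illustration of the Gaussian output, the following Python sketch maintains a running mean and standard deviation with Welford's algorithm and labels each new value. It is not the DSP implementation. In particular, it assumes that "threshold" is a two-sided tail probability under a normal distribution, which is one interpretation consistent with lower thresholds tagging fewer outliers. The class name GaussianThreshold is hypothetical.

```python
import math
from statistics import NormalDist


class GaussianThreshold:
    """Running mean and standard deviation plus an outlier label.

    Hypothetical helper for illustration only, not part of DSP.
    """

    def __init__(self, threshold=0.01):
        # Assumption: threshold is a two-sided tail probability, so
        # 0.01 corresponds to roughly 2.58 standard deviations.
        self.z_cut = NormalDist().inv_cdf(1.0 - threshold / 2.0)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def _std(self):
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def update(self, x):
        """Label x against the current estimate, then fold x into it.

        Returns (updated mean, updated standard deviation, is_outlier).
        """
        std = self._std()
        is_outlier = std > 0.0 and abs(x - self.mean) > self.z_cut * std
        # Welford's update step
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self.mean, self._std(), is_outlier
```

Feeding the values 10.0, 10.2, 9.8, 10.1, 9.9, 10.0 and then 50.0 to one GaussianThreshold instance labels only the final value as an outlier. The sketch scores each value before updating the estimate so that an extreme value cannot mask itself.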

Adaptive Thresholding is frequently used to identify outliers in real-time on numeric time series, such as metrics and KPIs. Adaptive Thresholding is useful for monitoring and evaluating the performance of a metric where baseline values are subject to change.

For example, when monitoring the %CPU consumption of a server, you expect the base load to vary dynamically. Applying the Adaptive Thresholding function enables outlier detection on a rolling window (e.g., one hour). With the Gaussian approach, the function generates an estimate of where in the distribution each observed data point lies. Predicted outliers correspond to observations that lie more than n times (e.g., 2 times) the standard deviation from the mean. With the Distribution-free approach, the function computes the q-th quantile of the observed data points. Predicted outliers correspond to observations that fall outside the n-th percentile (e.g., greater than the 99th percentile).
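
The Distribution-free approach in the CPU example can be sketched in Python as follows. This is not the DSP implementation. It assumes a nearest-rank quantile and assumes that a threshold of 0.01 corresponds to flagging values above the 99th percentile of recent observations. The function names are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Nearest-rank q-th quantile of a sorted, non-empty list."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def label_by_quantile(history, x, threshold=0.01):
    """Return (estimated quantile, is_outlier) for x given past values.

    Assumption: the cut-off is the (1 - threshold) quantile of history,
    and only values above it are flagged.
    """
    q_value = quantile(sorted(history), 1.0 - threshold)
    return q_value, x > q_value


# Fifty recent %CPU readings between 20 and 69.
cpu_history = [float(v) for v in range(20, 70)]
cutoff, spike = label_by_quantile(cpu_history, 95.0)
_, normal = label_by_quantile(cpu_history, 40.0)
```

With this history, a reading of 95.0 exceeds the estimated quantile and is flagged, while a reading of 40.0 is not. Because the cut-off is taken from the observed values themselves, no assumption about the shape of the distribution is needed.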

SPL2 example

The following example uses Adaptive Thresholding to detect anomalies in battery voltage:

| from splunk_firehose()
| eval json=cast(body, "string"),
'json-map'=from_json_object(json), input=parse_double(ucast(map_get('json-map', "voltage"), "string", null)),
time=ucast(map_get('json-map', "timestamp"), "string", null), timestamp=cast(time, "long"),
key=""
| adaptive_threshold algorithm="quantile" entity="key" value="input" window=-1L;
Last modified on 18 March, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

