Adaptive Thresholding (beta)
The Adaptive Thresholding function dynamically generates threshold values based on observed data in a stream. The default implementation of Adaptive Thresholding uses the Gaussian approach; a Distribution-free (quantile) approach is also available. The only difference between the Distribution-free and Gaussian approaches is the implicit assumption about the underlying data distribution.
Users can specify a rolling window (e.g., 1 hour, 1 day, 1 week) on which to compute adaptive threshold values. For more information on how to configure rolling window length, see the optional arguments for "window" and "timestamp".
Function output includes three fields for the Gaussian approach, and two fields for the distribution-free (quantile) approach:
- Gaussian approach
- Fields: (1) estimated mean, (2) estimated standard deviation, (3) predicted label to classify outliers
- Distribution-free approach
- Fields: (1) estimated quantile, (2) predicted label to classify outliers
This function requires an "input" field in the incoming data stream. This field does not appear in the Streaming ML user interface because it is not configurable. For more information, see the Required arguments section.
Function Input/Output Schema
- Function Input
collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
collection<record<S>>
- This function outputs collections of records with schema S.
Syntax
- adaptive_threshold
- algorithm="quantile"
- entity="key"
- value="input"
- window=-1L;
Required arguments
- input
- Syntax: double
- Description: Default input column containing values to detect anomalies and outliers using Adaptive Thresholding. This argument does not appear in the Streaming ML user interface because it is not configurable. If the data set contains a column titled "input" it is used by default. To override this field, use the optional argument for "value".
Optional arguments
- algorithm
- Syntax: string
- Description: Anomaly detection algorithm. Default is gaussian.
- Example: "quantile"
- entity
- Syntax: string
- Description: The entity column for per-entity Adaptive Thresholding. If unset, all records are treated as belonging to a single entity.
- Example: "key"
- timestamp
- Syntax: long
- Description: Timestamp that comes with the value.
Timestamp is a required argument if you use a moving window. Timestamp is an optional argument if you do not use a moving window (the "window" argument is set to -1 or is not present).
- threshold
- Syntax: double
- Description: When using the Gaussian approach, the threshold is a value between 0 and 1. Lower threshold values cause the algorithm to tag fewer data points as outliers. Higher threshold values cause the algorithm to tag more data points as outliers. Default value is 0.01 if not specified.
- When using the Distribution-free (quantile) approach, the threshold is a value between 0 and 1. Lower threshold values cause the algorithm to tag fewer data points as outliers. Higher threshold values cause the algorithm to tag more data points as outliers. Default value is 0.01 if not specified.
- value
- Syntax: double
- Description: Set the "value" argument to apply Adaptive Thresholding to a column other than the default "input" column. The "value" argument overrides "input" when used.
- window
- Syntax: long
- Description: The time window (in milliseconds from epoch) to train on. Defaults to -1.
- Example: -1L
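The interaction between "entity", "timestamp", and "window" can be pictured with a small Python sketch. This is not the DSP implementation: the class name RollingWindow, the eviction rule, and the reading of "window" as a length in milliseconds are assumptions made for illustration. It also assumes that timestamps arrive in increasing order for each entity.

```python
from collections import defaultdict, deque


class RollingWindow:
    """Hypothetical helper: per entity, keep only values inside the time window."""

    def __init__(self, window_ms=-1):
        self.window_ms = window_ms         # -1 means no window: keep every value
        self.buffers = defaultdict(deque)  # entity -> deque of (timestamp, value)

    def add(self, entity, timestamp, value):
        buf = self.buffers[entity]
        buf.append((timestamp, value))
        if self.window_ms >= 0:
            # Evict points that are at least window_ms older than the newest point.
            while buf and buf[0][0] <= timestamp - self.window_ms:
                buf.popleft()
        return [v for _, v in buf]


HOUR_MS = 60 * 60 * 1000
w = RollingWindow(window_ms=HOUR_MS)
w.add("host-a", 0, 1.0)
w.add("host-b", 0, 9.0)                      # a separate entity has its own buffer
w.add("host-a", HOUR_MS // 2, 2.0)
current = w.add("host-a", HOUR_MS + 1, 3.0)  # the point at timestamp 0 is evicted
print(current)
```

With a window of -1 nothing is evicted, which is why a timestamp is only needed when a moving window is in use.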
Usage
For each data point observed, Adaptive Thresholding outputs a predicted label (a binary classification of outliers) together with the estimated quantile or the Gaussian estimates. The Distribution-free approach (quantile) produces the q-th quantile of the current data points. The distribution-based approach (Gaussian) produces the mean and variance of the current data points. Both approaches generate predicted labels to classify outliers.
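As a rough illustration of the Gaussian approach, the following Python sketch maintains a running mean and variance with Welford's online algorithm and labels a point as an outlier when its two-sided Gaussian tail probability falls below the threshold. The names RunningGaussian and label, the choice of Welford's algorithm, and this reading of the "threshold" argument are assumptions, not the documented DSP internals.

```python
import math


class RunningGaussian:
    """Hypothetical streaming estimator of mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation of the points seen so far.
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

    def tail_probability(self, x):
        # Two-sided probability of a value at least this far from the mean.
        if self.std == 0.0:
            return 1.0  # simplification: no spread observed yet, flag nothing
        z = abs(x - self.mean) / self.std
        return math.erfc(z / math.sqrt(2))


def label(model, x, threshold=0.01):
    """Assumed rule: outlier when the tail probability is below the threshold."""
    return model.tail_probability(x) < threshold


model = RunningGaussian()
for v in [12.0, 12.1, 11.9, 12.0, 12.2, 11.8]:
    model.update(v)

print(model.mean, model.std, label(model, 12.05), label(model, 9.0))
```

Under this reading, a lower threshold demands a more extreme value before a point is tagged, so fewer points are labeled as outliers.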
Adaptive Thresholding is frequently used to identify outliers in real-time on numeric time series, such as metrics and KPIs. Adaptive Thresholding is useful for monitoring and evaluating the performance of a metric where baseline values are subject to change.
For example, in monitoring the %CPU consumption of a server, you expect the base load to vary dynamically. Applying the Adaptive Thresholding function enables outlier detection on a rolling window (e.g., one hour). With the Gaussian approach, the function estimates where in the distribution each observed data point lies. Predicted outliers are observations that lie more than n standard deviations from the mean (e.g., more than 2). With the Distribution-free approach, the function computes the q-th quantile of the observed data points. Predicted outliers are observations that fall beyond the chosen percentile (e.g., above the 99th percentile).
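The Distribution-free rule in the CPU example can be sketched in Python as follows. The functions quantile and quantile_labels are hypothetical, the nearest-rank quantile definition is one of several common choices, and the mapping from threshold to the (1 - threshold) quantile is an assumption rather than the documented behavior. The CPU readings are made up for the illustration.

```python
import math


def quantile(values, q):
    """Nearest-rank q-th quantile (0 < q <= 1) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def quantile_labels(values, threshold=0.01):
    """Assumed rule: flag values above the (1 - threshold) quantile."""
    cutoff = quantile(values, 1.0 - threshold)
    return cutoff, [v > cutoff for v in values]


# Made-up %CPU readings: a base load between 30 and 39, plus one spike.
cpu = [30.0 + (i % 10) for i in range(199)] + [97.0]
cutoff, labels = quantile_labels(cpu, threshold=0.01)
print(cutoff, sum(labels))
```

Because the cutoff is taken from the observed values themselves, no assumption about the shape of the distribution is needed, which is the practical difference from the Gaussian approach.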
SPL2 example
The following example uses Adaptive Thresholding to detect anomalies in battery voltage:
| from splunk_firehose()
| eval json=cast(body, "string"),
       'json-map'=from_json_object(json),
       input=parse_double(ucast(map_get('json-map', "voltage"), "string", null)),
       time=ucast(map_get('json-map', "timestamp"), "string", null),
       timestamp=cast(time, "long"),
       key=""
| adaptive_threshold algorithm="quantile" entity="key" value="input" window=-1L;
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5