Splunk® Data Stream Processor

Function Reference

DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Time Series Decomposition (beta)

The Time Series Decomposition (STL) algorithm automatically decomposes time series data streams into trend, seasonal, and remainder components in real time. This enables use cases such as demand forecasting and anomaly detection.

Time Series Decomposition (STL) implements a streaming version of the STL (seasonal and trend decomposition using Loess) approach. The model input is a single stream of numeric time series values. For each raw data point observed in the input, the model outputs three values: trend, seasonality, and remainder.

This version of the Time Series Decomposition (STL) algorithm only separates a single seasonality from the input time series. It requires the user to specify an estimated periodicity of the observed seasonality (e.g., daily, weekly, or monthly).
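To make the three components concrete, here is a minimal batch sketch of an additive decomposition, where each value equals trend + seasonal + remainder. It is not the streaming STL algorithm that the function implements: it substitutes a clipped moving average for Loess smoothing, and the function name `decompose` and its `period` parameter are illustrative assumptions standing in for the single user-supplied seasonality.

```python
from statistics import mean

def decompose(values, period):
    """Naive additive decomposition: value = trend + seasonal + remainder.

    Assumption: a moving average stands in for the Loess smoother used by STL.
    """
    n = len(values)
    half = period // 2
    # Trend: centered moving average spanning roughly one period (clipped at the edges).
    trend = [mean(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [v - t for v, t in zip(values, trend)]
    # Seasonal: mean detrended value at each position within the period,
    # shifted so the seasonal component sums to zero over one period.
    phase_means = [mean(detrended[p::period]) for p in range(period)]
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    # Remainder: whatever the trend and seasonal components do not explain.
    remainder = [v - t - s for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic series: slow upward trend plus a repeating pattern of length 4.
values = [10 + 0.1 * i + [0, 3, -3, 0][i % 4] for i in range(40)]
trend, seasonal, remainder = decompose(values, period=4)
```

By construction, the three outputs add back up to the original series, which is the property the remainder component relies on.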

This function requires a "timestamp" field in the incoming data stream. This does not appear in the Streaming ML user interface because it is not configurable. For more information, see the Required arguments section.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

stl value="input"
seasonality=100;

Required arguments

seasonality
Syntax: integer
Description: Seasonality sets the periodicity in the data (e.g., daily, weekly, or monthly pattern).
timestamp
Syntax: long
Description: The timestamp corresponding to the observed value in the data stream. This function requires a timestamp field in the dataset. This argument does not appear in the Streaming ML user interface because it is not configurable. You must have a column titled "timestamp" in the incoming data stream.
value
Syntax: double
Description: The numeric field in your dataset to which Time Series Decomposition (STL) is applied. By default, the function analyzes the "value" column unless you override it by setting the "value" argument.

Optional arguments

key
Syntax: string
Description: The field on which to partition the dataset so that a different model is applied per key. For example, if you are ingesting CPU metrics from 100 hosts and want to learn a separate decomposition model per host, then "host" is the key. If the dataset has a column titled "key", it is applied by default. You can override the default by specifying "key=host" to choose a different "key" input.
samplingRate
Syntax: integer
Description: Set samplingRate in cases where timestamps are at irregular intervals.
value
Syntax: double
Description: Set the "value" argument if you want Time Series Decomposition (STL) to analyze a field other than the "value" column in the dataset.
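The partitioning idea behind the "key" argument can be sketched as keeping independent state for each key value. The class below is a hypothetical helper, not part of DSP: `PerKeyWindows`, its `window` size, and the record layout are all assumptions made for illustration.

```python
from collections import defaultdict, deque

class PerKeyWindows:
    """Keeps a separate rolling window of values per key (hypothetical helper)."""

    def __init__(self, window):
        # Each key gets its own bounded buffer, created on first use.
        self.windows = defaultdict(lambda: deque(maxlen=window))

    def add(self, record, key_field="key", value_field="value"):
        # Route the record to the buffer for its key, so hosts never mix.
        key = record.get(key_field)
        self.windows[key].append(record[value_field])
        return key, list(self.windows[key])

state = PerKeyWindows(window=2)
state.add({"host": "a", "value": 1.0}, key_field="host")
state.add({"host": "b", "value": 5.0}, key_field="host")
key, window = state.add({"host": "a", "value": 2.0}, key_field="host")
```

A per-key model would then be fitted on each buffer independently, which is why metrics from one host do not influence the decomposition for another.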

Usage

For each observed data point, Time Series Decomposition (STL) computes a trend, seasonality, and residual (remainder) value. This function can be applied to numeric time-series data, such as metrics or KPIs, to monitor for sudden changes or outliers in any of the three time-series components.

It can be challenging to identify anomalies in time-series metrics with high seasonality. For example, users monitoring web traffic may want to flag abnormally high activity that indicates an unexpected surge, or low activity that indicates a server is down. Traditional anomaly detection approaches may erroneously flag seasonal effects as anomalous, such as quiet hours over the weekend or high-volume days mid-week.

To reduce these false alarms, the Time Series Decomposition (STL) function can be used to first separate the seasonality and trend from the residual numeric time series values. Then, an anomaly detection model like Adaptive Thresholding can be applied to the residual. This approach can improve anomaly detection accuracy by identifying true outliers in the residual rather than erroneously flagging seasonal or noisy data.
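As a rough illustration of applying a detector to the residual only, the sketch below flags residual values that sit far from the median, scaled by the median absolute deviation (MAD). This simple rule is a stand-in chosen for brevity, not the Adaptive Thresholding function; `flag_outliers` and its `threshold` default are assumptions.

```python
from statistics import median

def flag_outliers(remainder, threshold=3.5):
    """Flag residuals far from the median, measured in robust (MAD-based) units.

    Assumption: a fixed MAD rule stands in for an adaptive thresholding model.
    """
    med = median(remainder)
    mad = median([abs(r - med) for r in remainder])
    if mad == 0:
        # Degenerate case: more than half the residuals are identical.
        return [r != med for r in remainder]
    # 1.4826 scales the MAD to match the standard deviation of normal data.
    return [abs(r - med) / (1.4826 * mad) > threshold for r in remainder]

# Residuals after trend and seasonality have been removed (made-up numbers).
residuals = [0.1, -0.2, 0.0, 0.15, -0.1, 8.0, 0.05, -0.05]
flags = flag_outliers(residuals)
```

Because the seasonal swings have already been removed, a weekend lull or mid-week peak does not inflate the residual, so only the unexplained spike stands out.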

SPL2 examples

The following example uses Time Series Decomposition (STL) on a test set:

| from splunk_firehose()
| eval json=cast(body, "string"),
       'json-map'=from_json_object(json),
       input=parse_double(ucast(map_get('json-map', "Bytes Sent"), "string", null)),
       key=ucast(map_get('json-map', "Source Address"), "string", null),
       time=ucast(map_get('json-map', "Start Time"), "string", null),
       timestamp=cast(div(cast(get("time"), "long"), 1000000), "long")
| stl samplingRate=10 value="input" seasonality=100; 
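
For readers less familiar with SPL2, the eval step in the pipeline above can be approximated in Python as follows. This is only a sketch of the field preparation: the sample record is made up, the unit of "Start Time" is not stated in this topic, and integer division is assumed to match what div followed by a cast to long produces.

```python
import json

def prepare(body):
    """Approximate the eval step: derive input, key, and timestamp from a JSON body."""
    fields = json.loads(body)
    return {
        # Numeric series to decompose, parsed from the "Bytes Sent" field.
        "input": float(fields["Bytes Sent"]),
        # Partition key, so each source address is modeled separately.
        "key": str(fields["Source Address"]),
        # Assumption: integer division mirrors div(..., 1000000) cast to long.
        "timestamp": int(fields["Start Time"]) // 1_000_000,
    }

# Hypothetical event body, for illustration only.
sample = '{"Bytes Sent": "1500", "Source Address": "10.0.0.1", "Start Time": "1610000000000000"}'
record = prepare(sample)
```

The resulting record carries the three fields the stl function relies on: a numeric value, an optional key, and the required "timestamp" column.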
Last modified on 13 January, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

