Splunk® Data Stream Processor

Function Reference

DSP 1.2.1 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Sequential Outlier Detection (beta)

Sequential Outlier Detection identifies anomalous events from an observed sequence in a time series. The input time series must consist of events, such as user access requests, network requests, or transactions. Sequential Outlier Detection can be applied to online streams for real-time monitoring to identify whether sequences or patterns of events in a time series are anomalous.

Sequential Outlier Detection predicts an output that indicates whether an observed sequence is anomalous. Lower predicted values indicate that an observed sequence is more likely to be anomalous. Higher predicted values indicate that an observed sequence is expected, and not anomalous, based on previously observed data.

This function requires an "input" field in the incoming data stream. This field does not appear in the Streaming ML user interface because it is not configurable. For more information, see the Required arguments section.

Function Input/Output Schema

Function Input
collection<record<R>>
This function takes in collections of records with schema R.
Function Output
collection<record<S>>
This function outputs collections of records with schema S.

Syntax

detect_sequential_outliers
value="input";

Required arguments

input
Syntax: string
Description: The default input column, which contains the data on which to detect outliers, such as username, host, or action. This argument does not appear in the UI because it is not configurable. If the dataset contains a column titled "input", that column is used by default. To override this field, use the optional "value" argument.

Optional arguments

key
Syntax: string
Description: The field on which to partition the dataset to apply a different model per key. If the dataset has a column titled "key" it will be applied by default. You can override the default by specifying "key=host" to choose a different "key" input.
value
Syntax: string
Description: Set the "value" argument if you want Sequential Outlier Detection to analyze data from a column other than the default "input" column in the dataset. When used, the "value" argument overrides the required "input" argument.
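
To illustrate what partitioning by key means, the following Python sketch groups events into one ordered sequence per key, so that each key could be modeled separately. This is a conceptual illustration only and not how DSP implements the function; the helper name partition_by_key and the sample records are hypothetical.

```python
from collections import defaultdict

def partition_by_key(records, key_field="key", value_field="input"):
    """Group event values by key, preserving arrival order within each key."""
    sequences = defaultdict(list)
    for record in records:
        sequences[record[key_field]].append(record[value_field])
    return dict(sequences)

# Hypothetical sample events: two users whose events are interleaved in the stream.
records = [
    {"key": "alice", "input": "login"},
    {"key": "bob", "input": "login"},
    {"key": "alice", "input": "sudo"},
    {"key": "bob", "input": "logout"},
]
per_key = partition_by_key(records)
```

Each key ends up with its own event sequence, which is the unit a per-key model would score.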

This algorithm contains three pre-tuned parameters: Markov Order, Prune Threshold, and Prune Trigger Count. These parameters do not require manual input and can be ignored by most users.

If you want to understand more about these pre-tuned parameters, they are defined as follows:

val markovOrder: Int = DEFAULT_MARKOV_ORDER
val pruneThreshold: Int = DEFAULT_PRUNE_THRESHOLD
val pruneTriggerCount: Int = DEFAULT_TRIGGER_COUNT

Usage

The input to Sequential Outlier Detection is a time series of events, such as logs of commands executed over time. The model predicts whether the observed sequence in a stream is anomalous in real time.

For each data point observed, Sequential Outlier Detection outputs a score between 0 and 1 that represents the probability that the observed sequence is normal. The lower the predicted output, the less likely it is that the past sequence has been observed before, which corresponds to a higher probability of anomaly. Predicted outputs closer to 1 correspond to a lower probability of anomaly; that is, the sequence is more likely to be normal.
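
As a rough illustration of how the scores might be consumed downstream, the following Python sketch flags scores that fall below a cutoff. The function name flag_anomalies and the cutoff of 0.1 are assumptions made for this example; this page does not define a recommended threshold.

```python
def flag_anomalies(scores, threshold=0.1):
    """Return True for each score below the threshold (lower score = more anomalous)."""
    return [score < threshold for score in scores]

# Hypothetical scores: only the second value falls below the assumed cutoff.
flags = flag_anomalies([0.9, 0.05, 0.5])
```

A suitable cutoff depends on the data and on how many false alerts are acceptable, so treat the value above as a placeholder.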

The maximum length of the past sequence for which the algorithm computes a probability is set by the algorithm's default Markov order, which is 4 observations.
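
To make the idea of a Markov order concrete, the following Python sketch scores each event by how often it has previously followed the same preceding events. It is a toy model written for this explanation, not the algorithm DSP uses: the class name MarkovSequenceScorer is hypothetical, the sketch assumes that the Markov order is the number of preceding events used as context, and it omits pruning entirely.

```python
from collections import Counter, deque

class MarkovSequenceScorer:
    """Toy fixed-order Markov scorer (illustrative only; not the DSP algorithm)."""

    def __init__(self, order=4):
        self.history = deque(maxlen=order)   # the most recent `order` events
        self.context_counts = Counter()      # how often each context was seen
        self.transition_counts = Counter()   # how often a context was followed by an event

    def score_and_update(self, event):
        context = tuple(self.history)
        seen = self.context_counts[context]
        # Estimated probability of this event given the preceding context.
        score = self.transition_counts[(context, event)] / seen if seen else 0.0
        # Learn from the event after scoring it.
        self.context_counts[context] += 1
        self.transition_counts[(context, event)] += 1
        self.history.append(event)
        return score

# A repeating pattern followed by one event that has never followed "b", "c".
scorer = MarkovSequenceScorer(order=2)
events = ["a", "b", "c", "a", "b", "c", "a", "b", "c", "x"]
scores = [scorer.score_and_update(event) for event in events]
```

In this sketch, the first pass through the pattern scores 0 because no context has been seen yet, the repeated pattern scores 1, and the final unexpected event scores 0 again.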

For example, a security user may want to identify suspicious activity from shell commands. To do so, the user can apply Sequential Outlier Detection to identify anomalous sequences of command logs executed over time. Each command may seem normal in isolation, but the sequence of commands can be used to identify suspicious activity. This approach to anomaly detection provides more context about the events and time series that are being monitored, improving the ability to detect abnormal sequences among event data.

SPL2 example

The following example uses Sequential Outlier Detection to identify anomalies in event code:

| from splunk_firehose() 
| eval json=cast(body, "string"), 
	'json-map'=from_json_object(json), 
	input=ucast(map_get('json-map', "eventCode"), "string", null), 
	time=ucast(map_get('json-map', "timestamp"), "string", null), 
	key=ucast(map_get('json-map', "user"), "string", null), 
	timestamp=cast(time, "long") 
| detect_sequential_outliers value="input";
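
For readers less familiar with SPL2, the field extraction in the eval step above can be approximated in Python as follows. This sketch only mirrors the mapping of eventCode, timestamp, and user to input, timestamp, and key; the helper names and the sample JSON body are hypothetical, and it does not reproduce the exact casting semantics of cast or ucast.

```python
import json

def to_str(value):
    """Return the value as a string, or None when the field is missing."""
    return None if value is None else str(value)

def extract_fields(body):
    """Pull eventCode, timestamp, and user out of a JSON event body."""
    data = json.loads(body)
    time = to_str(data.get("timestamp"))
    return {
        "input": to_str(data.get("eventCode")),
        "key": to_str(data.get("user")),
        "timestamp": int(time) if time is not None else None,
    }

# Hypothetical event body with the three fields the pipeline reads.
record = extract_fields('{"eventCode": 4624, "timestamp": "1614124800000", "user": "alice"}')
```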
Last modified on 24 February, 2021

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

