Sequential Outlier Detection (beta)
Sequential Outlier Detection identifies anomalous events from an observed sequence in a time series. The input time series must be events, such as user access requests, network requests, transactions, and so on. Sequential Outlier Detection can be applied to online streams for real-time monitoring to identify if sequences or patterns of events in a time series are anomalous.
Sequential Outlier Detection predicts an output that indicates if an observed sequence is anomalous or not. Smaller predicted values indicate that an observed sequence is more likely to be anomalous. Higher predicted values indicate that an observed sequence is expected, and not anomalous, based on previously observed data.
This function requires an "input" field in the incoming data stream. The "input" argument does not appear in the Streaming ML user interface because it is not configurable. For more information, see the Required arguments section.
Function Input/Output Schema
- Function Input
collection<record<R>>
- This function takes in collections of records with schema R.
- Function Output
collection<record<S>>
- This function outputs collections of records with schema S.
Syntax
- detect_sequential_outliers
- value="input";
Required arguments
- input
- Syntax: string
- Description: The default input column, which contains the data on which to detect outliers, such as username, host, or action. This argument does not appear in the UI because it is not configurable. If the dataset contains a column titled "input", that column is used by default. To analyze a different field, set the optional "value" argument.
Optional arguments
- key
- Syntax: string
- Description: The field on which to partition the dataset to apply a different model per key. If the dataset has a column titled "key" it will be applied by default. You can override the default by specifying "key=host" to choose a different "key" input.
- value
- Syntax: string
- Description: Set the "value" argument if you want Sequential Outlier Detection to analyze data from a column other than the default "input" column in the dataset. When set, the "value" argument overrides the required "input" argument.
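The effect of the "key" and "value" arguments can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the function's implementation: the helper name partition_events and the sample records are hypothetical, and the sketch only shows how a key field splits the stream into separate sequences while a value field selects the data to analyze.

```python
from collections import defaultdict


def partition_events(records, key="key", value="input"):
    """Group the chosen value field of each record by the chosen key field.

    Hypothetical helper: the defaults mirror the documented default
    column names "key" and "input".
    """
    sequences = defaultdict(list)
    for record in records:
        # Records without the key field share one partition (None);
        # this is an assumption of the sketch, not documented behavior.
        sequences[record.get(key)].append(record[value])
    return dict(sequences)


# Hypothetical sample events.
records = [
    {"host": "web-1", "key": "alice", "input": "login", "action": "GET"},
    {"host": "web-2", "key": "bob", "input": "login", "action": "POST"},
    {"host": "web-1", "key": "alice", "input": "logout", "action": "GET"},
]

# Default columns: one sequence of "input" values per "key" value.
by_default = partition_events(records)

# Overrides comparable to key=host with a different value field.
by_host = partition_events(records, key="host", value="action")
```

In this sketch, each resulting sequence would be scored by its own model, which is the sense in which a different model is applied per key.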
This algorithm contains three pre-tuned parameters: Markov Order, Prune Threshold, and Prune Trigger Count. These parameters do not require manual input, and most users can ignore them.
For reference, the pre-tuned parameters are defined as follows:
val markovOrder: Int = DEFAULT_MARKOV_ORDER
val pruneThreshold: Int = DEFAULT_PRUNE_THRESHOLD
val pruneTriggerCount: Int = DEFAULT_TRIGGER_COUNT
Usage
The input to Sequential Outlier Detection is a time series of events, such as logs of commands executed over time. The model predicts whether the observed sequence in a stream is anomalous in real time.
For each data point observed, Sequential Outlier Detection outputs a score between 0 and 1 that corresponds to the probability that the sequence is normal. The lower the predicted output, the less often the preceding sequence has been observed, which corresponds to a higher probability of anomaly. Predicted outputs closer to 1 correspond to a lower probability of anomaly (that is, the sequence is more likely to be normal).
The maximum length of the past sequence for which the algorithm computes a probability is set by the Markov order of the algorithm, which defaults to 4 observations.
For example, a security analyst may want to identify suspicious activity from shell commands. To do so, the analyst can use Sequential Outlier Detection to identify anomalous sequences in logs of commands executed over time. Each command may seem normal in isolation, but the sequence of commands can reveal suspicious activity. This approach to anomaly detection provides more context about the events and time series being monitored, which improves the ability to detect abnormal sequences in event data.
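To make the scoring idea concrete, the following Python sketch estimates how likely an event is given up to 4 preceding events, matching the default Markov order stated above. It is a minimal illustration, not the algorithm that Sequential Outlier Detection actually uses: the class name SequenceScorer, the count-based estimate, and the back-off to shorter contexts are all assumptions, and pruning is not modeled.

```python
from collections import defaultdict

MARKOV_ORDER = 4  # default order stated in this section


class SequenceScorer:
    """Hypothetical count-based scorer; an assumption, not the product's model."""

    def __init__(self, order=MARKOV_ORDER):
        self.order = order
        # counts[context][event] = number of times `event` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = []

    def _context(self, n):
        # The last n events as a tuple (empty tuple when n == 0).
        return tuple(self.history[len(self.history) - n:])

    def score(self, event):
        """Return a score in [0, 1] for `event`, then learn from it."""
        result = 0.0
        # Try the longest available context first, then back off.
        for n in range(min(self.order, len(self.history)), -1, -1):
            seen = self.counts.get(self._context(n))
            if seen and event in seen:
                result = seen[event] / sum(seen.values())
                break
        # Update the counts for every context length, then the history.
        for n in range(min(self.order, len(self.history)) + 1):
            self.counts[self._context(n)][event] += 1
        self.history = (self.history + [event])[-self.order:]
        return result


scorer = SequenceScorer()
for command in ["cd", "ls", "cat"] * 20:  # hypothetical routine activity
    scorer.score(command)

expected = scorer.score("cd")  # continues the familiar pattern
unusual = scorer.score("rm")   # never observed before
```

In this sketch, the familiar continuation scores 1.0 and the never-observed command scores 0.0, which mirrors the convention that lower outputs indicate a higher probability of anomaly.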
SPL2 example
The following example uses Sequential Outlier Detection to identify anomalous sequences of event codes:
| from splunk_firehose()
| eval json=cast(body, "string"),
    'json-map'=from_json_object(json),
    input=ucast(map_get('json-map', "eventCode"), "string", null),
    time=ucast(map_get('json-map', "timestamp"), "string", null),
    key=ucast(map_get('json-map', "user"), "string", null),
    timestamp=cast(time, "long")
| detect_sequential_outliers value="input";
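For readers less familiar with SPL2, the eval stage of this pipeline can be approximated in Python as follows. This is a rough sketch under assumptions: the function name to_detector_record and the sample event body are hypothetical, and treating non-string values as null is this sketch's reading of the ucast calls with a null default, not a statement of their exact behavior.

```python
import json


def to_detector_record(body):
    """Build the input, key, and timestamp fields from a JSON event body."""
    json_map = json.loads(body)

    def as_string(name):
        # Assumption: anything that is not already a string becomes None,
        # standing in for the null default passed to ucast.
        value = json_map.get(name)
        return value if isinstance(value, str) else None

    time = as_string("timestamp")
    return {
        "input": as_string("eventCode"),  # data analyzed for outliers
        "key": as_string("user"),         # one model per user
        "timestamp": int(time) if time is not None else None,
    }


# Hypothetical event body.
record = to_detector_record(
    '{"eventCode": "4624", "user": "alice", "timestamp": "1600000000000"}'
)
```

The resulting record carries the "input" and "key" fields that the detection step reads by default.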
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5