Search Reference

 


anomalies

Use the anomalies command to look for events that you don't expect to find, based on the values of a field across a sliding set of events. The anomalies command assigns each event an unexpectedness score and stores it in a new field named unexpectedness. Whether an event is considered anomalous depends on a threshold value that is compared against this score: the event is considered unexpected, or anomalous, if its unexpectedness is greater than the threshold.

Note: After you run anomalies in the timeline Search view, add the unexpectedness field to your events list using the Pick fields menu.

Synopsis

Computes an unexpectedness score for an event.

Syntax

anomalies [threshold=num] [labelonly=bool] [normalize=bool] [maxvalues=int] [field=field] [blacklist=filename] [blacklistthreshold=num] [by-clause]

Optional arguments

threshold
Datatype: threshold=<num>
Description: A number to represent the unexpectedness limit. If an event's calculated unexpectedness is greater than this limit, the event is considered unexpected or anomalous. Defaults to 0.01.
labelonly
Datatype: labelonly=<bool>
Description: Specify how you want the output to be returned. The unexpectedness field is appended to all events. If set to true, no events are removed. If set to false, events whose unexpectedness score is not greater than the threshold (boring events) are removed. Defaults to false.
normalize
Datatype: normalize=<bool>
Description: Specify whether to normalize numeric values. If the field contains numeric data that should be treated as categories rather than normalized, set normalize=false. Defaults to true.
maxvalues
Datatype: maxvalues=<int>
Description: Specify the size of the sliding window of previous events used when determining the unexpectedness of an event's field value. Valid values range from 10 to 10000. Defaults to 100.
field
Datatype: field=<field>
Description: The field to analyze when determining the unexpectedness of an event. Defaults to _raw.
blacklist
Datatype: blacklist=<filename>
Description: The name of a CSV file of events, located in $SPLUNK_HOME/var/run/splunk/BLACKLIST.csv. Any incoming event that is similar to an event in the blacklist is treated as not anomalous (that is, uninteresting) and is given an unexpectedness score of 0.0.
blacklistthreshold
Datatype: blacklistthreshold=<num>
Description: Specify the similarity score threshold for matching incoming events to blacklisted events. If an incoming event has a similarity score above the blacklistthreshold, it is marked as unexpected. Defaults to 0.05.
by clause
Syntax: by <fieldlist>
Description: Specifies a list of fields used to segregate results for anomaly detection. For each combination of values of the specified fields, events with those values are treated entirely separately.
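To make the roles of field, maxvalues, and the by clause concrete, here is a rough Python model of how they could scope the comparison. It is a sketch under stated assumptions, not Splunk's implementation: the names score_events, Scorer, and toy_scorer are hypothetical, and toy_scorer is only a placeholder for the proprietary unexpectedness score.

```python
from collections import defaultdict, deque
from typing import Callable, Iterable, Sequence

# Hypothetical scorer signature: (previous values, current value) -> unexpectedness.
Scorer = Callable[[Sequence[str], str], float]


def toy_scorer(previous: Sequence[str], value: str) -> float:
    # Placeholder only (not Splunk's score): share of previous values that differ.
    if not previous:
        return 0.0
    return sum(p != value for p in previous) / len(previous)


def score_events(events: Iterable[dict], scorer: Scorer, field: str = "_raw",
                 maxvalues: int = 100, by: Sequence[str] = ()) -> list:
    """Add an 'unexpectedness' field to each event, comparing its field value
    only with the previous `maxvalues` values from events in the same by-group."""
    if not 10 <= maxvalues <= 10000:
        raise ValueError("maxvalues must be between 10 and 10000")
    # One independent sliding window per combination of by-clause values.
    windows = defaultdict(lambda: deque(maxlen=maxvalues))
    scored = []
    for event in events:
        window = windows[tuple(event.get(f) for f in by)]
        value = str(event.get(field, ""))
        scored.append({**event, "unexpectedness": scorer(list(window), value)})
        window.append(value)  # once the window is full, the oldest value drops out
    return scored


events = [
    {"group": "a", "_raw": "x"},
    {"group": "b", "_raw": "y"},
    {"group": "a", "_raw": "x"},
    {"group": "a", "_raw": "z"},
]
print([e["unexpectedness"] for e in score_events(events, toy_scorer)])  # [0.0, 1.0, 0.5, 1.0]
print([e["unexpectedness"] for e in score_events(events, toy_scorer, by=["group"])])  # [0.0, 0.0, 0.0, 1.0]
```

With by=["group"], the "y" event in group b is compared only with other group b events, so it is no longer scored as unexpected.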

Description

The algorithm used to calculate an event's unexpectedness score is proprietary, but roughly speaking, it is based on the similarity of that event (X) to a set of previous events (P):

unexpectedness = [s(P and X) - s(P)] / [s(P) + s(X)]

Here, s() is a metric of how similar or uniform the data is. This formula provides a measure of how much adding X affects the similarity of the set of events and also normalizes for the differing event sizes.
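The formula can be tried out with a concrete stand-in for s(). The sketch below assumes s() is the size of the zlib-compressed text, a common compression-based way to measure how repetitive data is; this choice, and the names s and unexpectedness, are illustrative assumptions and not Splunk's proprietary metric.

```python
import zlib


def s(text: str) -> int:
    # Assumed stand-in for s(): size in bytes of the zlib-compressed text.
    # Text that repeats patterns already present adds only a few extra bytes.
    return len(zlib.compress(text.encode("utf-8"), 9))


def unexpectedness(previous: list, event: str) -> float:
    """[s(P and X) - s(P)] / [s(P) + s(X)] using the stand-in s() above."""
    p = "\n".join(previous)     # the set of previous events, P
    p_and_x = p + "\n" + event  # P with the new event X appended
    return (s(p_and_x) - s(p)) / (s(p) + s(event))


history = ["user=alice action=login status=ok"] * 20
print(unexpectedness(history, "user=alice action=login status=ok"))
print(unexpectedness(history, "kernel panic: segfault at 0x7f3a in module xyz"))
```

The repeated login event adds almost nothing to the compressed size of P, so its score is far lower than that of the unfamiliar kernel message. The denominator keeps long events from scoring high merely because they are long.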

You can run the anomalies command again on the results of a previous anomalies command to further narrow down the results. Because each run operates over a window of 100 events by default (maxvalues=100), the second call to anomalies is effectively running over a window of approximately 10,000 (100 × 100) previous events.

Examples

Example 1: This example shows how you can tune the search for anomalies using the threshold value.

index=_internal | anomalies by group | search group=*

This search looks at events in the _internal index and calculates the unexpectedness score for sets of events that have the same group value. This means that the sliding set of events used to calculate the unexpectedness for each unique group value will only include events that have the same group value. The search command is then used to show only events that include the group field. Here's a snapshot of the results:

[Screenshot: results with the default threshold]

With the default threshold=0.01, you can see that some of the returned events are very similar to each other. The next search increases the threshold slightly:

index=_internal | anomalies threshold=0.03 by group | search group=*

[Screenshot: results with threshold=0.03]

With the higher threshold value, you can see at a glance that the events are more distinct from each other (in their timestamps and key/value pairs).

Also, you might not want to hide the events that are not anomalous. Instead, you can add another field to your events that tells you whether or not the event is interesting to you. One way to do this is with the eval command:

index=_internal | anomalies threshold=0.03 labelonly=true by group | search group=* | eval threshold=0.03 | eval score=if(unexpectedness>threshold, "anomalous", "boring")

This search uses labelonly=true so that the boring events are retained in the results list. The first eval command defines a field named threshold and sets it to the same value passed to anomalies, 0.03. This has to be done explicitly because the threshold argument of the anomalies command is not a field. The second eval command defines another new field, score, that is either "anomalous" or "boring" depending on how the unexpectedness compares to the threshold value. Here's a snapshot of these results:

[Screenshot: results labeled "anomalous" or "boring"]
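The difference between dropping boring events and labeling them can be modeled in a few lines of Python. This is a sketch, not Splunk code: anomalies_output and eval_score are hypothetical names, the scores are made-up sample values, and the comparison is a strict greater-than, in line with the definition of an anomalous event at the start of this topic.

```python
from typing import Iterable


def anomalies_output(scored: Iterable[dict], threshold: float = 0.01,
                     labelonly: bool = False) -> list:
    """Model of labelonly: keep every scored event, or drop the boring ones."""
    events = list(scored)
    if labelonly:
        return events  # nothing is removed
    # Keep only events whose unexpectedness is strictly greater than threshold.
    return [e for e in events if e["unexpectedness"] > threshold]


def eval_score(events: Iterable[dict], threshold: float) -> list:
    """Model of: eval score=if(unexpectedness>threshold, "anomalous", "boring")"""
    return [{**e, "score": "anomalous" if e["unexpectedness"] > threshold else "boring"}
            for e in events]


scored = [{"id": 1, "unexpectedness": 0.01},
          {"id": 2, "unexpectedness": 0.03},
          {"id": 3, "unexpectedness": 0.2}]
print([e["id"] for e in anomalies_output(scored, threshold=0.03)])  # [3]
kept = anomalies_output(scored, threshold=0.03, labelonly=True)
print([e["score"] for e in eval_score(kept, threshold=0.03)])  # ['boring', 'boring', 'anomalous']
```

An event whose score equals the threshold exactly (id 2) counts as boring.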

More examples

Example 1: Show the most interesting events first, ignoring any events that are similar to events in the blacklist 'boringevents'.

... | anomalies blacklist=boringevents | sort -unexpectedness

Example 2: Use with transactions to find regions of time that look unusual.

... | transaction maxpause=2s | anomalies
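As a rough picture of why this finds unusual regions of time, the sketch below groups events into bursts separated by pauses longer than 2 seconds, so that anomalies would score whole bursts rather than single events. It assumes that maxpause limits the pause between consecutive events in a transaction; the real transaction command has many more options, and group_by_pause is a hypothetical name, not its implementation.

```python
def group_by_pause(events, maxpause=2.0):
    """Split (timestamp_seconds, raw_text) pairs into groups in which consecutive
    events are at most `maxpause` seconds apart, joining each group's text."""
    groups = []
    last_ts = None
    for ts, raw in sorted(events):
        if last_ts is None or ts - last_ts > maxpause:
            groups.append([])  # pause too long: start a new group
        groups[-1].append(raw)
        last_ts = ts
    return ["\n".join(g) for g in groups]


sample = [(0.0, "a"), (1.5, "b"), (5.0, "c"), (6.0, "d"), (10.0, "e")]
print(group_by_pause(sample))  # ['a\nb', 'c\nd', 'e']
```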

Example 3: Look for anomalies in each source separately, so that a pattern in one source does not affect whether it is considered anomalous in another source.

... | anomalies by source

See also

anomalousvalue, cluster, kmeans, outlier


This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2, 5.0.3.

