Use the anomalies command to look for events that you don't expect to find based on the values of a field in a sliding set of events. The anomalies command assigns an unexpectedness score to each event in a new field named unexpectedness. Whether the event is considered anomalous or not depends on a threshold value that is compared against the calculated unexpectedness score. The event is considered unexpected or anomalous if the unexpectedness > threshold.
Note: After you run anomalies in the timeline Search view, add the unexpectedness field to your events list using the Pick fields menu.
Computes an unexpectedness score for an event.
anomalies [threshold=num] [labelonly=bool] [normalize=bool] [maxvalues=int] [field=field] [blacklist=filename] [blacklistthreshold=num] [by-clause]
- Datatype: threshold=<num>
- Description: A number to represent the unexpectedness limit. If an event's calculated unexpectedness is greater than this limit, the event is considered unexpected or anomalous. Defaults to 0.01.
- Datatype: labelonly=<bool>
- Description: Specify how you want the output to be returned. The unexpectedness field is appended to all events. If set to true, no events are removed. If set to false, events that have an unexpectedness score less than the threshold (boring events) are removed. Defaults to false.
- Datatype: normalize=<bool>
- Description: Specify whether or not to normalize numeric values. For cases where field contains numeric data that should not be normalized, but treated as categories, set normalize=false. Defaults to true.
- Datatype: maxvalues=<int>
- Description: Specify the size of the sliding window of previous events to include when determining the unexpectedness of an event's field value. This number is between 10 and 10000. Defaults to 100.
- Datatype: field=<field>
- Description: The field to analyze when determining the unexpectedness of an event. Defaults to
- Datatype: blacklist=<filename>
- Description: A name of a CSV file of events that is located in $SPLUNK_HOME/var/run/splunk/BLACKLIST.csv. Any incoming event that is similar to an event in the blacklist is treated as not anomalous (that is, uninteresting) and given an unexpectedness score of 0.0.
- Datatype: blacklistthreshold=<num>
- Description: Specify the similarity score threshold for matching incoming events to blacklisted events. If the incoming event has a similarity score above the blacklistthreshold, it is marked as unexpected. Defaults to 0.05.
- by clause
- Syntax: by <fieldlist>
- Description: Used to specify a list of fields to segregate results for anomaly detection. For each combination of values for the specified field(s), events with those values are treated entirely separately.
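The interaction between threshold and labelonly described above can be pictured with a short Python sketch. This is only an illustration of the documented behavior, not Splunk code: the function name apply_threshold and the list-of-dicts event representation are assumptions, and the events are assumed to already carry an unexpectedness field.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def apply_threshold(
    events: List[Event],
    threshold: float = 0.01,   # documented default
    labelonly: bool = False,   # documented default
) -> List[Event]:
    """Hypothetical helper that mimics the documented output rules.

    labelonly=True  -> every event is returned, scores attached.
    labelonly=False -> only events scoring above the threshold are returned.
    How a score exactly equal to the threshold is treated is not spelled
    out in the text; this sketch drops it.
    """
    if labelonly:
        return list(events)
    return [e for e in events if e["unexpectedness"] > threshold]


events = [
    {"id": 1, "unexpectedness": 0.0},   # a "boring" event
    {"id": 2, "unexpectedness": 0.5},   # an anomalous event
]
print([e["id"] for e in apply_threshold(events)])                  # [2]
print([e["id"] for e in apply_threshold(events, labelonly=True)])  # [1, 2]
```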
The algorithm that calculates the unexpectedness score of an event is proprietary. Roughly speaking, it is based on the similarity of that event (X) to a set of previous events (P):
unexpectedness = [s(P and X) - s(P)] / [s(P) + s(X)]
s() is a metric of how similar or uniform the data is. This formula provides a measure of how much adding X affects the similarity of the set of events and also normalizes for the differing event sizes.
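To make the shape of this formula concrete, here is a minimal Python sketch. The real s() is proprietary and not documented, so the metric below (the number of distinct whitespace-separated tokens in a set of events) is purely a hypothetical stand-in, and the function names are made up for illustration. It only shows how the ratio behaves, not how Splunk computes it.

```python
from typing import List


def s(events: List[str]) -> int:
    """Hypothetical stand-in for s(): count of distinct tokens.

    A lower count means the events are more uniform. The real metric
    used by the anomalies command is not public.
    """
    distinct = set()
    for event in events:
        distinct.update(event.split())
    return len(distinct)


def unexpectedness(previous: List[str], event: str) -> float:
    """[s(P and X) - s(P)] / [s(P) + s(X)] using the stand-in metric."""
    denominator = s(previous) + s([event])
    if denominator == 0:
        return 0.0  # nothing to compare
    return (s(previous + [event]) - s(previous)) / denominator


P = ["status=ok user=alice", "status=ok user=bob"]

# An event made entirely of tokens already seen adds nothing to the set.
print(unexpectedness(P, "status=ok user=alice"))                # 0.0

# An event made entirely of new tokens changes the set the most.
print(unexpectedness(P, "status=error user=mallory code=500"))  # 0.5
```

With this stand-in, the numerator measures how much X changes the set, and the denominator keeps long events from scoring high merely because they are long.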
You can run the anomalies command again on the results of a previous anomalies run to further narrow down the results. As each run operates over a window of 100 events by default, the second call to anomalies is approximately running over a window of 10,000 previous events.
Example 1: This example shows how you can tune the search for anomalies using the threshold value.
index=_internal | anomalies by group | search group=*
This search looks at events in the _internal index and calculates the unexpectedness score for sets of events that have the same group value. This means that the sliding set of events used to calculate the unexpectedness for each unique group value will only include events that have the same group value. The search command is then used to show only events that include the group field. Here's a snapshot of the results:
With the default threshold=0.01, you can see that some of these events may be very similar. This next search increases the threshold a little:
index=_internal | anomalies threshold=0.03 by group | search group=*
With the higher threshold value, you can see at a glance that there is more distinction between each of the events (the timestamps and key/value pairs).
Also, you might not want to hide the events that are not anomalous. Instead, you can add another field to your events that tells you whether or not the event is interesting to you. One way to do this is with the eval command:
index=_internal | anomalies threshold=0.03 labelonly=true by group | search group=* | eval threshold=0.03 | eval score=if(unexpectedness>=threshold, "anomalous", "boring")
This search uses labelonly=true so that the boring events are still retained in the results list. The eval command is used to define a field named threshold and set it to the same value, 0.03. This has to be done explicitly because the threshold attribute of the anomalies command is not a field. The eval command is then used to define another new field, score, that is either "anomalous" or "boring" based on how the unexpectedness compares to the threshold value. Here's a snapshot of these results:
Example 1: Show most interesting events first, ignoring any in the blacklist 'boringevents'.
... | anomalies blacklist=boringevents | sort -unexpectedness
Example 2: Use with transactions to find regions of time that look unusual.
... | transaction maxpause=2s | anomalies
Example 3: Look for anomalies in each source separately. A pattern in one source does not affect whether an event is anomalous in another source.
... | anomalies by source
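The separation that the by clause provides, together with the maxvalues sliding window, can be sketched in Python as one bounded window per distinct value of the by field. Everything here is an assumption made for illustration: the names score_by_group and toy_score are hypothetical, events are plain dicts, and the scoring rule is a toy placeholder, not the proprietary algorithm.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List

Event = Dict[str, Any]


def toy_score(window: Deque[Any], value: Any) -> float:
    """Toy placeholder: 1.0 if the value is absent from the window, else 0.0."""
    return 0.0 if value in window else 1.0


def score_by_group(
    events: List[Event],
    by: str,
    field: str,
    maxvalues: int = 100,  # documented default window size
    score: Callable[[Deque[Any], Any], float] = toy_score,
) -> List[Event]:
    """Score each event against previous events of the same group only."""
    # One sliding window per distinct value of the `by` field.
    windows: Dict[Any, Deque[Any]] = defaultdict(lambda: deque(maxlen=maxvalues))
    scored = []
    for event in events:
        window = windows[event[by]]
        scored.append({**event, "unexpectedness": score(window, event[field])})
        window.append(event[field])  # oldest entry drops out when full
    return scored


events = [
    {"source": "a.log", "msg": "disk full"},
    {"source": "b.log", "msg": "disk full"},  # new for b.log, despite a.log
    {"source": "a.log", "msg": "disk full"},  # already seen in a.log
]
print([e["unexpectedness"] for e in score_by_group(events, "source", "msg")])
# [1.0, 1.0, 0.0]
```

The second event still scores as unexpected because the window for b.log never saw the message, which is the behavior Example 3 describes.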
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2, 5.0.3