dedup
Synopsis
Removes subsequent duplicate results that match the specified criteria.
Syntax
dedup [<N>] <field-list> [keepevents=<bool>] [keepempty=<bool>] [consecutive=<bool>] [sortby <sort-by-clause>]
Required arguments
- <field-list>
- Syntax: <string> <string> ...
- Description: A list of field names.
Optional arguments
- consecutive
- Syntax: consecutive=<bool>
- Description: If true, removes only duplicate events that are consecutive. Defaults to false.
- keepempty
- Syntax: keepempty=<bool>
- Description: If an event contains a null value for one or more of the specified fields, the event is either retained (true) or discarded (false). Defaults to false.
- keepevents
- Syntax: keepevents=<bool>
- Description: When true, keeps all events, but for events with duplicate values, removes those values instead of removing the entire event. Defaults to false.
- <N>
- Syntax: <int>
- Description: The number of events to keep (where N > 0) for each combination of values of the specified field(s). The first N events are kept. If the non-option parameter is a number, it is interpreted as N.
- <sort-by-clause>
- Syntax: ( - | + ) <sort-field>
- Description: List of fields to sort by and their order, descending ( - ) or ascending ( + ).
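The optional arguments above can be combined with the field list as in the following searches. These are sketches that follow the syntax on this page; host and source are placeholder field names, not fields your data necessarily contains.
Remove an event only when it has the same 'host' value as the event immediately before it.
... | dedup host consecutive=true
Remove duplicates of results with the same 'source' value, but retain events that have no 'source' value.
... | dedup source keepempty=true
Keep all events, but remove the 'host' value from events that duplicate an earlier 'host' value.
... | dedup host keepevents=true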
Sort field options
- <sort-field>
- Syntax: <field> | auto(<field>) | str(<field>) | ip(<field>) | num(<field>)
- Description: Options for sort-field.
- <field>
- Syntax: <string>
- Description: The name of the field to sort.
- auto
- Syntax: auto(<field>)
- Description: Determine automatically how to sort the field's values.
- ip
- Syntax: ip(<field>)
- Description: Interpret the field's values as an IP address.
- num
- Syntax: num(<field>)
- Description: Treat the field's values as numbers.
- str
- Syntax: str(<field>)
- Description: Order the field's values lexicographically.
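As an illustration of the sort field options above, the following searches wrap the sort field in an option, following the sort-by-clause syntax on this page. The fields bytes and clientip are hypothetical and stand in for whatever numeric or IP address fields your events contain.
Remove duplicates of results with the same 'source' value and sort the events by 'bytes', treated as a number, in descending order.
... | dedup source sortby -num(bytes)
Remove duplicates of results with the same 'host' value and sort the events by 'clientip', interpreted as an IP address, in ascending order.
... | dedup host sortby +ip(clientip)
Remove duplicates of results with the same 'host' value and sort the events by 'source' lexicographically, in ascending order.
... | dedup host sortby +str(source)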
Description
The dedup command lets you specify the number of duplicate events to keep, based on the values of one or more fields. For each combination of field values, the event returned is the first event found (most recent in time). If you specify a number, dedup interprets it as N, the count of duplicate events to keep. If you don't specify a number, N is assumed to be 1: dedup keeps only the first occurring event and removes all subsequent duplicates.
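As a small worked illustration of N, suppose a search returns the following five events in this order. The host values are made up for the example.
- event 1: host=web01
- event 2: host=web02
- event 3: host=web01
- event 4: host=web01
- event 5: host=web02
With no number specified, N is 1, so only the first event for each 'host' value is kept: events 1 and 2.
... | dedup host
With N set to 2, the first two events for each 'host' value are kept: events 1, 2, 3, and 5. Event 4 is removed because it is the third event with host=web01.
... | dedup 2 host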
The dedup command also lets you sort by a list of fields. This removes all the duplicates and then sorts the results based on the specified sort-by fields. Sorting has an effect only if your search returns multiple results. The other options let you specify other criteria. For example, you may want to keep all events but, for events with duplicate values, remove those values instead of the entire event.
Note: We do not recommend running the dedup command against the _raw field when you search over a large volume of data. Doing so causes Splunk to build a map of each unique _raw value seen, which impacts search performance. This is expected behavior.
Examples
Example 1: Remove duplicates of results with the same 'host' value.
... | dedup host
Example 2: Remove duplicates of results with the same 'source' value and sort the events by the '_time' field in ascending order.
... | dedup source sortby +_time
Example 3: Remove duplicates of results with the same 'source' value and sort the events by the '_size' field in descending order.
... | dedup source sortby -_size
Example 4: For events that have the same 'source' value, keep the first 3 that occur and remove all subsequent events.
... | dedup 3 source
Example 5: For events that have the same 'source' AND 'host' values, keep the first 3 that occur and remove all subsequent events.
... | dedup 3 source host
See also
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the dedup command.
This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 5.0, 5.0.1, 5.0.2, 5.0.3