dedup

NOTE - Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.

Description

Removes the events that contain an identical combination of values for the fields that you specify.

With the dedup command, you can specify the number of duplicate events to keep for each value of a single field, or for each combination of values among several fields. Events returned by dedup are based on search order. For historical searches, the most recent events are searched first. For real-time searches, the first events that are received are searched, which are not necessarily the most recent events.

You can also sort by one or more fields. When you sort, the dedup command deduplicates the results based on the specified sort-by fields. Other options enable you to retain events with the duplicate fields removed, or to keep events in which the specified fields do not exist.
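
The following Python sketch models the behavior described above. It is an illustration only, not Splunk's implementation. The function name `dedup`, the dictionary-based events, and the sample data are assumptions made for the example. The input list stands in for search order, so for a historical search it is arranged with the most recent event first.

```python
from collections import defaultdict

def dedup(events, fields, n=1):
    """Keep the first n events, in the order given, for each combination of field values.

    Hypothetical helper that models the documented default behavior only.
    """
    seen = defaultdict(int)
    kept = []
    for event in events:
        # Default behavior (keepempty=false): drop events missing any listed field.
        if any(event.get(f) is None for f in fields):
            continue
        key = tuple(event[f] for f in fields)
        if seen[key] < n:
            seen[key] += 1
            kept.append(event)
    return kept

# Sample events in search order (most recent first, as in a historical search).
events = [
    {"_time": 30, "host": "web1", "source": "a.log"},
    {"_time": 20, "host": "web2", "source": "a.log"},
    {"_time": 10, "host": "web1", "source": "b.log"},
]

print([e["_time"] for e in dedup(events, ["host"])])  # [30, 20]
```

Because the first event encountered for each combination is the one kept, the order in which events reach the command determines which duplicates survive.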

Note: We do not recommend that you run the dedup command against the _raw field if you are searching over a large volume of data. If you search the _raw field, the text of every event is retained in memory, which impacts your search performance. This is expected behavior. This behavior applies to any field with high cardinality and large size.

Syntax

dedup [<N>] <field-list> [keepevents=<bool>] [keepempty=<bool>] [consecutive=<bool>] [sortby <sort-by-clause>]

Required arguments

<field-list>
Syntax: <string> <string> ...
Description: A list of field names.

Optional arguments

consecutive
Syntax: consecutive=<bool>
Description: If true, removes only events with duplicate combinations of values that are consecutive.
Default: false
keepempty
Syntax: keepempty=<bool>
Description: If set to true, keeps every event where one or more of the specified fields is not present (null).
Default: false. All events where any of the selected fields are null are dropped.
The keepempty=true argument keeps every event that does not have one or more of the fields in the field list. To keep N representative events for combinations of field values including null values, use the fillnull command to provide a non-null value for these fields. For example:
... | fillnull value="MISSING" field1 field2 | dedup field1 field2
keepevents
Syntax: keepevents=<bool>
Description: If true, keeps all events but removes the selected fields from events after the first event containing a particular combination of values.
Default: false. Events are dropped after the first event of each particular combination.
<N>
Syntax: <int>
Description: The dedup command retains multiple events for each combination when you specify N. The number for N must be greater than 0. If you do not specify a number, only the first occurring event is kept. All other duplicates are removed from the results.
<sort-by-clause>
Syntax: ( - | + ) <sort-field> [( - | + ) <sort-field> ...]
Description: List of the fields to sort by and the sort order. Use the dash symbol ( - ) for descending order and the plus symbol ( + ) for ascending order. You must specify the sort order for each field.
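
As a rough illustration of how the keepevents, keepempty, and consecutive options differ, the Python sketch below models each one as described in this list. It is not Splunk's implementation. The function name `dedup_opts` and the sample events are hypothetical. How the options interact with each other, for example whether an event with a missing field interrupts a consecutive run, is not stated in this topic, so the sketch simply makes an assumption and notes it in the comments.

```python
def dedup_opts(events, fields, keepevents=False, keepempty=False, consecutive=False):
    """Model the documented dedup options for the keep-one-event case (hypothetical helper)."""
    seen = set()
    previous_key = None
    out = []
    for event in events:
        if any(event.get(f) is None for f in fields):
            # keepempty=true keeps every event that lacks one or more of the fields.
            # Assumption: such events do not reset the consecutive-run tracking.
            if keepempty:
                out.append(event)
            continue
        key = tuple(event[f] for f in fields)
        if consecutive:
            # Only an immediately preceding identical combination counts as a duplicate.
            duplicate = key == previous_key
        else:
            duplicate = key in seen
        seen.add(key)
        previous_key = key
        if not duplicate:
            out.append(event)
        elif keepevents:
            # keepevents=true keeps the event but strips the deduplicated fields.
            out.append({k: v for k, v in event.items() if k not in fields})
    return out

events = [
    {"id": 1, "host": "a"},
    {"id": 2, "host": "a"},
    {"id": 3, "host": "b"},
    {"id": 4, "host": "a"},
    {"id": 5},  # no host field
]

print([e["id"] for e in dedup_opts(events, ["host"])])                    # [1, 3]
print([e["id"] for e in dedup_opts(events, ["host"], consecutive=True)])  # [1, 3, 4]
print([e["id"] for e in dedup_opts(events, ["host"], keepempty=True)])    # [1, 3, 5]
print(dedup_opts(events[:4], ["host"], keepevents=True))
```

In the last call, events 2 and 4 are returned without their host field, while events 1 and 3 are returned unchanged.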

Descriptions for the sort-field options

<sort-field>
Syntax: <field> | auto(<field>) | str(<field>) | ip(<field>) | num(<field>)
Description: The options that you can specify to sort the events.
<field>
Syntax: <string>
Description: The name of the field to sort.
auto
Syntax: auto(<field>)
Description: Determine automatically how to sort the field values.
ip
Syntax: ip(<field>)
Description: Interpret the field values as IP addresses.
num
Syntax: num(<field>)
Description: Interpret the field values as numbers.
str
Syntax: str(<field>)
Description: Order the field values lexicographically.
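
To see why the choice between str, num, and ip matters, the Python sketch below sorts the same values under each interpretation. It only illustrates the general difference between lexicographic, numeric, and IP address ordering, and makes no claim about how Splunk implements these options. The helper names `str_order`, `num_order`, and `ip_order` are hypothetical.

```python
import ipaddress

def str_order(values):
    """Lexicographic ordering: values are compared character by character."""
    return sorted(values)

def num_order(values):
    """Numeric ordering: values are compared as numbers."""
    return sorted(values, key=float)

def ip_order(values):
    """IP ordering: values are compared as IP addresses, not as text."""
    return sorted(values, key=ipaddress.ip_address)

numbers = ["10", "9", "100"]
print(str_order(numbers))  # ['10', '100', '9']
print(num_order(numbers))  # ['9', '10', '100']

addresses = ["10.0.0.12", "10.0.0.2", "9.255.0.1"]
print(str_order(addresses))  # ['10.0.0.12', '10.0.0.2', '9.255.0.1']
print(ip_order(addresses))   # ['9.255.0.1', '10.0.0.2', '10.0.0.12']
```

A field that holds numbers or IP addresses as text can therefore sort in an unexpected order when it is treated as a string.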

Examples

Example 1:

Remove duplicates of results with the same 'host' value.

... | dedup host

Example 2:

Remove duplicates of results with the same 'source' value and sort the events by the '_time' field in ascending order.

... | dedup source sortby +_time

Example 3:

Remove duplicates of results with the same 'source' value and sort the events by the '_size' field in descending order.

... | dedup source sortby -_size

Example 4:

For events that have the same 'source' value, keep the first 3 that occur and remove all subsequent events.

... | dedup 3 source

Example 5:

For events that have the same 'source' AND 'host' values, keep the first 3 that occur and remove all subsequent events.

... | dedup 3 source host

See also

uniq

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the dedup command.

This documentation applies to the following versions of Splunk: 4.1, 4.1.1, 4.1.2, 4.1.3, 4.1.4, 4.1.5, 4.1.6, 4.1.7, 4.1.8, 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4


Comments

I carried out a simple test and discovered that to dedup and keep the LATEST event, you want to "| dedup field1 sortby -_time"

Bedgar oneok
December 26, 2014

We need an answer to Landen99's question. Does "| dedup field1 sortby +_time" keep the earliest matching event or the latest matching event?

Bedgar oneok
December 26, 2014

Question on the "sortby" option. Does "sortby" sort the results of the dedup or sort for the dedup? In other words:

| dedup field1 sortby +_time

1) the first event for each value is chosen and then these events are sorted by time OR
2) the earliest event is chosen for each value?

Landen99
November 26, 2014
