dedup command overview, syntax, and usage
The SPL2 dedup command removes the events that contain an identical combination of values for the fields that you specify.
With the SPL2 dedup command, you can specify the number of duplicate events to keep for each value of a single field, or for each combination of values among several fields.
Events returned by the dedup command are based on search order. For historical searches, the most recent events are searched first. For real-time searches, the first events that are received are searched, which are not necessarily the most recent events.
How the SPL2 dedup command works
Suppose that you have the following search results:
host | action | client_ip |
---|---|---|
www1 | view | 211.166.11.101 |
www2 | addtocart | 194.215.205.19 |
www3 | view | 74.53.23.135 |
www1 | addtocart | 128.241.220.82 |
www1 | purchase | 64.66.0.20 |
www3 | view | 107.3.146.207 |
www2 | remove | 194.215.205.19 |
You want to remove search results where the host is a duplicate value.
... | dedup host
The results show the unique host names.
host | action | client_ip |
---|---|---|
www1 | view | 211.166.11.101 |
www2 | addtocart | 194.215.205.19 |
www3 | view | 74.53.23.135 |
This example returns only one result for each host value.
You can specify more than one field with the SPL2 dedup command. For example:
... | dedup host, client_ip
For each combination of host name and client IP address, duplicate results are removed.
host | action | client_ip |
---|---|---|
www1 | view | 211.166.11.101 |
www2 | addtocart | 194.215.205.19 |
www3 | view | 74.53.23.135 |
www1 | addtocart | 128.241.220.82 |
www1 | purchase | 64.66.0.20 |
www3 | view | 107.3.146.207 |
In this example, the second result with host=www2 and client_ip=194.215.205.19 (the remove action) is removed because it duplicates the combination of values in an earlier result.
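The behavior in the two examples above can be modeled in a few lines of Python. This is a minimal sketch for illustration only, not how the command is implemented: the function name `dedup` and the list-of-dictionaries event format are assumptions, and the sketch ignores events with missing fields because every sample event has all three fields.

```python
def dedup(events, fields):
    """Keep the first event seen for each combination of values in `fields`."""
    seen = set()   # combinations of values already encountered
    kept = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# The sample search results from this section, in search order.
events = [
    {"host": "www1", "action": "view", "client_ip": "211.166.11.101"},
    {"host": "www2", "action": "addtocart", "client_ip": "194.215.205.19"},
    {"host": "www3", "action": "view", "client_ip": "74.53.23.135"},
    {"host": "www1", "action": "addtocart", "client_ip": "128.241.220.82"},
    {"host": "www1", "action": "purchase", "client_ip": "64.66.0.20"},
    {"host": "www3", "action": "view", "client_ip": "107.3.146.207"},
    {"host": "www2", "action": "remove", "client_ip": "194.215.205.19"},
]

by_host = dedup(events, ["host"])                      # like: ... | dedup host
by_host_and_ip = dedup(events, ["host", "client_ip"])  # like: ... | dedup host, client_ip

print([e["host"] for e in by_host])  # ['www1', 'www2', 'www3']
print(len(by_host_and_ip))           # 6
```

Because the first event for each combination is the one that is kept, the order in which events arrive determines which events survive.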
Syntax
The dedup keyword and the <field-list> argument are required. Arguments shown in square brackets are optional.
- dedup
- [<int>]
- [keepempty=<bool>]
- [consecutive=<bool>]
- <field-list>
Required arguments
- <field-list>
- Syntax: <field> ["," <field>] ...
- Description: A list of comma-separated field names to remove duplicate values from. You must specify at least one field.
Optional arguments
- consecutive
- Syntax: consecutive=<boolean>
- Description: If set to true, removes only events with duplicate combinations of values that are consecutive.
- Default: false
- <int>
- Syntax: <int>
- Description: The dedup command retains multiple events for each combination when you specify <int>. The number for <int> must be greater than 0. If you do not specify a number, only the first occurring event is kept. All other duplicates are removed from the results.
- Default: 1
- keepempty
- Syntax: keepempty=<boolean>
- Description: If set to true, keeps every event where one or more of the fields in the <field-list> is not present (null).
- Default: false. All events where any of the selected fields are null are dropped.
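The optional arguments can be illustrated with a small Python model. This is a sketch under stated assumptions, not the command's implementation: the function name, the parameter name `limit` (standing in for <int>), and the sample events are hypothetical. The way events with missing fields interact with the consecutive and <int> arguments is not described in this topic, so the sketch assumes that those events do not affect the counts.

```python
def dedup(events, fields, limit=1, keepempty=False, consecutive=False):
    """Model of dedup: keep up to `limit` events per combination of field values."""
    if limit < 1:
        raise ValueError("limit must be greater than 0")
    counts = {}      # combination -> events seen so far (used when consecutive=False)
    prev_key = None  # previous combination (used when consecutive=True)
    run = 0          # length of the current run of identical combinations
    kept = []
    for event in events:
        values = [event.get(f) for f in fields]
        if any(v is None for v in values):
            # A listed field is missing: keep the event only when keepempty=True.
            # Assumption: such events do not affect counts or runs.
            if keepempty:
                kept.append(event)
            continue
        key = tuple(values)
        if consecutive:
            run = run + 1 if key == prev_key else 1
            prev_key = key
            seen = run
        else:
            seen = counts[key] = counts.get(key, 0) + 1
        if seen <= limit:
            kept.append(event)
    return kept


def actions(result):
    """Return the action of each event, to make results easy to compare."""
    return [e["action"] for e in result]


# Hypothetical events; the fourth event has no host field.
events = [
    {"host": "www1", "action": "view"},
    {"host": "www1", "action": "addtocart"},
    {"host": "www2", "action": "view"},
    {"action": "purchase"},
    {"host": "www1", "action": "remove"},
    {"host": "www1", "action": "purchase"},
]

print(actions(dedup(events, ["host"])))                    # ['view', 'view']
print(actions(dedup(events, ["host"], limit=2)))           # ['view', 'addtocart', 'view']
print(actions(dedup(events, ["host"], keepempty=True)))    # ['view', 'view', 'purchase']
print(actions(dedup(events, ["host"], consecutive=True)))  # ['view', 'view', 'remove']
```

In the last call, the www1 event with the remove action is kept because it does not directly follow another www1 event, even though www1 appeared earlier in the results.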
Usage
The following sections contain important information about using the dedup command.
Deduplicating large volumes of data
Avoid using the dedup command on the _raw field if you are searching over a large volume of data. If you deduplicate on the _raw field, the text of every event is retained in memory, which impacts your search performance. This is expected behavior. This performance behavior also applies to any field with high cardinality and large size.
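To get a rough sense of why this matters, the following Python sketch compares how much memory is needed to remember the distinct values of a low-cardinality field with the memory needed for the full event text. It assumes that deduplication must track every distinct value it has seen, which matches the description above but is not a statement about the actual implementation. The function name and the generated events are hypothetical.

```python
import sys


def seen_values_size(events, field):
    """Approximate bytes needed to remember each distinct value of `field`."""
    distinct = {event[field] for event in events}
    return sum(sys.getsizeof(value) for value in distinct)


# Hypothetical events: 3 distinct hosts, but 60 distinct raw event texts.
events = [
    {
        "host": f"www{i % 3 + 1}",
        "_raw": f"2024-01-01T00:00:{i:02d} GET /product/{i} status=200",
    }
    for i in range(60)
]

host_bytes = seen_values_size(events, "host")
raw_bytes = seen_values_size(events, "_raw")

print(host_bytes < raw_bytes)  # True
```

The gap grows with the number of events, because almost every _raw value is distinct while the set of host values stays small.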
Differences between SPL and SPL2
Command options must be specified first
In SPL2, command options must be specified before the <field-list>.
Version | Example |
---|---|
SPL | ... dedup host source 2 |
SPL2 | ... dedup 2 host, source |
List of fields must be comma-delimited
In SPL2, the list of fields must be comma-delimited. Otherwise a parsing error is returned.
Version | Example |
---|---|
SPL | ... dedup host source |
SPL2 | ... dedup host, source |
The sortby argument is not supported
The sortby argument is not supported in SPL2. Use the sort command before the dedup command if you want to change the order of the events, which dictates which event is kept when the dedup command is run.
Version | Example |
---|---|
SPL | ... dedup host source sortby -_size |
SPL2 | ... sort -_size \| dedup host, source |
Alternative: If you are using the from command, you can specify the ORDER BY clause instead of using the sort command.
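The effect of sorting before deduplicating can be modeled in Python. This is an illustrative sketch, not the command's implementation: the `dedup` function and the sample events, including their _size values, are hypothetical.

```python
def dedup(events, fields):
    """Keep the first event seen for each combination of values in `fields`."""
    seen = set()
    kept = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Hypothetical events; two of them share the same host and source.
events = [
    {"host": "www1", "source": "access.log", "_size": 120},
    {"host": "www1", "source": "access.log", "_size": 480},
    {"host": "www2", "source": "access.log", "_size": 300},
]

# Without sorting, the first www1 event in the original order is kept.
unsorted_kept = dedup(events, ["host", "source"])

# Like: ... sort -_size | dedup host, source
largest_first = sorted(events, key=lambda e: e["_size"], reverse=True)
sorted_kept = dedup(largest_first, ["host", "source"])

print([e["_size"] for e in unsorted_kept])  # [120, 300]
print([e["_size"] for e in sorted_kept])    # [480, 300]
```

Sorting in descending order of _size means that the largest event for each combination of host and source is the one that is kept.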
The keepevents argument is not supported
The keepevents=<boolean> argument is not supported in SPL2.
Version | Example |
---|---|
SPL | ... dedup host keepevents=true |
SPL2 | Not supported |
See also
- dedup command
- dedup command examples
This documentation applies to the following versions of Splunk® Cloud Services: current