union
Description
Merges the results from two or more datasets into one dataset. One of the datasets can be a result set that is then piped into the union command and merged with a second dataset.
The union command appends or merges events from the specified datasets, depending on whether the dataset is streaming or non-streaming and where the command is run. The union command runs on indexers in parallel where possible, and automatically interleaves results on the _time field when processing events. See Usage.
If you are familiar with SQL but new to SPL, see Splunk SPL for SQL users.
Syntax
The required syntax is union followed by at least one <dataset>. Arguments in square brackets are optional.
union [<subsearch-options>] <dataset> [<dataset>...]
Required arguments
- dataset
- Syntax: <dataset-type>:<dataset-name> | <subsearch>
- Description: The dataset that you want to perform the union on. The dataset can be either a named or unnamed dataset.
- A named dataset consists of <dataset-type>:<dataset-name>. For <dataset-type> you can specify a data model, a saved search, or an inputlookup. For example, datamodel:"internal_server.splunkdaccess".
- A subsearch is an unnamed dataset.
- When specifying more than one dataset, use a space or a comma separator between the dataset names.
Optional arguments
- subsearch-options
- Syntax: maxtime=<int> maxout=<int> timeout=<int>
- Description: You can specify one set of subsearch-options that applies to all of the subsearches. The set can contain one or more of the subsearch-options. These options apply only when the subsearch is treated as a non-streaming search.
- The maxtime argument specifies the maximum number of seconds to run the subsearch before finalizing. The default is 60 seconds.
- The maxout argument specifies the maximum number of results to return from the subsearch. The default is 50000 results. This default is the value of the maxresultrows setting in the [searchresults] stanza in the limits.conf file.
- The timeout argument specifies the maximum amount of time, in seconds, to cache the subsearch results. The default is 300 seconds.
Usage
The union command is a dataset processing command. See Command types.
How the union command processes datasets depends on whether the dataset is a streaming or non-streaming dataset. The type of dataset is determined by the commands that are used to create the dataset. See Types of commands.
There are two types of streaming commands, distributable streaming and centralized streaming. For this discussion about the union command, streaming datasets refers to distributable streaming.
A subsearch can be initiated through a search command such as the union command. See Initiating subsearches with search commands in the Splunk Cloud Platform Search Manual.
Where the command is run
Whether the datasets are streaming or non-streaming determines if the union command is run on the indexers or the search head. The following table specifies where the command is run.
Dataset type | Dataset 1 is streaming | Dataset 1 is non-streaming |
---|---|---|
Dataset 2 is streaming | Indexers | Search head |
Dataset 2 is non-streaming | Search head | Search head |
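The table reduces to a single rule: the command runs on the indexers only when both datasets are streaming. The following Python sketch is a model of that table only. It is not Splunk code, and the function name union_runs_on is hypothetical.

```python
def union_runs_on(dataset1_streaming: bool, dataset2_streaming: bool) -> str:
    """Model of the table above: where union runs for two datasets.

    "Streaming" means distributable streaming, as defined in this topic.
    """
    # Only the streaming + streaming combination stays on the indexers.
    if dataset1_streaming and dataset2_streaming:
        return "indexers"
    # Any non-streaming dataset moves processing to the search head.
    return "search head"


if __name__ == "__main__":
    for d1 in (True, False):
        for d2 in (True, False):
            print(d1, d2, union_runs_on(d1, d2))
```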
How the command is processed
The type of dataset also determines how the union command is processed.
Dataset type | Impact on processing |
---|---|
Centralized streaming or non-streaming | Processed as an append command. |
Distributable streaming | Processed as a multisearch command. |
Optimized syntax for streaming datasets
With streaming datasets, instead of this syntax:
<streaming_dataset1> | union <streaming_dataset2>
Your search is more efficient with this syntax:
... | union <streaming_dataset1>, <streaming_dataset2>
Why unioned results might be truncated
Consider the following search, which uses the union command to merge the events from three indexes. Each index contains 60,000 events, for a total of 180,000 events.
| union maxout=10000000
[ search index=union_1 ]
[ search index=union_2 ]
[ search index=union_3 ]
| stats count by index
This search produces the following union results:
index | count |
---|---|
union_1 | 60000 |
union_2 | 60000 |
union_3 | 60000 |
In this example, all of the subsearches are distributable streaming, so they are unioned by using the same processing as the multisearch command. All 60,000 results for each index are unioned for a total of 180,000 merged events.
However, if you specify a centralized streaming command, such as the head command, in one of the subsearches, the results change.
| union maxout=10000000
[ search index=union_1 | head 60000]
[ search index=union_2 ]
[ search index=union_3 ]
| stats count by index
This search produces the following union results for a total of 160,000 merged events.
index | count |
---|---|
union_1 | 60000 |
union_2 | 50000 |
union_3 | 50000 |
Because the head command is a centralized streaming command rather than a distributable streaming command, any subsearches that follow the head command are processed using the append command. In other words, when a command forces the processing to the search head, all subsequent commands must also be processed on the search head.
Internally, the search is converted to this:
| search index=union_1
| head 60000
| append
[ search index=union_2 ]
| append
[ search index=union_3 ]
| stats count by index
When the union command is used with commands that are non-streaming commands, the default for the maxout argument is enforced. The default for the maxout argument is 50,000 events. In this example, the default for the maxout argument is enforced starting with the subsearch that used the non-streaming command. The default is enforced for any subsequent subsearches.
If the non-streaming command is instead on the last subsearch, the first two subsearches are processed as streaming. These subsearches are unioned using the multisearch command processing. The final subsearch includes a non-streaming command, the head command. That subsearch gets unioned using the append command processing.
Internally this search is converted to this:
| multisearch
[ search index=union_1 ]
[ search index=union_2 ]
| append
[ search index=union_3 | head 60000 ]
| stats count by index
In this example, the default for the maxout argument applies only to the last subsearch. That subsearch returns only 50,000 events instead of the entire set of 60,000 events. The total number of events merged is 170,000: 60,000 events each from the first and second subsearches and 50,000 events from the last subsearch.
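The arithmetic in the three scenarios above can be reproduced with a small Python model. This is a sketch under stated assumptions, not Splunk's implementation: the function name union_counts is hypothetical, and the rule that the first subsearch is never capped is inferred from the converted searches shown above, where it becomes the base search rather than an appended subsearch.

```python
DEFAULT_MAXOUT = 50_000  # default for the maxout argument, as stated in this topic


def union_counts(subsearches):
    """Estimate how many events each subsearch contributes to the union.

    subsearches: list of (event_count, is_distributable_streaming) tuples,
    in the order the subsearches appear in the union command.
    """
    # Position of the first subsearch that is not distributable streaming.
    first_forced = next(
        (i for i, (_, streaming) in enumerate(subsearches) if not streaming),
        None,
    )
    counts = []
    for i, (events, _) in enumerate(subsearches):
        # Assumption: from the first non-streaming subsearch onward, each
        # subsearch is appended and capped, except the very first subsearch,
        # which serves as the base search.
        appended = first_forced is not None and i >= first_forced and i > 0
        counts.append(min(events, DEFAULT_MAXOUT) if appended else events)
    return counts


if __name__ == "__main__":
    all_streaming = [(60_000, True)] * 3
    head_first = [(60_000, False), (60_000, True), (60_000, True)]
    head_last = [(60_000, True), (60_000, True), (60_000, False)]
    for case in (all_streaming, head_first, head_last):
        counts = union_counts(case)
        print(counts, sum(counts))
```

The model covers only the cases documented in this section. Behavior for other combinations, such as a non-streaming command in the middle subsearch, follows from the same assumption and is not confirmed by the examples above.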
Interleaving results
When two datasets are retrieved from disk in descending time order, which is the default sort order, the union command interleaves the results. The interleave is based on the _time field. For example, you have the following datasets:
dataset_A
_time | host | bytes |
---|---|---|
4 | mailsrv1 | 2412 |
1 | dns15 | 231 |
dataset_B
_time | host | bytes |
---|---|---|
3 | router1 | 23 |
2 | dns12 | 220 |
Both datasets are in descending order by _time. When | union dataset_A, dataset_B is run, the following dataset is the result.
_time | host | bytes |
---|---|---|
4 | mailsrv1 | 2412 |
3 | router1 | 23 |
2 | dns12 | 220 |
1 | dns15 | 231 |
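The interleave behaves like a merge of two lists that are already sorted in descending order. The following Python sketch illustrates that idea with the standard library's heapq.merge function. It models the result shown above and is not a description of how Splunk implements the command. The function name interleave is hypothetical, and only the _time and host fields are modeled.

```python
import heapq

# Each dataset is already in descending _time order, as in the tables above.
dataset_a = [{"_time": 4, "host": "mailsrv1"}, {"_time": 1, "host": "dns15"}]
dataset_b = [{"_time": 3, "host": "router1"}, {"_time": 2, "host": "dns12"}]


def interleave(*datasets):
    """Merge datasets sorted by descending _time into one descending list."""
    # reverse=True tells heapq.merge that the inputs are sorted largest-first.
    return list(heapq.merge(*datasets, key=lambda row: row["_time"], reverse=True))


if __name__ == "__main__":
    for row in interleave(dataset_a, dataset_b):
        print(row["_time"], row["host"])
```

A merge of this kind produces correctly ordered output only when every input is already sorted in the same direction, which matches the condition that both datasets are retrieved in descending time order.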
Examples
1. Union events from two subsearches
The following example merges events from index a and index b. New fields type and mytype are added in each subsearch using the eval command.
| union [search index=a | eval type = "foo"] [search index=b | eval mytype = "bar"]
2. Union the results of a subsearch to the results of the main search
The following example appends the current results of the main search with the tabular results of errors from the subsearch.
... | chart count by category1 | union [search error | chart count by category2]
3. Union events from a data model and events from an index
The following example unions a built-in data model that is an internal server log for REST API calls and the events from index a.
... | union datamodel:"internal_server.splunkdaccess" [search index=a]
4. Specify the subsearch options
The following example sets a maximum of 20,000 results to return from the subsearch. The example also limits the duration of the subsearch to 120 seconds and sets a maximum time of 600 seconds (10 minutes) to cache the subsearch results.
... | chart count by category1 | union maxout=20000 maxtime=120 timeout=600 [search error | chart count by category2]
See also
- Related information
- About subsearches in the Search Manual
- About data models in the Knowledge Manager Manual
- Commands
- search
- inputlookup
This documentation applies to the following versions of Splunk Cloud Platform™: 9.2.2406, 8.2.2202, 9.0.2205, 8.2.2112, 8.2.2201, 8.2.2203, 9.0.2208, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403 (latest FedRAMP release)