union command usage

The union command is a generating command. Generating commands fetch information from the datasets, without any transformations.

You can use the union command at the beginning of your search to combine two datasets, or later in your search to combine the incoming search results with a dataset.

Specifying a dataset

You can declare, or specify, a dataset in several different ways. Here are some examples:

Type of declaration	Description	Example
Dataset references	Specifying an existing dataset that is defined in the Metadata Catalog. The datasets in this example are indexes.	`... | union main, customers, purchases`
Transient	Specifying an SPL subsearch as the dataset. Subsearches are enclosed in square brackets.	`... | union [search main | stats count() by host], [from customers | stats count() by host]`
Fluent	The search results that are piped into the `union` command are referred to as a fluent dataset. This type of declaration has a `union` command that contains one or more subsearches.	`... <some search criteria> | union [<subsearch1>], [<subsearch2>]`
Literal	Using literal values that you type in as subsearches. Each subsearch is a dataset. This example shows three separate literal dataset declarations.	`from [{state:"Washington", population:39557045}] | union [{state:"California", population:753591}, {state:"Oregon", population:4190713}]`
Mixed	Specifying a mixture of the types of declarations. This example begins with a fluent dataset, contains a dataset reference <ds1>, includes a subsearch composed of SPL syntax <subsearch1>, and ends with a subsearch composed of literal values.	`... | union <ds1>, [ <subsearch1> ], [ { "state": "Washington", "population": 39557045 } ]`

Semantics

If all of the datasets that are unioned together are streamable time-series, the union command attempts to interleave the data from all datasets into one globally sorted list of events or metrics. The list is based on the _time field in descending order. Otherwise, the union command returns all the rows from the first dataset, followed by all the rows from the second dataset, and so on.

Interleaving results

When two datasets are retrieved from disk in time descending order, which is the default sort order, the union command interleaves the results. The interleave is based on the _time field. For example, suppose you have the following datasets:

dataset_A

_time	Host	Bytes
4	mailsrv1	2412
1	dns15	231

dataset_B

_time	Host	Bytes
3	router1	23
2	dns12	220

Both datasets are in descending order by _time. When `| union dataset_A, dataset_B` is run, the following dataset is the result.

_time	Host	Bytes
4	mailsrv1	2412
3	router1	23
2	dns12	220
1	dns15	231
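
The two behaviors described under Semantics can be modeled outside of Splunk with a short Python sketch. This is an illustration only, not how the union command is implemented. The function names `union` and `is_time_descending` are hypothetical, and "streamable time-series" is approximated here by the assumption that every row has a `_time` value and that each dataset is already sorted newest first. The Bytes column is omitted for brevity.

```python
import heapq
from itertools import chain


def is_time_descending(rows):
    """Return True if every row has a _time value and rows are sorted newest first."""
    times = [row.get("_time") for row in rows]
    if any(t is None for t in times):
        return False
    return all(a >= b for a, b in zip(times, times[1:]))


def union(*datasets):
    """Illustrative model of the union semantics (hypothetical helper)."""
    if all(is_time_descending(ds) for ds in datasets):
        # Interleave: k-way merge on _time, in descending order.
        return list(
            heapq.merge(*datasets, key=lambda row: row["_time"], reverse=True)
        )
    # Otherwise: all rows of the first dataset, then the second, and so on.
    return list(chain.from_iterable(datasets))


dataset_a = [
    {"_time": 4, "Host": "mailsrv1"},
    {"_time": 1, "Host": "dns15"},
]
dataset_b = [
    {"_time": 3, "Host": "router1"},
    {"_time": 2, "Host": "dns12"},
]

merged = union(dataset_a, dataset_b)
print([row["_time"] for row in merged])  # [4, 3, 2, 1]
```

If either input is not in descending order by `_time`, the sketch falls back to returning the rows of each dataset in the order that the datasets are listed, which mirrors the non-interleaved case.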
