join command overview, syntax, and usage
The SPL2 join
command combines the left-side dataset with the right-side dataset, by using one or more common fields. The left-side dataset is the set of results from a search that is piped into the join
command. The left-side dataset is sometimes referred to as the source data. The right-side dataset can be either a saved dataset or a subsearch.
A maximum of 50000 rows in the right-side dataset can be joined with the left-side dataset. This maximum is set to limit the impact of the join
command on performance and resource consumption.
The simplest join possible looks like this:
<left-dataset> | join left=L right=R where L.pid = R.pid <right-dataset>
This joins the source data from the search pipeline with the right-side dataset. Rows from each dataset are merged into a single row if the where
predicate is satisfied.
Syntax
The required syntax is in bold.
- join
- (<join-options>...)
- left=<left-alias>
- right=<right-alias>
- where <left-alias>.<left-field>=<right-alias>.<right-field>...
- <right-dataset>
Required arguments
- left
- Syntax: left=<left-alias>
- Description: The alias to use with the left-side dataset, the source data, to avoid naming collisions.
- right
- Syntax: right=<right-alias>
- Description: The alias to use with the right-side dataset to avoid naming collisions.
- where
- Syntax: where <left-alias>.<left-field>=<right-alias>.<right-field> ...
- Description: The names of the fields in the left-side dataset and the right-side dataset that you want to join on. You must specify the alias and the field name. For example:
L.host=R.user
. To join on multiple fields, you must specify AND operator between each set of fields. For example:L.host=R.user AND L.clientip=R.clientip
.
- You can specify the aliases and fields in
where
clause on either side of the equal sign. - For example, you can specify:
where <left-alias>.<left-field>=<right-alias>.<right-field>
- or
where <right-alias>.<right-field>=<left-alias>.<left-field>
- right-dataset
- Syntax: <dataset> | [ subsearch ]
- Description: The name of the right-side dataset or the subsearch that you want to use to join with the source data. If you specify a dataset, it must be a dataset that you created or are authorized to use. If you specify a subsearch, it must be enclosed in square brackets. A maximum of 50000 rows in the right-side dataset can be joined with the left-side dataset.
Optional arguments
- join-options
- Syntax: [ type | max ]
- Description: Specify the type of join to perform and the maximum number of rows to join on. You can specify one or more <join-options>.
- type
- Syntax: type=<inner | outer | left>
- Description: Indicates the type of join to perform. The difference between an inner and a left (or outer) join is how the rows are treated in the left-side dataset that do not match any of the rows in the right-side dataset. In both
inner
andleft
joins, rows that match are joined. The results of an inner join do not include rows from the left-side dataset that have no matches in the right-side dataset. The results of aleft
(orouter
) join includes all of the rows in the left-side dataset and only those values in the right-side dataset have matching field values. - Default: inner
- max
- Syntax: max=<int>
- Description: Specifies the maximum number of rows in the right-side dataset that each row in the left-side dataset can join with. The default setting means that 1 row in the right-side dataset can join with just 1 row in the left-side dataset. If set to
max=0
, multiple rows in the right-side dataset join with 1 row in the left-side dataset. - Default: 1
Usage
The join
command is a centralized streaming command, which means that rows are processed one by one. If you are joining two large datasets, the join
command can consume a lot of resources.
For flexibility and performance, consider using one of the following commands if you do not require join semantics:
lookup
command. Use this command when one of the datasets remains static or rarely changes. You can use KV store lookups with SPL2. For example, a file from an external system such as a CSV file.search
command. In the most simple scenarios, you might need to search only for sources using the OR operator and then use thestats
command to perform the grouping operation on the events.stats
command. Use to group events by a field and perform a statistical function on the events. For example to determine the average duration of events by host name. To usestats
, the field must have a unique identifier.
The simplest join possible looks like this:
<source> | join left=L right=R where L.pid = R.pid [<right-dataset>]
This joins the source, or left-side dataset, with the right-side dataset. Rows from each dataset are merged into a single row if the where
predicate is satisfied.
If you're familiar with SQL, the above example is shorthand for this:
<left-side dataset> | join left=L right=R type=inner max=1 where L.pid = R.pid <right-side dataset>
One-to-many and many-to-many relationships
To return matches for one-to-many, many-to-one, or many-to-many relationships, include the max
argument in your join syntax and set the value to 0. By default max=1, which means that the <dataset> returns only the first result from the <dataset>. Setting the value to a higher number or to 0, which is unlimited, returns multiple results from the <dataset>.
Differences between SPL and SPL2
There are significant differences in the join
command between SPL and SPL2.
The SPL2 join
command performs very much like a SQL join and has similar syntax to a SQL join. With SPL you are actively encouraged to use other commands instead of the join
command because in SPL the join
command does not perform like a SQL join.
With SPL2, the only arguments in the syntax that are not required are the <join-options>.
Some of the SPL <join-options> are not supported in SPL2. Specifically the usetime
, earlier
, and overwrite
join options are not supported.
SPL2 uses a very different syntax
The syntax for the join
command is completely different. You must specify field aliases. Field names are required.
Version | Syntax |
---|---|
SPL | [join-options...][field-list] subsearch |
SPL2 | join
(<join-options>...)
|
Field names to join on can be different
Field names do not have to be renamed before you can join on the key fields. This example joins the incoming search results with the products
dataset.
Version | Example |
---|---|
SPL | rename vid AS vendor_id] |
SPL2 | ... join left=L right=R where L.vendor_id=R.vid products |
See also
- join command
- join command examples
into command examples | join command examples |
This documentation applies to the following versions of Splunk® Cloud Services: current
Feedback submitted, thanks!