join command usage

The join command is a centralized streaming command, which means that rows are processed one by one. If you are joining two large datasets, the join command can consume a lot of resources.

For flexibility and performance, consider using one of the following commands if you do not require join semantics:

lookup command. Use this command when one of the datasets remains static or rarely changes. You can use KV store lookups with SPL2. For example, a file from an external system such as a CSV file.
search command. In the most simple scenarios, you might need to search only for sources using the OR operator and then use the stats command to perform the grouping operation on the events.
stats command. Use to group events by a field and perform a statistical function on the events. For example to determine the average duration of events by host name. To use stats, the field must have a unique identifier.

The simplest join possible looks like this:

<source> | join left=L right=R 
where L.pid = R.pid [<right-dataset>]

This joins the source, or left-side dataset, with the right-side dataset. Rows from each dataset are merged into a single row if the where predicate is satisfied.

If you're familiar with SQL, the above example is shorthand for this:

<left-side dataset> | join left=L right=R 
type=inner max=1 
where L.pid = R.pid <right-side dataset>

One-to-many and many-to-many relationships

To return matches for one-to-many, many-to-one, or many-to-many relationships, include the max argument in your join syntax and set the value to 0. By default max=1, which means that the <dataset> returns only the first result from the <dataset>. Setting the value to a higher number or to 0, which is unlimited, returns multiple results from the <dataset>.

Differences between SPL and SPL2

There are significant differences in the join command between SPL and SPL2.

The SPL2 join command performs very much like a SQL join and has similar syntax to a SQL join. With SPL you are actively encouraged to use other commands instead of the join command because in SPL the join command does not perform like a SQL join.

With SPL2, the only arguments in the syntax that are not required are the <join-options>. Some of the SPL <join-options> are not supported in SPL2. Specifically the usetime, earlier, and overwrite join options are not supported.

SPL2 uses a very different syntax

The syntax for the join command is completely different. You must specify field aliases. Field names are required.

Version	Syntax
SPL	[join-options...][field-list] subsearch
SPL2	join (<join-options>...) left=<left-alias> right=<right-alias> where <left-alias>.<left-field>=<right-alias>.<right-field> [ AND <left-alias>.<left-field>=<right-alias>.<right-field> ]... <right dataset>

Field names to join on can be different

Field names do not have to be renamed so that you can join on the key fields. This example joins the incoming search results with the products dataset.

Version	Example
SPL	rename vid AS vendor_id]
SPL2	... join left=L right=R where L.vendorID=R.vid products

Related answers from Splunk Community

join command usage

One-to-many and many-to-many relationships

Differences between SPL and SPL2

SPL2 uses a very different syntax

Field names to join on can be different

See also

Comments

join command usage

Was this topic useful?