join
Description
Use the join
command to combine the results of a subsearch with the results of a main search. One or more of the fields must be common to each result set. You can also combine a search result set to itself using the selfjoin
command.
If you are familiar with SQL but new to SPL, see Splunk SPL for SQL users.
Alternative commands
For flexibility and performance, consider using one of the following commands if you do not require join semantics. These commands provide event grouping and correlations using time and geographic location, transactions, subsearches, field lookups, and joins.
Command | Use |
---|---|
append |
To append the results of a subsearch to the results of your current search. The events from both result sets are retained.
|
appendcols |
Appends the fields of the subsearch results with the input search result fields. The first subsearch result is merged with the first main result, the second subsearch result is merged with the second main result, and so on. |
lookup |
Use when one of the result sets or source files remains static or rarely changes. For example, a file from an external system such as a CSV file.
The lookup cannot be a subsearch. |
search |
In the most simple scenarios, you might need to search only for sources using the OR operator and then use a stats or transaction command to perform the grouping operation on the events.
|
stats |
To group events by a field and perform a statistical function on the events. For example to determine the average duration of events by host name.
|
transaction
|
Use transaction in the following situations.
|
For information about when to use a join, see the flowchart in About event grouping and correlation in the Search Manual.
Syntax
The required syntax is in bold.
- join
- [<join-options>...]
- [<field-list>]
- <subsearch>
Required arguments
- subsearch
- Syntax: "[" <subsearch> "]"
- Description: A secondary search where you specify the source of the events that you want to join to. The subsearch must be enclosed in square brackets. The results of the subsearch should not exceed available memory.
- Limitations on the subsearch for the
join
command are specified in thelimits.conf.spec
file. The limitations include a maximum of 50,000 rows in the subsearch to join against, the maximum search time for the subsearch, and the maximum time to wait for subsearch to fully finish. See Subsearches in the Search Manual.
Optional arguments
- field-list
- Syntax: <field>, <field>, ...
- Description: Specify the fields to use for the join. If no fields are specified, all of the fields that are common to both result sets are used.
- Field names must match, not just in name but also in case. You cannot join
product_id
withproduct_ID
. You must first change the case of the field in the subsearch to match the field in the main search.
- join-options
- Syntax: type=(inner | outer | left) | usetime=<bool> | earlier=<bool> | overwrite=<bool> | max=<int>
- Description: Options to the join command. Use either
outer
orleft
to specify a left outer join.
Descriptions for the join-options argument
- type
- Syntax: type=inner | outer | left
- Description: Indicates the type of join to perform. The difference between an
inner
and aleft
(orouter
) join is how the events are treated in the main search that do not match any of the events in the subsearch. In both inner and left joins, events that match are joined. The results of aninner
join do not include events from the main search that have no matches in the subsearch. The results of aleft
(orouter
) join includes all of the events in the main search and only those values in the subsearch have matching field values. - Default:
inner
- usetime
- Syntax: usetime=<bool>
- Description: A Boolean value that Indicates whether to use time to limit the matches in the subsearch results. Used with the
earlier
option to limit the subsearch results to matches that are earlier or later than the main search results. - Default:
false
- earlier
- Syntax: earlier=<bool>
- Description: If
usetime=true
andearlier=true
, the main search results are matched only against earlier results from the subsearch. Ifearlier=false
, the main search results are matched only against later results from the subsearch. Results that occur at the same time (second) are not eliminated by either value. - Default:
true
- overwrite
- Syntax: overwrite=<bool>
- Description:If fields in the main search results and subsearch results have the same name, indicates whether fields from the subsearch results overwrite the fields from the main search results.
- Default:
true
- max
- Syntax: max=<int>
- Description: Specifies the maximum number of subsearch results that each main search result can join with. If set to
max=0
, there is no limit. - Default:
1
Usage
The join
command is a centralized streaming command when there is a defined set of fields to join to. Otherwise the command is a dataset processing command.
See Command types.
Use the join
command when the results of the subsearch are relatively small, for example 50,000 rows or less. To minimize the impact of this command on performance and resource consumption, Splunk software imposes some default limitations on the subsearch. See the subsearch section in the syntax for more information about these limitations.
One-to-many and many-to-many relationships
To return matches for one-to-many, many-to-one, or many-to-many relationships, include the max
argument in your join
syntax and set the value to 0. By default max=1
, which means that the subsearch returns only the first result from the subsearch. Setting the value to a higher number or to 0, which is unlimited, returns multiple results from the subsearch.
Examples
1. A basic join
Combine the results from a main search with the results from a subsearch search vendors
. The result sets are joined on the product_id
field, which is common to both sources.
... | join product_id [search vendors]
2. Joining on fields with different names
If the field names in the sources do not match, you can rename the field in the subsearch result set. The field in the main search is product_id
. The field in the subsearch is pid
.
Note: The field names must match in name and in case. You cannot join product_id
with pid
or product_ID
.
... | join product_id [search vendors | rename pid AS product_id]
3. Returning all subsearch rows
By default, only the first row of the subsearch that matches a row of the main search is returned. To return all of the matching subsearch rows, include the max=<int>
argument and set the value to 0. This argument joins each matching subsearch row with the corresponding main search row.
... | join product_id max=0 [search vendors]
4. Using a join to display resource usage information
The dashboards and alerts in the distributed management console shows you performance information about your Splunk deployment. The Resource Usage: Instance dashboard contains a table that shows the machine, number of cores, physical memory capacity, operating system, and CPU architecture.
To display the information in the table, use the following search. This search includes the join
command. The search uses the information in the dmc_assets table to look up the instance name and machine name. The search then uses the serverName
field to join the information with information from the /services/server/info
REST endpoint. The /services/server/info
is the URI path to the Splunk REST API endpoint that provides hardware and operating system information for the machine. The $splunk_server$
part of the search is a dashboard token variable.
| inputlookup dmc_assets | search serverName = $splunk_server$ | stats first(serverName) AS serverName, first(host) AS host, first(machine) AS machine | join type=left serverName [ | rest splunk_server=$splunk_server$ /services/server/info | fields serverName, numberOfCores, physicalMemoryMB, os_name, cpu_arch] | fields machine numberOfCores physicalMemoryMB os_name cpu_arch | rename machine AS Machine, numberOfCores AS "Number of Cores", physicalMemoryMB AS "Physical Memory Capacity (MB)", os_name AS "Operating System", cpu_arch AS "CPU Architecture"
See also
iplocation | kmeans |
This documentation applies to the following versions of Splunk® Enterprise: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.0.6, 7.0.7, 7.0.8, 7.0.9, 7.0.10, 7.0.11, 7.0.13, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10
Feedback submitted, thanks!