Splunk Cloud

Search Reference

Download manual as PDF

Download topic as PDF

join

Description

Use the join command to combine the results of a subsearch with the results of a main search. One or more of the fields must be common to each result set. You can also combine a search result set to itself using the selfjoin command.

If you are familiar with SQL but new to SPL, see Splunk SPL for SQL users.

Alternative commands

For flexibility and performance, consider using one of the following commands if you do not require join semantics. These commands provide event grouping and correlations using time and geographic location, transactions, subsearches, field lookups, and joins.

Command Use
append To append the results of a subsearch to the results of your current search. The events from both result sets are retained.
  • Use only with historical data. The append command does not produce correct results if used in a real-time search.
  • If you use append to combine the events, use a stats command to group the events in a meaningful way. You cannot use a transaction command after you use an append command.
appendcols Appends the fields of the subsearch results with the input search result fields. The first subsearch result is merged with the first main result, the second subsearch result is merged with the second main result, and so on.
lookup Use when one of the result sets or source files remains static or rarely changes. For example, a file from an external system such as a CSV file.

The look up cannot be a subsearch.

search In the most simple scenarios, you might need to search only for sources using the OR operator and then use a stats or transaction command to perform the grouping operation on the events.
stats To group events by a field and perform a statistical function on the events. For example to determine the average duration of events by host name.
  • To use stats, the field must have a unique identifier.
  • To view the raw event data, use the transaction command instead.
transaction Use transaction in the following situations.
  • To group events by using the eval command with a conditional expression, such as if, case, or match.
  • To group events by using a recycled field value, such as an ID or IP address.
  • To group events by using a pattern, such as a start or end time for the event.
  • To break up groups larger than a certain duration. For example, when a transaction does not explicitly end with a message and you want to specify a maximum span of time after the start of the transaction.
  • To display the raw event data for the grouped events.


For information about when to use a join, see the flowchart in About event grouping and correlation in the Search Manual.

Syntax

join [join-options...] [field-list] subsearch

Required arguments

subsearch
Syntax: "[" subsearch "]"
Description: A secondary search where you specify the source of the events that you want to join to. The subsearch must be enclosed in square brackets. The results of the subsearch should not exceed available memory.
Limitations on the subsearch for the join command are specified in the limits.conf.spec file. The limitations include the maximum subsearch to join against, the maximum search time for the subsearch, and the maximum time to wait for subsearch to fully finish. See Subsearches in the Search Manual.

Optional arguments

field-list
Syntax: <field>, <field>, ...
Description: Specify the fields to use for the join. If no fields are specified, all of the fields that are common to both result sets are used.
Field names must match, not just in name but also in case. You cannot join product_id with product_ID. You must first change the case of the field in the subsearch to match the field in the main search.
join-options
Syntax: type=(inner | outer | left) | usetime=<bool> | earlier=<bool> | overwrite=<bool> | max=<int>
Description: Options to the join command. Use either outer or left to specify a left outer join.

Descriptions for the join-options argument

type
Syntax: type=inner | outer | left
Description: Indicates the type of join to perform. The difference between an inner and a left (or outer) join is how the events are treated in the main search that do not match any of the events in the subsearch. In both inner and left joins, events that match are joined. The results of an inner join do not include events from the main search that have no matches in the subsearch. The results of a left (or outer) join includes all of the events in the main search and only those values in the subsearch have matching field values.
Default: inner
An image that shows two venn diagrams. Each diagram contains two intersecting circles, circle A and circle B. The first diagram is labeled Left Join and circle A is completely shaded, including the portion of the circle where it overlaps with circle B. The second diagram is labeled Inner Join and only the portion of circle A that overlaps with circle B is shaded.
Examples of the data that are included in a join.
usetime
Syntax: usetime=<bool>
Description: A Boolean value that Indicates whether to use time to limit the matches in the subsearch results. Used with the earlier option to limit the subsearch results to matches that are earlier or later than the main search results.
Default: true
earlier
Syntax: earlier=<bool>
Description: If usetime=true and earlier=true, the main search results are matched only against earlier results from the subsearch. If earlier=false, the main search results are matched only against later results from the subsearch. Results that occur at the same time (second) are not eliminated by either value.
Default: true
overwrite
Syntax: overwrite=<bool>
Description: Indicates whether fields from the subresults overwrite the fields from the main results, if the fields have the same field name.
Default: true
max
Syntax: max=<int>
Description: Specifies the maximum number of subsearch results that each main search result can join with. If set to max=0, there is no limit.
Default: 1

Usage

Use the join command when the results of the subsearch are relatively small, for example 50,000 rows or less. To minimize the impact of this command on performance and resource consumption, Splunk software imposes some default limitations on the subsearch. See the subsearch section in the syntax for more information about these limitations.

One-to-many and many-to-many relationships

To return matches for one-to-many, many-to-one, or many-to-many relationships, include the max argument in your join syntax and set the value to 0. By default max=1, which means that the subsearch returns only the first result from the subsearch. Setting the value to a higher number or to 0, which is unlimited, returns multiple results from the subsearch.

Examples

Example 1

Combine the results from a main search with the results from a subsearch search vendors. The result sets are joined on the product_id field, which is common to both sources.

... | join product_id [search vendors]

Example 2

If the field names in the sources do not match, you can rename the field in the subsearch result set. The field in the main search is product_id. The field in the subsearch is pid.

Note: The field names must match in name and in case. You cannot join product_id with product_ID.

... | join product_id [search vendors | rename pid AS product_id]

Example 3

By default, only the first row of the subsearch that matches a row of the main search is returned. To return all of the matching subsearch rows, include the max=<int> argument and set the value to 0. This argument joins each matching subsearch row with the corresponding main search row.

... | join product_id max=0 [search vendors]

Example 4

The dashboards and alerts in the distributed management console shows you performance information about your Splunk deployment. The Resource Usage: Instance dashboard contains a table that shows the machine, number of cores, physical memory capacity, operating system, and CPU architecture.

To display the information in the table, use the following search. This search includes a join command. The search uses the information in the dmc_assets table to look up the instance name and machine name. The search then uses the serverName field to join the information with information from the /services/server/info REST endpoint. The /services/server/info is the URI path to the Splunk REST API endpoint that provides hardware and operating system information for the machine. The $splunk_server$ part of the search is a token variable.

| inputlookup dmc_assets 
| search serverName = $splunk_server$ 
| stats first(serverName) AS serverName, first(host) AS host, first(machine) AS machine
| join type=left serverName 
   [ | rest splunk_server=$splunk_server$ /services/server/info
   | fields serverName, numberOfCores, physicalMemoryMB, os_name, cpu_arch]
| fields machine numberOfCores physicalMemoryMB os_name cpu_arch 
| rename machine AS Machine, numberOfCores AS "Number of Cores", 
  physicalMemoryMB AS "Physical Memory Capacity (MB)", os_name AS "Operating System", 
  cpu_arch AS "CPU Architecture"

See also

selfjoin, append, set, appendcols

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the join command.

PREVIOUS
iplocation
  NEXT
kmeans

This documentation applies to the following versions of Splunk Cloud: 6.5.0, 6.5.1, 6.5.1612, 6.6.0, 6.6.1, 6.6.3


Comments

Is join suppose to flattened MV fields? Take this example
| tstats count by _time span=1d | join _time [ | tstats values(source) as source by _time span=1d]

One would expect to have a mv field called source added to each event joined but instead you get a flattened list concatenated with spaces.

Not sure if this is a bug or undocumented feature :)

Bsonposh, Splunker
November 22, 2017

Rjthibod - Yes, I do think it is worth mentioning the Union command. Thanks for the suggestion!

Lstewart splunk, Splunker
June 2, 2017

Is it worth adding the new `union` command to the Alternative Commands section starting for version 6.6?

Rjthibod
June 1, 2017

Lstewart -- Hm, then is the "usetime" default listed incorrectly? Is the default for usetime actually false? This is the behavior I'm seeing: https://gist.github.com/jordan-brough/b510997732b17046adf5

Jordananimoto
October 5, 2015

Yes, the default value for "earlier" is "true.

Lstewart splunk, Splunker
October 5, 2015

Is "true" really the default value for the "earlier" option? On my installation at least the default seems to be "false", which seems like the more expected behavior.

Jordananimoto
September 25, 2015

The documentation for the join command has been updated based on the feedback we received and the questions posted in Splunk Answers. Thank you!

Lstewart splunk, Splunker
June 11, 2015

@KerdonMike make sure your have this "max=0". for example in your code try

| join type=left max=0

i know that works for inner joins. The thing is Splunk will automatically do a "dedup" on the subquery dataset on the field that is being joined to the original dataset.

Tdiestel
May 3, 2015

And, once there will be functionality for editing my own posts, I will correct my first post from "join command is now working as it should" to "join command is NOT working as it should".

KerdonMike
March 24, 2015

In similar fashion, splunk's "inner join" is not entirely inner join, because it only allows comparing columns of same name, and for equality.
So it is (as described in the linked wiki page) somewhere between equi-join and natural join.

So the used comparison:
"SQL-like joining of results from the results from the main search with the results from a subsearch."

Should be more precise as:
"Nothing like SQL joins, and also bugged compared to our own definition."

I hope the annoyance of working with splunk's join is evident from my posts.

KerdonMike
March 24, 2015

Either the documentation is incorrect, or the join command is now working as it should. Specifically, for the "type=left".

In my example data, if I run query:
"... (some specific code) ..." it returns 1 result / row.

When I add left join:
"... (some specific code) ...
| join type=left usetime=true earlier=false ... (some condition columns and subsearch) ..."

I still get the same 1 result / row, plus some additional computed values from the subsearch.

When I change the left join:
"... (some specific code) ...
| join type=left usetime=true earlier=true ... (some condition columns and subsearch) ..."

I lose my 1 result / row, meaning the parameters "usetime" and "earlier" take precedence over "type".
They make it behave like it's inner join, not left join.

Also, for the writer of this documentation page - "outer join" is not the same as "left join".
You might want to read something about it (e.g. http://en.wikipedia.org/wiki/Join_%28SQL%29).

KerdonMike
March 24, 2015

Remember that a JOIN is case/type sensitive. SO | eval =[upper|lower]() might be useful to pipe results to a JOIN

Jdbtee
August 27, 2014

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters