Splunk® Enterprise

Search Reference

Download manual as PDF

Download topic as PDF

selfjoin

Description

Join search result rows with other search result rows in the same result set, based on one or more fields that you specify.

Syntax

selfjoin [<selfjoin-options>...] <field-list>

Required arguments

<field-list>
Syntax: <field>...
Description: The field or list of fields to join on.

Optional arguments

<selfjoin-options>
Syntax: overwrite=<bool> | max=<int> | keepsingle=<bool>
Description: Options that control the search result set that is returned. You can specify one or more of these options.

Selfjoin options

keepsingle
Syntax: keepsingle=<bool>
Description: Controls whether or not to retain results that have with a unique value in the join fields. When keepsingle=true search results that have no other results to join with are kept in the output.
Default: false
max
Syntax: max=<int>
Description: Indicates the maximum number of 'other' results to join with each main result. If max=0, there is no limit. This argument sets the maximum for the 'other' results. The maximum number of main results is 100,000.
Default: 1
overwrite
Sytnax: overwrite=<bool>
Description: When overwrite=true, causes fields from the 'other' results to overwrite fields of the main results. The main results are used as the basis for the join.
Default: true

Usage

Self joins are more commonly used with relational database tables. They are used less commonly with event data.

An example of an events usecase is with events that contain information about processes, where each process has a parent process ID. You can use the selfjoin command to correlate information about a process with information about the parent process.

See the Extended example.

Basic example

1: Use a single field to join results

Join the results with itself on the 'id' field.

... | selfjoin id

Extended example

The following example shows how the selfjoin command works against a simple set of results. You can follow along with this example on your own Splunk instance.

This example builds a search incrementally. With each addition to the search, the search is rerun and the impact of the additions are shown in a results table. The values in the _time field change each time you rerun the search. However, in this example the values in the results table are not changed so that we can focus on how the changes to the search impact the results.

1. Start by creating a simple set of 5 results by using the makeresults command.

| makeresults count=5

There are 5 results created, each with the same timestamp.

_time
2018-01-18 14:38:59
2018-01-18 14:38:59
2018-01-18 14:38:59
2018-01-18 14:38:59
2018-01-18 14:38:59

2. To keep better track of each result use the streamstats command to add a field that numbers each result.

| makeresults count=5 | streamstats count as a

The a field is added to the results.

_time a
2018-01-18 14:38:59 1
2018-01-18 14:38:59 2
2018-01-18 14:38:59 3
2018-01-18 14:38:59 4
2018-01-18 14:38:59 5

3. Additionally, use the eval command to change the timestamps to be 60 seconds apart. Different timestamps makes this example more realistic.

| makeresults count=5 | streamstats count as a | eval _time = _time + (60*a)

The minute portion of the timestamp is updated.

_time a
2018-01-18 14:38:59 1
2018-01-18 14:39:59 2
2018-01-18 14:40:59 3
2018-01-18 14:41:59 4
2018-01-18 14:42:59 5

4. Next use the eval command to create a field to use as the field to join the results on.

| makeresults count=5 | streamstats count as a | eval _time = _time + (60*a) | eval joiner="x"

The new field is added.

_time a joiner
2018-01-18 14:38:59 1 x
2018-01-18 14:39:59 2 x
2018-01-18 14:40:59 3 x
2018-01-18 14:41:59 4 x
2018-01-18 14:42:59 5 x

5. Use the eval command to create some fields with data.

An if function is used with a modulo (modulus) operation to add different data to each of the new fields. A modulo operation finds the remainder after the division of one number by another number:

  • The eval b command processes each result and performs a modulo operation. If the remainder of a/2 is 0, put “something” into the field “b”, otherwise put "nada" into field “b”.
  • The eval c command processes each result and performs a modulo operation. If the remainder a/2 is 1, put “something else” into the field “c”, otherwise put nothing (NULL) into field “c”.

| makeresults count=5 | streamstats count as a | eval _time = _time + (60*a) | eval joiner="x" | eval b = if(a%2==0,"something","nada"), c = if(a%2==1,"somethingelse",null())

The new fields are added and the fields are arranged in alphabetical order by field name, except for the _time field.

_time a b c joiner
2018-01-18 14:38:59 1 nada somethingelse x
2018-01-18 14:39:59 2 something x
2018-01-18 14:40:59 3 nada somethingelse x
2018-01-18 14:41:59 4 something x
2018-01-18 14:42:59 5 nada somethingelse x

6. Use the selfjoin command to join the results on the joiner field.

| makeresults count=5 | streamstats count as a | eval _time = _time + (60*a) | eval joiner="x" | eval b = if(a%2==0,"something","nada"), c = if(a%2==1,"somethingelse",null()) | selfjoin joiner

The results are joined.

_time a b c joiner
2018-01-18 14:39:59 2 something somethingelse x
2018-01-18 14:40:59 3 nada somethingelse x
2018-01-18 14:41:59 4 something somethingelse x
2018-01-18 14:42:59 5 nada somethingelse x

7. To understand how the selfjoin command joins the results together, remove the | selfjoin joiner portion of the search. Then modify the search to append the values from the a field to the values in the b and c fields.

| makeresults count=5 | streamstats count as a | eval _time = _time + (60*a) | eval joiner="x" | eval b = if(a%2==0,"something"+a,"nada"+a), c = if(a%2==1,"somethingelse"+a,null()) | selfjoin joiner

The results now have the row number appended to the values in the b and c fields.

_time a b c joiner
2018-01-18 14:38:59 1 nada1 somethingelse1 x
2018-01-18 14:39:59 2 something2 x
2018-01-18 14:40:59 3 nada3 somethingelse3 x
2018-01-18 14:41:59 4 something4 x
2018-01-18 14:42:59 5 nada5 somethingelse5 x

8. Now add the selfjoin command back into the search.

| makeresults count=5 | streamstats count as a | eval _time = _time + (60*a) | eval joiner="x" | eval b = if(a%2==0,"something"+a,"nada"+a), c = if(a%2==1,"somethingelse"+a,null()) | selfjoin joiner

The results of the self join.

_time a b c joiner
2018-01-18 14:39:59 2 something2 somethingelse1 x
2018-01-18 14:40:59 3 nada3 somethingelse3 x
2018-01-18 14:41:59 4 something4 somethingelse3 x
2018-01-18 14:42:59 5 nada5 somethingelse5 x

If there are values for a field in both rows, the last result row, based on the _time value, takes precedence. The joins performed are shown in the following table.

Result row Output Description
1 Row 1 is joined with row 2 and returned as row 2. In field b, the value nada1 is discarded because the value something2 in row 2 takes precedence. In field c, there is no value in row 2. The value somethingelse1 from row 1 is returned.
2 Row 2 is joined with row 3 and returned as row 3. Since row 3 contains values for both field b and field c, the values in row 3 take precedence and the values in row 2 are discarded.
3 Row 3 is joined with row 4 and returned as row 4. In field b, the value nada3 is discarded because the value something4 in row 4 takes precedence. In field c, there is no value in row 4. The value somethingelse3 from row 3 is returned.
4 Row 4 is joined with row 5 and returned as row 5. Since row 5 contains values for both field b and field c, the values in row 5 take precedence and the values in row 4 are discarded.
5 Row 5 has no other row to join with. No additional results are returned.

(Thanks to Splunk user Alacercogitatus for helping with this example.)

See also

join

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has using the selfjoin command.

PREVIOUS
searchtxn
  NEXT
sendemail

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7, 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7, 5.0.8, 5.0.9, 5.0.10, 5.0.11, 5.0.12, 5.0.13, 5.0.14, 5.0.15, 5.0.16, 5.0.17, 5.0.18, 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.1.11, 6.1.12, 6.1.13, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.2.11, 6.2.12, 6.2.13, 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 7.0.0, 7.0.1


Comments

Pmalcakdoj
Thanks for pointing this out. I have clarified this in the description for "max".

Lstewart splunk, Splunker
January 19, 2018

Alacercogitatus
Thanks for the example. Based on our conversations, I have added an "Extended example" to show, step-by-step what happens with selfjoin.

Lstewart splunk, Splunker
January 19, 2018

Please note that setting max=0 applies only to the number of 'other' results
Overall number of results returned is still limited to 100,000 total, regardless of max=0

Pmalcakdoj
January 16, 2018

Hi there, this is a fully-under-documented command. Here is an example if you'd like to further enrich the command.

| makeresults count=2 | streamstats count as a | eval joiner = "x" |eval b = if(a%2==0,"something",null()), c = if(a%2==1,"somethingelse",null())| selfjoin joiner

Table 1

_time a b c joiner
2017-08-29 08:23:42 1 somethingelse x
2017-08-29 08:23:42 2 something x

Output

_time a b c joiner
2017-08-29 08:24:44 2 something somethingelse x


Note how the "a" field is only the latest value, which is different from transaction (where both values would be present).

Alacercogitatus
August 29, 2017

Was this documentation topic helpful?

Enter your email address, and someone from the documentation team will respond to you:

Please provide your comments here. Ask a question or make a suggestion.

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters