Splunk® Enterprise

User Manual

Splunk version 4.x reached its End of Life on October 1, 2013. Please see the migration information.

How subsearches work

A subsearch is a search with a search pipeline as an argument. Subsearches are contained in square brackets and evaluated first. The result of the subsearch is then used as an argument in the primary or outer search. Subsearches are mainly used for two purposes:

  • Parameterize one search, using the output of another search (for example, find every record from IP addresses that visited some specific URL).
  • Run a separate search, but stitch the output to the first search using the append command.

The following example uses a subsearch to parameterize a search. Suppose you want to find all events from the most active host in the last hour, but you can't search for a specific host because it might not be the same host every hour. First, identify which host is most active:

sourcetype=syslog earliest=-1h | top limit=1 host | fields + host

This search returns only one host value. In this example, the result is the host named "crashy". Now you can search for all the events coming from "crashy":

sourcetype=syslog host=crashy

But, instead of running two searches each time you want this information, you can use a subsearch to give you the hostname and pass it to the outer search:

sourcetype=syslog [search sourcetype=syslog earliest=-1h | top limit=1 host | fields + host]
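The evaluation order described above can be modeled outside Splunk. The following Python snippet is only an illustrative sketch, not Splunk code: the sample events, the `top_host` helper, and the in-memory filtering are assumptions made for this example. It shows the inner step running first and its result becoming a term of the outer step.

```python
from collections import Counter

# Hypothetical in-memory events standing in for indexed syslog data.
events = [
    {"host": "crashy", "msg": "disk error"},
    {"host": "steady", "msg": "ok"},
    {"host": "crashy", "msg": "restart"},
]

def top_host(rows):
    """Rough model of `top limit=1 host | fields + host`: the most frequent host."""
    return Counter(row["host"] for row in rows).most_common(1)[0][0]

# Step 1: the subsearch (the part in square brackets) is evaluated first.
inner_result = top_host(events)

# Step 2: its result is used as a term of the outer search (host=<value>).
outer_results = [row for row in events if row["host"] == inner_result]
```

Here `inner_result` plays the role of the host value returned by the subsearch, and `outer_results` plays the role of the events returned by the outer search.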

Use subsearch to correlate data

You can use subsearches to correlate data, including data across different indexes or Splunk servers in a distributed environment.

For example, you may have two or more indexes for different application logs. The event data from these logs may share at least one common field. You can use the values of this field to search for events in one index based on a value that is not in another index:

sourcetype=some_sourcetype NOT [search sourcetype=another_sourcetype | fields field_val]

Note: This is equivalent to the SQL "NOT IN" functionality:

SELECT * FROM some_table
WHERE field_value
NOT IN (SELECT field_value FROM another_table)
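As a rough illustration of the same exclusion logic, the Python sketch below filters one hypothetical data set by the values found in another. The sample rows and variable names are assumptions for this example, and the snippet models only the idea, not how Splunk executes the search.

```python
# Hypothetical rows standing in for events of the two sourcetypes.
some_events = [{"field_val": "a"}, {"field_val": "b"}, {"field_val": "c"}]
another_events = [{"field_val": "b"}]

# Subsearch: collect the field values seen in the other data set.
excluded = {row["field_val"] for row in another_events}

# Outer search with NOT [...]: keep events whose value is not in that set.
unmatched = [row for row in some_events if row["field_val"] not in excluded]
```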

Change the format of subsearch results

When you use a subsearch, the format command is implicitly applied to your subsearch results. The format command changes your subsearch results into a single linear search string. This is used when you want to pass the returned values in the returned fields into the primary search.

If your subsearch returned a table, such as:

           | field1  | field2  |
            -------------------
event/row1 | val1_1  | val1_2  |
event/row2 | val2_1  | val2_2  | 

The format command returns:

(field1=val1_1 AND field2=val1_2) OR (field1=val2_1 AND field2=val2_2)  
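The transformation from a result table to a search string can be sketched in Python. This is a simplified model written for illustration, assuming rows are plain dictionaries; the function name `format_rows` is hypothetical, and the exact quoting and parenthesization that Splunk produces may differ.

```python
def format_rows(rows):
    """Join each row's field=value pairs with AND, and the rows with OR."""
    groups = []
    for row in rows:
        terms = [f"{field}={value}" for field, value in row.items()]
        groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)

# The two-row table from the example above.
rows = [
    {"field1": "val1_1", "field2": "val1_2"},
    {"field1": "val2_1", "field2": "val2_2"},
]
search_string = format_rows(rows)
```

With the table above, `search_string` matches the string shown for the format command.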

For more information, see the format search command reference.

There are a couple of exceptions to this. First, all internal fields (fields that begin with a leading underscore "_*") are ignored and not reformatted in this way. Second, the "search" and "query" fields have their values rendered directly in the reformatted search string.

Using "search"

Generally, "search" can be useful when you need to append some static data or do some eval on the data in your subsearch and then pass it to the primary search. When you use "search", the first value of the field is used as the actual search term. For example, if field2 was "search" (in the table above), the format command returns:

(field1=val1_1 AND val1_2) OR (field1=val2_1 AND val2_2)

You can also use "search" to modify the actual search string that gets passed to the primary search.

Using "query"

The "query" field is useful when you are looking for the values in the fields returned from the subsearch, but not in these exact fields. The "query" field behaves similarly to format. Instead of passing the field/value pairs, as you see with format, it passes only the values:

(val1_1 AND val1_2) OR (val2_1 AND val2_2) 

Examples

Let's say you have the following search, which searches for a clID associated with a specific Name. This value is then used to search for several sources.

index="myindex" [ search index="myindex" host="myhost" <Name> | top limit=1 clID | fields + clID ]

The subsearch returns something like: ( (clID="0050834ja") )

If you want to return only the value, 0050834ja, run this search:

index=myindex [ search index=myindex host=myhost MyName | top limit=1 clID | fields + clID | rename clID as search ]

If the field is named search (or query), the subsearch (or, technically, the implicit | format command at the end of the subsearch) drops the field name and returns ( ( 0050834ja ) ). Multiple results return, for example, ( ( value1 ) OR ( value2 ) OR ( value3 ) ).

This is a special case only when the field is named either "search" or "query". Renaming your fields to anything else will make the subsearch use the new field names.
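The special handling of the "search" and "query" field names, together with the rule that internal fields are ignored, can be modeled as follows. This Python sketch is an assumption-laden simplification: the function name is hypothetical, rows are plain dictionaries, and the output does not reproduce the exact spacing or nested parentheses that Splunk emits.

```python
SPECIAL_FIELDS = {"search", "query"}

def format_with_special_fields(rows):
    """Model of the implicit format step, including the two exceptions."""
    groups = []
    for row in rows:
        terms = []
        for field, value in row.items():
            if field.startswith("_"):
                continue  # internal fields are ignored
            if field in SPECIAL_FIELDS:
                terms.append(str(value))  # value rendered directly, name dropped
            else:
                terms.append(f"{field}={value}")
        if terms:
            groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)

# clID renamed to "search": only the value is passed to the outer search.
renamed = format_with_special_fields([{"search": "0050834ja"}])

# Any other field name is kept as a field=value pair.
kept = format_with_special_fields([{"clID": "0050834ja", "_time": "1"}])
```

In this model, `renamed` contains only the bare value, while `kept` contains the field name and value, with the internal `_time` field left out.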

Performance of subsearches

If your subsearch returns a large table of results, it will impact the performance of your search. You can change the number of results that the format command operates over inline with your search by appending the following to the end of your subsearch: | format maxresults=<integer>. For more information, see the format search command reference.

You can also control the subsearch with settings in limits.conf for the runtime and maximum number of results returned:

[subsearch]
maxout = <integer>

  • Maximum number of results to return from a subsearch.
  • This number cannot be greater than or equal to 10500.
  • Defaults to 100.

maxtime = <integer>

  • Maximum number of seconds to run a subsearch before finalizing.
  • Defaults to 60.

ttl = <integer>

  • Time to cache a given subsearch's results.
  • Defaults to 300.
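To make the stanza concrete, the sketch below parses an example [subsearch] stanza filled in with the default values listed above and checks the stated ceiling on maxout. It uses Python's standard configparser purely for illustration; Splunk does not read its configuration this way, and the helper name `read_subsearch_limits` is hypothetical.

```python
import configparser

# Example stanza using the default values listed above.
LIMITS_CONF = """
[subsearch]
maxout = 100
maxtime = 60
ttl = 300
"""

MAXOUT_CEILING = 10500  # maxout must stay below this value

def read_subsearch_limits(text):
    """Parse the [subsearch] stanza and enforce the maxout ceiling."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["subsearch"]
    limits = {key: section.getint(key) for key in ("maxout", "maxtime", "ttl")}
    if limits["maxout"] >= MAXOUT_CEILING:
        raise ValueError("maxout must be less than 10500")
    return limits

limits = read_subsearch_limits(LIMITS_CONF)
```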

After running a search, you can click the Actions menu and select "Inspect Search". Scroll down to the remoteSearch component to see the actual query that resulted from your subsearch. Read more about the "Search Job Inspector" in the Search reference manual.

Result output settings for subsearch commands

The limits.conf.spec file indicates that subsearches return a maximum of 100 results by default. However, you will see variations in the actual number of output results, because every command can change the default maxout for subsearches that belong to that command. Additionally, the default maxout only applies to subsearches that are intended to be expanded into a search expression, which is not the case for some commands, such as join and append. For example, the append command overrides that default with the value of maxresultrows, unless the user has specified maxout as an argument to append.

Answers

Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about using subsearches.

This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7


Comments

Why maximum number of results to return from a subsearch cannot be greater than or equal to 10500? What caused such a restriction?

Nikita Danilov
May 23, 2014
