How subsearches work
A subsearch is a search with a search pipeline as an argument. Subsearches are contained in square brackets and evaluated first. The result of the subsearch is then used as an argument in the primary or outer search. Subsearches are mainly used for two purposes:
- Parameterize one search, using the output of another search (for example, find every record from IP addresses that visited some specific URL).
- Run a separate search, but stitch the output to the first search using the append command.
The following example uses a subsearch to parameterize a search. Suppose you want to find all events from the most active host in the last hour, but you can't search for a specific host because the most active host might differ from hour to hour. First, identify which host is most active:
sourcetype=syslog earliest=-1h | top limit=1 host | fields + host
This search will only return one host value. Once you have this host, which is the most active host in the last hour, you can search for all events on that host:
sourcetype=syslog host=crashy
But, instead of running two searches each time you want this information, you can use a subsearch to give you the hostname and pass it to the outer search:
sourcetype=syslog [search sourcetype=syslog earliest=-1h | top limit=1 host | fields + host]
Use subsearch to correlate data
You can use subsearches to correlate data, including data across different indexes or Splunk servers in a distributed environment.
For example, you may have two or more indexes for different application logs. The event data from these logs may share at least one common field. You can use the values of this field to search for events in one index based on a value that is not in another index:
sourcetype=some_sourcetype NOT [search sourcetype=another_sourcetype | fields field_val]
Note: This is equivalent to the SQL "NOT IN" functionality:
SELECT * from some_table
WHERE field_value
NOT IN (SELECT field_value FROM another_table)
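The exclusion logic above can be modelled outside Splunk as a set-membership filter. The following is only a minimal Python sketch under stated assumptions: the function name events_not_in and the sample events are hypothetical, and events are represented as plain dictionaries rather than real Splunk results.

```python
def events_not_in(outer_events, subsearch_events, field):
    """Keep outer events whose `field` value never appears in the subsearch results."""
    # Collect the distinct values the (hypothetical) subsearch returned for the field.
    excluded = {event[field] for event in subsearch_events if field in event}
    # In this sketch, an outer event that lacks the field is kept.
    return [event for event in outer_events if event.get(field) not in excluded]


# Hypothetical sample data standing in for two sourcetypes.
some_sourcetype = [
    {"field_val": "a", "msg": "first"},
    {"field_val": "b", "msg": "second"},
    {"field_val": "c", "msg": "third"},
]
another_sourcetype = [{"field_val": "b"}, {"field_val": "b"}]

remaining = events_not_in(some_sourcetype, another_sourcetype, "field_val")
print([event["field_val"] for event in remaining])
```

The inner collection is evaluated first and reduced to a set of values, which mirrors how the subsearch runs before the outer search.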
Change the format of subsearch results
When you use a subsearch, the format command is implicitly applied to your subsearch results. The format command changes your subsearch results into a single linear search string. This is used when you want to pass the returned values in the returned fields into the primary search.
If your subsearch returned a table, such as:
           | field1 | field2 |
------------------------------
event/row1 | val1_1 | val1_2 |
event/row2 | val2_1 | val2_2 |
The format command returns:
(field1=val1_1 AND field2=val1_2) OR (field1=val2_1 AND field2=val2_2)
For more information, see the format search command reference.
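To make the row-to-string transformation concrete, here is a minimal Python sketch that reproduces the string shown above from the example table. It is an illustration only, not Splunk's implementation; the function name format_rows is hypothetical, and rows are modelled as ordered dictionaries of field names to values.

```python
def format_rows(rows):
    """Render result rows as one search string: AND within a row, OR between rows."""
    groups = []
    for row in rows:
        # Each row becomes a parenthesized group of field=value terms.
        terms = [f"{field}={value}" for field, value in row.items()]
        groups.append("(" + " AND ".join(terms) + ")")
    # Rows are alternatives, so the groups are joined with OR.
    return " OR ".join(groups)


rows = [
    {"field1": "val1_1", "field2": "val1_2"},
    {"field1": "val2_1", "field2": "val2_2"},
]
print(format_rows(rows))
# (field1=val1_1 AND field2=val1_2) OR (field1=val2_1 AND field2=val2_2)
```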
There are a couple of exceptions to this. First, all internal fields (fields that begin with a leading underscore "_*") are ignored and not reformatted in this way. Second, the "search" and "query" fields have their values rendered directly in the reformatted search string.
Using "search"
Generally, "search" is useful when you need to append some static data to, or run an eval on, the data in your subsearch before passing it to the primary search. When you use "search", the first value of the field is used as the actual search term. For example, if field2 was "search" (in the table above), the format command returns:
(field1=val1_1 AND val1_2) OR (field1=val2_1 AND val2_2)
You can also use "search" to modify the actual search string that gets passed to the primary search.
Using "query"
"Query" is useful when you are looking for the values in the fields returned from the subsearch, but not in these exact fields. The "query" field behaves similarly to format. Instead of passing the field/value pairs, as you see with format, it passes the values:
(val1_1 AND val1_2) OR (val2_1 AND val2_2)
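The two exceptions described above (internal fields are skipped; "search" and "query" values are rendered without a field name) can be sketched in Python as follows. This is a simplified model, not Splunk's code: the function name render_subsearch_rows and the sample rows are hypothetical, and it does not try to reproduce the exact parenthesization or spacing that Splunk emits.

```python
def render_subsearch_rows(rows):
    """Model of the reformatting rules, including the special-cased field names."""
    groups = []
    for row in rows:
        terms = []
        for field, value in row.items():
            if field.startswith("_"):
                # Internal fields (leading underscore) are ignored.
                continue
            if field in ("search", "query"):
                # The value is rendered directly, with no field name.
                terms.append(str(value))
            else:
                terms.append(f"{field}={value}")
        if terms:
            groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)


# field2 from the earlier table renamed to "search".
with_search = [
    {"field1": "val1_1", "search": "val1_2"},
    {"field1": "val2_1", "search": "val2_2"},
]
print(render_subsearch_rows(with_search))
# (field1=val1_1 AND val1_2) OR (field1=val2_1 AND val2_2)

# A single "query" field plus an internal field that is dropped.
with_query = [
    {"_time": "1300000000", "query": "val1_1"},
    {"_time": "1300000060", "query": "val2_1"},
]
print(render_subsearch_rows(with_query))
# (val1_1) OR (val2_1)
```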
Examples
Let's say you have the following search, which searches for a clID associated with a specific Name. This value is then used to search for several sources.
index="myindex" [ search index="myindex" host="myhost" <Name> | top limit=1 clID | fields + clID ]
The subsearch returns something like: ( (clID="0050834ja") )
If you want to return only the value, 0050834ja, run this search:
index=myindex [ search index=myindex host=myhost MyName | top limit=1 clID | fields + clID | rename clID as search ]
If the field is named search (or query), the subsearch (technically, the implicit | format command at the end of the subsearch) drops the field name and returns ( ( 0050834ja ) ). Multiple results are returned as, for example, ( ( value1 ) OR ( value2 ) OR ( value3 ) ).
This is a special case only when the field is named either "search" or "query". Renaming your fields to anything else will make the subsearch use the new field names.
Performance of subsearches
If your subsearch returns a large table of results, it will slow down your search. You can change the number of results that the format command operates over inline with your search by appending the following to the end of your subsearch: | format maxresults=<integer>. For more information, see the format search command reference.
You can also control the subsearch with settings in limits.conf for the runtime and maximum number of results returned:
[subsearch]
maxout = <integer>
- Maximum number of results to return from a subsearch.
- This number cannot be greater than or equal to 10500.
- Defaults to 100.
maxtime = <integer>
- Maximum number of seconds to run a subsearch before finalizing.
- Defaults to 60.
ttl = <integer>
- Time to cache a given subsearch's results.
- Defaults to 300.
After running a search, you can click the Actions menu and select "Inspect Search". Scroll down to the remoteSearch component to see the actual query that resulted from your subsearch. Read more about the "Search Job Inspector" in the Search reference manual.
This documentation applies to the following versions of Splunk: 4.2, 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5, 4.3.