
Use a subsearch
This topic walks you through examples of correlating events with subsearches.
A subsearch is a search with a search pipeline as an argument. Subsearches are contained in square brackets and evaluated first. The result of the subsearch is then used as an argument to the primary, or outer, search. For more information, read "About subsearches" in the Search manual.
Example 1: Without a subsearch
Let's try to find the single most frequent shopper on the Buttercup Games online store and what this customer has purchased.
To do this, search for the customer who accessed the online shop the most.
1. Use the top command:
sourcetype=access_* status=200 action=purchase | top limit=1 clientip
Limit the top command to return only one result for the clientip field. To see more than one "top purchasing customer", change this limit value. For more information about usage and syntax, see the "top" command's page in the Search Reference manual.
This search returns one clientip value, which we'll use to identify our VIP customer.
2. Use the stats command to count this VIP customer's purchases:
sourcetype=access_* status=200 action=purchase clientip=87.194.216.51 | stats count, dc(productId) by clientip
This search uses the count() function to return the total number of purchases by the customer, and the dc() (distinct count) function to count how many different products the customer bought.
The drawback to this approach is that you have to run two searches each time you want to build this table, copying the clientip value from the first search into the second. The top purchaser is unlikely to be the same person across different time ranges.
Example 2: With a subsearch
1. Type or copy/paste the following into the search bar.
sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count, dc(productId), values(productId) by clientip
Here, the subsearch is the segment that is enclosed in square brackets, [ ]:
search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip
This search is the same as Example 1 Step 1, except for the last piped command, | table clientip. Because the top command returns count and percent fields as well, the table command is used to keep only the clientip value.
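To see what the subsearch hands to the outer search, you can run the subsearch on its own and append the format command, which is a minimal way to inspect the generated search expression. The IP address mentioned below is the one from Example 1 and is only illustrative; your data may produce a different value.

```spl
sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip | format
```

The result is a single field named search that holds an expression along the lines of ( ( clientip="87.194.216.51" ) ). The outer search then behaves as if that expression had been typed in place of the square brackets.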
These results should match the result from Example 1 if you run both searches over the same time range, with an additional column listing the product IDs. If you change the time range, you might see different results because the top purchasing customer will be different.
Note: The performance of this subsearch depends on how many distinct IP addresses match status=200 action=purchase. If there are thousands of distinct IP addresses, the top command has to keep track of all of them before the top 1 is returned, which impacts performance. By default, subsearches return a maximum of 10,000 results and have a maximum runtime of 60 seconds. In large production environments, the subsearch in this example might time out before it completes. The best option is to rewrite the query to limit the number of events the subsearch must process. Alternatively, you can increase the maximum results and maximum runtime parameters.
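One possible way to limit the events the subsearch must process is to give the subsearch its own, narrower time range. The sketch below assumes a 24-hour window (earliest=-24h), which is an illustrative value and not part of the original example. Note that this changes the meaning of the search: the subsearch now finds the top purchaser of the last 24 hours, while the outer search still counts that customer's purchases over the time range selected in the time range picker.

```spl
sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase earliest=-24h | top limit=1 clientip | table clientip] | stats count, dc(productId), values(productId) by clientip
```

Whether this trade-off is acceptable depends on the question you are asking of the data.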
2. Rename the columns to make the information more understandable.
sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count AS "Total Purchased", dc(productId) AS "Total Products", values(productId) AS "Products ID" by clientip | rename clientip AS "VIP Customer"
What happens when you run the search over different time periods? What if you wanted to find the top product sold and how many people bought it?
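As a starting point for the second question, the same pattern can be turned around: the subsearch finds the top productId, and the outer search counts the purchases and the distinct customers for that product. This is a sketch, not the only possible answer; the column names "Total Purchased", "Total Customers", and "Top Product" are labels chosen here for illustration.

```spl
sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 productId | table productId] | stats count AS "Total Purchased", dc(clientip) AS "Total Customers" by productId | rename productId AS "Top Product"
```

Here dc(clientip) counts each customer IP address once, regardless of how many times that customer bought the product.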
Next steps
In the next topic, you'll learn about adding new information to your events using field lookups.
This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14