Splunk® Enterprise

Search Manual

Event sampling

By default, a Splunk search retrieves all events. However, in some situations you might want to retrieve a sample set of events instead of the entire event set. There are several reasons to use event sampling:

  • To perform a quick search to ensure the correct events are being returned
  • To determine the characteristics of a large data set without processing every event
  • To test that the data selection, formatting, calculations, and other components of the search are working correctly

For most searches, event sampling can greatly increase search performance without decreasing functionality.

The event sampling ratio

The sampling ratio determines the probability that any given event is included in the sample result set. That probability is 1/sample_ratio_value.

For example, if the sample ratio value is 100, each event has a 1 in 100 chance of being included in the result set. The selection of each event is independent of the selection of all other events. As a result, the sample might include many of the first 100 events, or none of them.
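The selection rule above can be modeled as an independent coin flip per event. The following Python sketch assumes exactly that: each event is kept with probability 1/sample_ratio_value, and a ratio of 1 keeps everything. The function name `sample_events` is hypothetical, and this is an illustration of the probability model, not Splunk's internal implementation.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_events(events: Iterable[T], sample_ratio_value: int,
                  rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical model of event sampling: keep each event independently
    with probability 1/sample_ratio_value."""
    if not isinstance(sample_ratio_value, int) or sample_ratio_value < 1:
        raise ValueError("sample_ratio_value must be a positive integer")
    rng = rng or random.Random()
    if sample_ratio_value == 1:
        # A ratio of 1 means no sampling: every event is kept.
        return list(events)
    p = 1 / sample_ratio_value
    # Each event gets its own independent draw, so the decision for one
    # event does not depend on the decisions made for any other event.
    return [event for event in events if rng.random() < p]


if __name__ == "__main__":
    sample = sample_events(range(1_000_000), 100, random.Random(42))
    # Close to 10,000, but the exact number varies from run to run.
    print(len(sample))
```

Because every draw is independent, two runs with different seeds return different subsets and slightly different counts, which is the behavior described in the next paragraphs.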

If a search matches 1,000,000 events when sampling is not used, using a sample ratio value of 100 would result in returning approximately 10,000 events.

If you rerun a sampling search many times, the exact number of returned results is modeled by a binomial distribution with n=1000000 and p=0.01. This distribution closely approximates a normal distribution with a mean of 10000 and a standard deviation (stdev) of 99.5.
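The mean and standard deviation quoted above follow from the standard binomial formulas, mean = n*p and stdev = sqrt(n*p*(1 - p)). The following Python sketch reproduces that arithmetic. The helper name `sampled_count_stats` is hypothetical, and the calculation assumes, as described above, that each event is selected independently.

```python
import math
from typing import Tuple


def sampled_count_stats(matching_events: int,
                        sample_ratio_value: int) -> Tuple[float, float]:
    """Return (mean, stdev) of the number of sampled events, assuming each
    event is kept independently with probability 1/sample_ratio_value."""
    p = 1 / sample_ratio_value
    mean = matching_events * p                         # n * p
    stdev = math.sqrt(matching_events * p * (1 - p))   # sqrt(n * p * (1 - p))
    return mean, stdev


mean, stdev = sampled_count_stats(1_000_000, 100)
print(f"mean={mean:.0f} stdev={stdev:.1f}")  # mean=10000 stdev=99.5
```

With a stdev of about 99.5, most reruns of this search return between roughly 9,800 and 10,200 events.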

In Splunk Web, the sampling ratio that you specify must be a positive integer that is greater than 1. To disable sampling in Splunk Web, set the ratio to 1.

Set the default sampling ratio

In Splunk Enterprise, set the default sampling ratio by editing the ui-prefs.conf file. The sampling ratio must be a positive integer.

In Splunk Cloud, to change the default sampling ratio, file a Support ticket.

How event sampling works

By default, event sampling is not active. When you run a search, every event that matches your criteria is returned. When you specify a ratio, sampling remains in effect for the active search window. Sampling also remains in effect when you save a search as a report or dashboard panel.

When you specify a ratio value, your value overrides the default value configured for your Splunk deployment and remains in effect until you change it.

If you open a new search window, event sampling is no longer active. However, the last custom ratio that you used appears in the Sampling drop-down.

Commands and functions to avoid with event sampling

Typically, searches that use the transaction, stats, or streamstats commands are not good candidates for sampling.

When you calculate statistics using a sample set of events, the statistical values are not accurate for the full set of events. To determine the true statistical value, you must scale the value returned with event sampling. Even then, scaling gives you only an approximation of the true value.

For example, suppose you create a report using this search with event sampling enabled:

... | stats sum(x)

Because you used event sampling, the returned value is not the complete sum of all of the events. It is only the sum of the sample set of events. If the sampling ratio is 100, the true sum is approximately 100 times the value returned by the search.
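Scaling a sampled count or sum back up is a multiplication by the sample ratio value. The following Python sketch simulates this with synthetic values (hypothetical data, not Splunk output) to show that the scaled sum is close to, but generally not equal to, the true sum. The helper name `scaled_estimate` is hypothetical.

```python
import random


def scaled_estimate(sampled_total: float, sample_ratio_value: int) -> float:
    """Approximate a full-data count, sum, or sumsq from a sampled one by
    multiplying by the sample ratio value."""
    return sampled_total * sample_ratio_value


rng = random.Random(7)
ratio = 100
# Synthetic field values standing in for the x field in "... | stats sum(x)".
values = [rng.uniform(0, 10) for _ in range(200_000)]
true_sum = sum(values)

# Keep each value independently with probability 1/ratio, as event sampling does.
sampled = [x for x in values if rng.random() < 1 / ratio]
estimate = scaled_estimate(sum(sampled), ratio)

print(f"true sum:        {true_sum:,.0f}")
print(f"scaled estimate: {estimate:,.0f}")
print(f"relative error:  {abs(estimate - true_sum) / true_sum:.2%}")
```

The relative error shrinks as the number of sampled events grows, so scaled values from small samples deserve more caution.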

The statistical functions that require this kind of scaling are count, sum, and sumsq.

Other statistics that are difficult to interpret when event sampling is used include:

  • distinct_count
  • earliest
  • latest
  • max
  • min
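Unlike count and sum, these statistics have no simple scaling factor. The following Python sketch uses synthetic integer values (an illustrative assumption, not Splunk data) to show that a sampled max can only be less than or equal to the true max, a sampled min can only be greater than or equal to the true min, and that multiplying a sampled distinct count by the ratio can badly overestimate the true distinct count.

```python
import random

rng = random.Random(3)
ratio = 100
# Synthetic data: 100,000 events whose field value is one of 1,000 codes.
values = [rng.randrange(1000) for _ in range(100_000)]
sampled = [v for v in values if rng.random() < 1 / ratio]

true_max, sampled_max = max(values), max(sampled)
true_min, sampled_min = min(values), min(sampled)
true_dc, sampled_dc = len(set(values)), len(set(sampled))

# The sample can miss the most extreme events, so the sampled max can only be
# <= the true max and the sampled min can only be >= the true min. The same
# reasoning applies to latest and earliest on event timestamps.
print(f"max: true={true_max} sampled={sampled_max}")
print(f"min: true={true_min} sampled={sampled_min}")
# Scaling distinct_count by the ratio overshoots, because a repeated value is
# counted once no matter how many events carry it.
print(f"distinct_count: true={true_dc} sampled={sampled_dc} "
      f"scaled={sampled_dc * ratio}")
```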

Specify a sampling ratio

You activate event sampling for a search by specifying a sampling ratio.

1. In Splunk Web, below the Search bar, click No Event Sampling.

2. You can use one of the default ratios or specify a custom ratio.

a. To use one of the default ratios, click the ratio in the Sampling drop-down.
b. To specify a custom ratio, click Custom and type the ratio value. Then click Apply. The ratio value must be a positive integer greater than 1.

Event sampling indicators

Several indicators in the Search & Reporting App window show that event sampling is active. After you run a search, the Sampling drop-down appears in the event count line, and its label specifies the ratio that is applied to the search. Additionally, if a sampling ratio is being used, the Jobs drop-down specifies the ratio that is applied to the search.

Event sampling with reports and dashboard panels

You can save a search that uses event sampling as a report or dashboard panel. Use the Save As drop-down to save the search.

When the search is saved as a report, the sampling ratio is used when the report is run.

When the search is saved as a dashboard panel, the panel is powered by an inline search. When the dashboard is refreshed, the sampling ratio that was saved with the inline search is used.

If you open a report and add the report to a dashboard panel, you can specify how the panel is powered: either by the inline search that the report is based on, or by the report itself.

Panels powered by reports
When you view the source for the panel in Simple XML, there is no indication of whether the report uses event sampling.
Panels powered by inline searches
When you view the source for the panel in Simple XML, if the underlying search uses event sampling there is a <sampleRatio> entry. For example:
<event>
  <title>sample events</title>
  <search>
     <query>buttercupgames</query>
     <earliest>@d</earliest>
     <latest>now</latest>
     <sampleRatio>500</sampleRatio>
  </search>
</event>
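Because the <sampleRatio> element appears in the source of panels powered by inline searches, you can check a panel's Simple XML programmatically. The following Python sketch applies the standard library xml.etree.ElementTree module to the example above. The helper name `panel_sample_ratio` is hypothetical, and a result of None only means that no inline sampling ratio is present; for panels powered by reports, the XML gives no indication either way.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PANEL_XML = """
<event>
  <title>sample events</title>
  <search>
     <query>buttercupgames</query>
     <earliest>@d</earliest>
     <latest>now</latest>
     <sampleRatio>500</sampleRatio>
  </search>
</event>
"""


def panel_sample_ratio(panel_xml: str) -> Optional[int]:
    """Return the sampleRatio of a panel's inline search, or None if absent."""
    root = ET.fromstring(panel_xml.strip())
    element = root.find("./search/sampleRatio")
    if element is None or element.text is None:
        return None
    return int(element.text.strip())


print(panel_sample_ratio(PANEL_XML))  # 500
```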
Accelerated reports
You cannot accelerate reports that are based on event sampling searches. See "Accelerate reports" in the Reporting Manual.

This documentation applies to the following versions of Splunk® Enterprise: 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.5.0, 6.5.1, 6.5.1612 (Splunk Cloud only), 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.6.0, 6.6.1, 6.6.2, 6.6.3


Comments

Steve Schohn
Thank you for the question. There isn't a way currently to specify the sampling ratio using the CLI or REST/API. I have passed on this suggestion to the SPL development team.

Lstewart splunk, Splunker
April 28, 2017

Is it possible to include the sample ratio within the SPL? Would be helpful for querying from command line/REST API.

-Steve Schohn

Sschohn splunk, Splunker
April 28, 2017

The above remarks do not focus on an additional, more common reason to do event sampling. Which is to shorten the time a search runs to test if it is data selection, formatting, calculations and all the rest of Splunk Search Language is working correctly before you tell it to begin across a much larger sample of data. In which case, the warnings about commands to avoid are still valid while you may decide to run in sampling mode instead of more clumsy HEAD statements.

Claw, Splunker
April 25, 2016
