Built-in optimization
The Splunk software includes built-in optimizations that analyze and process your searches for maximum efficiency.
One goal of these optimizations is to filter results as early as possible. Filtering reduces the amount of data to process for the search. Early filtering avoids unnecessary lookups and evaluation calculations for events that are not part of the final search results.
Another goal is to process as much as possible in parallel on the indexers. The built-in optimizations can reorder search processing, so that as many commands as possible are run in parallel on the indexers before sending the search results to the search head for final processing. You can see the reordered search using the Job Inspector. See Analyze search optimizations.
Predicate optimization
Predicates act as filters to remove unnecessary events as your search is processed. The earlier in the search pipeline that a predicate is applied, the more efficiently system resources are used. If a predicate is applied early, when events are fetched, only the events that match the predicate are fetched. If the same predicate is applied later in the search, resources are wasted fetching more events than are necessary.
There are several built-in optimizations that focus on predicates:
- Predicate merge
- Predicate pushdown
- Predicate splitting
Types of predicates
There are two types of predicates: simple predicates and complex predicates.
- A simple predicate is a single conditional of the form `field = value`.
- A complex predicate combines several conditionals with the Boolean operators AND, OR, and NOT. For example, `Field1 = Value1 OR Field2 = Value2 AND Field3 = Value3`.
A complex predicate can be merged or split apart to optimize a search.
In Splunk SPL, two commands perform predicate-based filtering: `where` and `search`.
An example of using the `where` command for filtering is:
index="_internal" | where bytes > 10
An example of using the `search` command for filtering is:
index="_internal" | search bytes > 10
Search-based predicates are a subset of where-based predicates: any search-based predicate can be rewritten as a where-based predicate, but not every where-based predicate can be rewritten as a search-based predicate.
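For example, the `where` command can compare one field to another, which the `search` command cannot. This is an illustrative sketch; the `avg_bytes` field is hypothetical:

```spl
index="_internal" | where bytes > avg_bytes
```

This comparison cannot be rewritten with `search`, because in a `search` comparison the right-hand side is treated as a literal value rather than a field name.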
Predicate merge
The predicate merge optimization merges two predicates into one. For example, consider the following search:
index=main | search a > 10 AND fieldA = "New"
This example contains two search commands: the implied `search` command before the first pipe, which begins every search, and the explicit `search` command after the first pipe. With the predicate merge optimization, the predicates in the second `search` command are merged with the predicates in the first `search` command. For example:
index=main AND a > 10 AND fieldA = "New"
In some cases, predicates used with the `where` command can also be merged. For example, consider the following search:
index=main | where fieldA = "New"
With the predicate merge optimization, the predicates specified with the `where` command are merged with the predicates specified with the `search` command.
index=main AND fieldA = "New"
Issues with field extractions and predicate merge
An inline field extraction requires special handling if the regular expression pattern extracts a subtoken. The field must be set to `indexed=false` in the fields.conf file. See Example inline field extraction configurations in the Knowledge Manager Manual.
Consider the following sample event:
Mon Apr 17 16:08:16 2017 host=10.10.1.1 Login name=John SUCCESS FRANCE
You create an extracted field called `country` that uses the following regular expression:
SUCCESS\s+?\S{3}
- `SUCCESS` matches the characters SUCCESS literally and is case sensitive.
- `\s` matches any whitespace character (space, tab, new line).
- `+?` is a quantifier that matches between one and unlimited times, the fewest needed to match the pattern.
- `\S` matches any non-whitespace character.
- `{3}` is a quantifier that matches exactly 3 non-whitespace characters.
For the sample event, this regular expression extracts the first 3 characters of the word FRANCE, or FRA. The extraction FRA is a subtoken of the indexed term FRANCE.
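The extraction can be sketched with the `rex` command. This is a hypothetical test search that uses `makeresults` to fabricate the sample event and adds a named capture group (`?<country>`), which the actual inline extraction would also need:

```spl
| makeresults
| eval _raw="Mon Apr 17 16:08:16 2017 host=10.10.1.1 Login name=John SUCCESS FRANCE"
| rex "SUCCESS\s+?(?<country>\S{3})"
| table country
```

The `country` field is assigned the value FRA, the subtoken of the indexed term FRANCE.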
When you search using the extracted field, for example:
index=main | search country=FRA
The predicate merge optimizer optimizes the search to:
index=main country=FRA
However, the search returns no results because FRA is not part of the index. FRANCE is the indexed term.
To overcome this issue, you must add the following setting to the fields.conf file:
[country]
indexed=false
Alternatively, you can disable the built-in optimizations. See Optimization settings.
Predicate pushdown
The predicate pushdown optimization analyzes a search and reorders the search processing so that predicates are processed as early as possible.
Example of a simple predicate pushdown
You perform the following search:
sourcetype=access* (status=401 OR status=403) | lookup usertogroup user OUTPUT group | where src_category="email_server"
The `sourcetype=access* (status=401 OR status=403)` portion of the search retrieves 50,000 events. The lookup is performed on all 50,000 events. Then the `where` command is applied and filters out events that do not match `src_category="email_server"`. The result is that 45,000 events are discarded and 5,000 events are returned in the search results.
If the search criteria used with the `where` command are applied before the lookup, events are filtered out of the results sooner. Only 5,000 events are retrieved from disk before the lookup is performed.
With predicate pushdown, the search is reordered for processing. The `where` command is eliminated by moving the search criteria `src_category="email_server"` before the first pipe.
sourcetype=access* (status=401 OR status=403) src_category="email_server" | lookup usertogroup user OUTPUT group
Example of a complex predicate pushdown
Consider the following search fragment:
... | eval x=if(isnull(x) OR x=="", "missing", x) | where x = "hello"
In this situation, the `eval` command assigns a default value when a field is null or empty. This is a common pattern in Common Information Model (CIM) data models.
The built-in optimization reorganizes the search criteria before processing the search. The `where` command is moved before the `eval` command.
... | where x = "hello" | eval x=if(isnull(x) OR x=="", "missing", x)
Predicate splitting
Predicate splitting is the action of taking a predicate and dividing, or splitting, the predicate into smaller predicates. The predicate splitting optimizer can then move the smaller predicates, when possible, to an earlier place in the search.
Consider the following search:
index="_internal" component = "SearchProcess" | eval a = (x + y) | where a > 200 AND x < 10
Because field `a` is generated as part of the `eval` command, it cannot be processed earlier in the search. However, field `x` exists in the events and can be processed earlier. The predicate splitting optimization separates the components of the `where` portion of the search. The search is reordered to process eligible components earlier.
index="_internal" component = "SearchProcess" | where x < 10 | eval a = (x + y) | where a > 200
The `x < 10` portion of the `where` command is moved to before the `eval` command. This move reduces the number of results that the `eval` command must process.
Projection elimination
This optimization analyzes your search and determines if any of the generated fields specified in the search are not used to produce the search results. If generated fields are identified that can be eliminated, an optimized version of the search is run. Your search syntax remains unchanged.
For example, consider the following search:
index=_internal | eval c = x * y / z | stats count BY a, b
The `c` field that is generated by `eval c = x * y / z` is not used in the `stats` calculation. The `c` field can be eliminated from the search.
Your search syntax remains:
index=_internal | eval c = x * y / z | stats count BY a, b
But the optimized search that is run is:
index=_internal | stats count BY a, b
Here is another example:
| tstats count FROM datamodel=internal_audit_logs
For buckets whose data models are not built, this expands to the following fallback search:
| search ((index=* OR index=_*) index=_audit) | eval nodename = "Audit" | eval is_searches=if(searchmatch("(action=search NOT dmauditsearch)"),1,0), is_not_searches=1-is_searches, is_modify=if(searchmatch("(action=edit_user OR action=edit_roles OR action=update)"),1,0), is_not_modify=1-is_modify | eval nodename = if(nodename == "Audit" AND searchmatch("(action=search NOT dmauditsearch)"), mvappend(nodename, "Audit.searches"), nodename) | eval is_realtime=case(is_realtime == 0, "false", is_realtime == 1, "true", is_realtime == "N/A", "false"), search_id=replace(search_id,"'",""), search=replace(search,"'",""), search_type=case((id LIKE "DM_%" OR savedsearch_name LIKE "_ACCELERATE_DM%"), "dm_acceleration", search_id LIKE "scheduler%", "scheduled", search_id LIKE "rt%", "realtime", search_id LIKE "subsearch%", "subsearch", (search_id LIKE "SummaryDirector%" OR search_id LIKE "summarize_SummaryDirector%"), "summary_director", 1=1, "adhoc") | rename is_realtime AS Audit.searches.is_realtime search_id AS Audit.searches.search_id search AS Audit.searches.search search_type AS Audit.searches.search_type | eval nodename = if(nodename == "Audit" AND searchmatch("(action=edit_user OR action=edit_roles OR action=update)"), mvappend(nodename, "Audit.modify"), nodename) | rename action AS Audit.action info AS Audit.info object AS Audit.object operation AS Audit.operation path AS Audit.path user AS Audit.user exec_time AS Audit.exec_time result_count AS Audit.result_count savedsearch_name AS Audit.savedsearch_name scan_count AS Audit.scan_count total_run_time AS Audit.total_run_time is_searches AS Audit.is_searches is_not_searches AS Audit.is_not_searches is_modify AS Audit.is_modify is_not_modify AS Audit.is_not_modify | addinfo type=count label=prereport_events | stats count
This search can be optimized to this syntax:
| search ((index=* OR index=_*) index=_audit) | stats count
Eliminating unnecessary generated fields, or projections, leads to better search performance.
Stats/chart/timechart to tstats optimization
This optimization converts qualifying `stats`, `chart`, or `timechart` searches into `tstats` searches. When the search processor applies this optimization to qualifying `stats`, `chart`, or `timechart` searches that aggregate large amounts of data, such searches can complete far faster than they otherwise would. This increase in efficiency happens because `tstats` performs statistical searches on indexed fields in tsidx index summary files rather than the raw data.
For example, consider this search:
index=_internal sourcetype=splunkd | stats count
The `stats`/`chart`/`timechart` to `tstats` optimizer optimizes that search to something like this:
| tstats prestats=true summariesonly=false allow_old_summaries=false include_reduced_buckets=true count WHERE (index=_internal sourcetype=splunkd) | stats count
How searches qualify for stats/chart/timechart to tstats optimization
The search optimizer applies `stats`/`chart`/`timechart` to `tstats` optimization only to `stats`, `chart`, and `timechart` searches that meet the following conditions.
| Qualification for stats/chart/timechart to tstats search optimization | More information |
|---|---|
| The search must have an explicit or implicit search command followed by a `stats`, `chart`, or `timechart` command. | For example, the search `index=_internal sourcetype=splunkd \| stats count` fits this pattern. |
| The search portion of the search string and the `stats`, `chart`, or `timechart` portion of the search string reference only indexed fields. | The `tstats` command can perform searches only on indexed fields. For more information about indexed fields and search-time fields, see When Splunk software extracts fields in the Knowledge Manager Manual. |
| The search does not have search-time field extractions that overwrite the indexed fields in the search and the `stats`, `chart`, or `timechart` portions of the search. | For example, a search with a search-time field extraction for a calculated field that overwrites the `host` field would not be optimized. At run time, the optimizer reviews all possible configurations to which the user has access. If any of these configurations might lead to overwriting of indexed fields with search-time fields, the optimizer disqualifies the search for `tstats` optimization. See When Splunk software extracts fields in the Knowledge Manager Manual. |
| The search uses only statistical or charting functions that are supported by the `tstats` command. | For example, the `earliest`, `latest`, `earliest_time`, and `latest_time` functions are not supported by `tstats`. If the search uses those functions in conjunction with the `stats`, `chart`, or `timechart` commands, it is disqualified for `tstats` optimization. See the tstats topic. |
| The search filters out source types that invoke `multikv` to extract values from table-formatted events. | These are source types that are configured with `KV_MODE=multi` in the Advanced tab of the Edit Source Types dialog box. If a search does not explicitly filter out source types, the optimizer checks all source types available to the search. If the optimizer finds any source types that are configured with `KV_MODE=multi`, it does not apply `stats`/`chart`/`timechart` to `tstats` optimization to the search. See Manage source types in the Getting Data In manual. |
| The search does not invoke verbose mode. | See Search modes. |
| The search is not a real-time search. | See About real-time searches and reports. |
| The search does not require subsecond timestamp resolution. | Searches that require result timestamps to have millisecond or smaller time resolution are disqualified. See About searching with time. |
| The search does not use event sampling. | See Event sampling. |
| The search does not include subsearches. | See About subsearches. |
| The search is not running in an environment that has been configured to rename source types at search time. | See Rename source types in the Getting Data In manual. |
| The search is not running in an environment where `status_buckets > 0`. | A positive value for the `status_buckets` setting causes timelines to be generated by `stats` searches, which disqualifies them for `tstats` optimization. The `status_buckets` setting can be set through a POST operation to the search/jobs REST API endpoint, or through a direct change to the `status_buckets` setting in limits.conf. See the Search endpoint descriptions in the REST API Reference Manual. |
| The search does not run over virtual indexes. | Does not apply to Splunk Cloud Platform. Virtual indexes are used only by Splunk Analytics for Hadoop. See Search a virtual index in the Splunk Analytics for Hadoop manual. |
Disabling the stats/chart/timechart to tstats optimizer on a per-search basis
The `stats`/`chart`/`timechart` to `tstats` optimizer can cause certain kinds of searches to run slower.
For example, this optimizer does not cooperate with searches that populate summary indexes, because it does not respect the fields created by summary indexing commands. This optimizer may cause searches that perform `stats`, `chart`, or `timechart` operations on summary indexes to perform extra work on a fallback search, thus slowing their performance.
You can disable this optimization for individual searches.
If you want to run a `stats` search without the optimization, append `| noop search_optimization.replace_stats_cmds_with_tstats=f` to the search string.
If you want to run a `chart` or `timechart` search without the optimization, append `| noop search_optimization.replace_chart_cmds_with_tstats=f` to the search string.
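For example, to run the `stats` search shown earlier in this section without this optimization, append the `noop` setting to the search string:

```spl
index=_internal sourcetype=splunkd | stats count | noop search_optimization.replace_stats_cmds_with_tstats=f
```

The other built-in optimizations still apply to the search; only the `stats`/`chart`/`timechart` to `tstats` conversion is skipped.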
Event types and tags optimization
Event typing and tagging can take a significant amount of search processing time. This is especially true with Splunk Enterprise Security, where many event types and tags are defined. The more event types and tags that are defined in the system, the greater the annotation costs.
The built-in event types and tags optimization applies to transforming searches and data model acceleration searches.
Transforming searches
When a transforming search is run without the built-in optimization, all of the event types and tags are uploaded from the configuration system to the search processor. This upload happens whether the search requires the event types and tags or not. The search processing attempts to apply the event types and tags to each event.
For example, consider the following search:
index=_internal | search tag=foo | stats count by host
This search needs only the tag `foo`. But without the optimization, all of the event types and tags are uploaded.
The built-in optimization solves this problem by analyzing the search and uploading only the event types and tags that are required. If no event types or tags are needed, none are uploaded, which improves search performance.
For transforming searches, this optimization is controlled by several settings in the limits.conf file, under the `[search_optimization::required_field_values]` stanza. These settings are turned on by default. There is no need to change these settings unless you need to troubleshoot an issue.
Data model acceleration searches
The event types and tags optimization also addresses a problem that is specific to data model acceleration. With a configuration setting in the datamodels.conf file, you can specify a list of tags that you want to use with data model acceleration searches. You specify the list under the `[dm.name]` stanza with the `tags_whitelist` setting.
For detailed information about the `tags_whitelist` setting, including usage examples, see Set a tag whitelist for better data model performance in the Knowledge Manager Manual.
Analyze search optimizations
You can analyze the impact of the built-in optimizations by using the Job Inspector.
Determine search processing time
You can use the Job Inspector to determine whether the built-in search optimizations are helping your search to complete faster.
- Run your search.
- From the search action buttons, select Job > Inspect job.
- In the Job Inspector window, review the message at the top of the window. The message is similar to "The search has completed and returned X results by scanning X events in X seconds." Make note of the number of seconds to complete the search.
- Close the Job inspector window.
- In the Search bar, add `|noop search_optimization=false` to the end of your search. This turns off the built-in optimizations for this search.
- Run the search.
- Inspect the job and compare the message at the top of the Job Inspector window with the previous message. This message specifies how many seconds the search processing took without the built-in optimizations.
View the optimized search
You can compare your original search with the optimized search that is created when the built-in optimizations are turned on.
- Run your search.
- From the search action buttons, select Job > Inspect job.
- In the Job Inspector window, expand the Search job properties section and scroll to the normalizedSearch property. This property shows the internal search that is created based on your original search.
- Scroll to the optimizedSearch property. This property shows the search that is created, based on the normalizedSearch, when the built-in optimizations are applied.
Optimization settings
By default, the built-in optimizations are turned on.
Turn off optimization for a specific search
In some very limited situations, the optimization that is built into the search processor might not optimize a search correctly. In these situations, you can troubleshoot the problem by turning off the search optimization for that specific search.
Turning off the search optimization enables you to determine if the cause of unexpected or limited search results is the search optimization.
You turn off the built-in optimizations for a specific search by using the `noop` command. You add `noop search_optimization=false` at the end of your search. For example:
| datamodel Authentication Successful_Authentication search | where Authentication.user = "fred" | noop search_optimization=false
Turn off all optimizations
You can turn off all of the built-in optimizations for all users.
- Splunk Cloud Platform
- To turn off the built-in optimizations for all users, request help from Splunk Support. If you have a support contract, file a new case using the Splunk Support Portal at Support and Services. Otherwise, contact Splunk Customer Support.
- Splunk Enterprise
- To turn off the built-in optimizations for all users, follow these steps.
- Prerequisites
- Only users with file system access, such as system administrators, can turn off the built-in optimizations using a configuration file.
- Review the steps in How to edit a configuration file in the Splunk Enterprise Admin Manual.
- You can have configuration files with the same name in your default, local, and app directories. Read Where you can place (or find) your modified configuration files in the Splunk Enterprise Admin Manual.
Never change or copy the configuration files in the default directory. The files in the default directory must remain intact and in their original location.
- Steps
- Open or create a local limits.conf file for the Search app at $SPLUNK_HOME/etc/apps/search/local.
- Under the `[search_optimization]` stanza, set `enabled=false`.
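After this change, the local limits.conf file contains a stanza like the following:

```ini
[search_optimization]
enabled = false
```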
See also
In the Search Reference:
- The `noop` command
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408, 9.0.2205, 9.0.2208, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)