Splunk® Enterprise

Search Manual

Acrobat logo Download manual as PDF

Splunk Enterprise version 6.x is no longer supported as of October 23, 2019. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.
This documentation does not apply to the most recent version of Splunk. Click here for the latest version.
Acrobat logo Download topic as PDF

Write better searches

This topic discusses some causes of slow searches and suggests simple rules of thumb to help you write searches that will run more efficiently. Many factors can affect the speed of your searches: the volume of data that you are searching, how your searches are constructed, can your deployment handle the number of users running searches at the same time, and so on. The key to optimizing your search speed is to make sure that Splunk software does not do more work than necessary.

Know your type of search

The recommendations for optimizing searches vary depending on the type of search that you run and the characteristics of the data you are searching. In general, think of what you are trying to accomplish: retrieve events or generate reports. If the events you want to retrieve occur frequently in the dataset, the search is called a dense search. If the events you want to retrieve are rare in the dataset, the search is called a sparse search.

Read more about "About search types".

Raw event searches

Raw event searches return events from a Splunk index without any additional processing to the events that are retrieved. The best rule of thumb to follow when retrieving events from the index is to be specific about the events that you want to retrieve. You can do this with keywords and field/value pairs that are unique to the events. One thing to keep in mind is that sparse searches against large volumes of data will take longer than dense searches against the same data set.

Report-generating searches

Report-generating searches perform additional processing on events after they've been retrieved from an index. This processing can include filtering, transforming, and other operations using one or more statistical functions against the set of results. Because this processing occurs in memory, the more restrictive and specific you are when specifying the events to retrieve from disk, the faster the search will be.

Command types and parallel processing

Some commands process events in a stream. One event in and one, or no, event out. These are referred to as streaming commands. Examples of streaming commands are: where, eval, lookup, and search.

Other commands require all of the events from all of the indexers before the command can finish. These are referred to as non-streaming commands. Examples of non-streaming commands are: stats, sort, dedup, top, and append.

Non-streaming commands can run only when all of the data is available. To process non-streaming commands, all of the search results from the indexers are sent to the search head. When this happens, all further processing must be performed by the search head, rather than in parallel on the indexers.

Parallel processing example

Non-streaming commands that are early in your search reduce parallel processing.

For example, the following image shows a search that a user has run. The search starts with the search command, which is implied as the first command in the Search bar. The search continues with the lookup, where, and eval commands. The search then contains a sort, based on the Name field, followed by another where command.

The search is sent to the search head and distributed to the indexers to process as much of the search as possible on the indexers.

This image shows

For the events that are on each indexer, the indexer process the search until the indexer encounters a non-streaming command. In this example, the indexers process the search through the eval command. To perform the sort, all of the results must be send to the search head for processing.

However, the results on each indexer can be sorted by the indexer. This is referred to as a presort. In this example the sort is on the Name column. The following image shows that the first indexer returns the names Alex and Maria. The second indexer returns the name Wei. The third indexer returns the names Claudia, David, and Eduardo.

This image shows

To return the full list of results sorted by name, all of the events that match the search criteria must first be sent to the search head. When all of the results are on the search head, the rest of the search must be processed on the search head. In this example the sort and any remaining commands are processed on the search head.

The following image shows that each indexer has presorted the results, based on the Name column. The results are sent to the search head, and are initially appended to each other. The search head then sorts the entire list into the correct order. The search head processes the remaining commands in the search to produce the final results. In this example, that includes the second where command. The final results are returned to the user.

This image shows

When part or all of a search is run on the indexers, the search processes in parallel and search performance is much quicker.

To optimize your searches, place non-streaming commands as late as possible in your search string.

Tips for tuning your searches

In most cases, your search is slow because of the complexity of your query to retrieve events from index. For example, if your search contains extremely large OR lists, complex subsearches (which break down into OR lists), and types of phrase searches, it will take longer to process. This section discusses some tips for tuning your searches so that they are more efficient.

Performing statistics with a BY clause on a set of fields that are very uncommon or unique, high cardinality, requires a lot of memory. One possible remedy is to decrease the value for the chunk_size setting used with the tstats command. Additionally, reducing the number of distinct values that the BY clause must process can also be beneficial.

Restrict searches to the specific index

If you rarely search across more than one type of data at a time, partition your different types of data into separate indexes. Then restrict your searches to the specific index. For example, store Web access data in one index and firewall data in another. This is recommended for sparse data, which might otherwise be buried in a large volume of unrelated data. Read more about ways to set up multiple indexes and how to search different indexes.

Use fields effectively

Searches with fields are faster when they use fields that have already been extracted (indexed fields) instead of fields extracted at search time. For more information about indexed fields and default fields, see About fields in the Knowledge Manager Manual.

Use indexed and default fields

Use indexed and default fields whenever you can to help search or filter your data efficiently. At index time, Splunk software extracts a set of default fields that are common to each event; these fields include host, source, and sourcetype. Use these fields to filter your data as early as possible in the search so that processing is done on a minimum amount of data.

For example, if you're building a report on web access errors, search for those specific errors before the reporting command:

sourcetype=access_* (status=4* OR status=5*) | stats count by status

Specify indexed fields with <field>::<value>

You can also run efficient searches for fields that have been indexed from structured data such as CSV files and JSON data sources. When you do this, replace the equal sign with double colons, like this: <field>::<value>.

This syntax works best in searches for fields that have been indexed from structured data, though it can be used to search for default and custom indexed fields as well. You cannot use it to search on Search-time fields.

Disable field discovery to improve search performance

If you don't need additional fields in your search, set Search Mode to a setting that disables field discovery to improve search performance in the timeline view or use the fields command to specify only the fields that you want to see in your results.

6.2 searchmode options.png

The tradeoff to disabling field discovery is that doing so prevents automatic field extraction, except for fields that are required to fulfill your search (such as fields that you are specifically searching on) and default fields such as _time, host, source, and sourcetype. The search runs faster because Splunk software is no longer trying to extract every field possible from your events.

Search mode is set to Smart by default. Set it to Verbose if you are running searches with reporting commands, don't know what fields exist in your data, and think you might need them to help you narrow down your search in some way.

See "Set search mode to adjust your search experience," in this manual.

Also see the topic fields command in the Search Reference.

Summarize your data

It can take a lot of time to search through very large data sets. If you regularly generate reports on large volumes of data, use summary indexing to pre-calculate the values that you use most often in your reports. Schedule saved searches to collect metrics on a regular basis, and report on the summarized data instead of on raw data.

Read more about how to use summary indexing for increased reporting efficiency.

Use the Search Job Inspector

The Search Job Inspector is a tool you can use both to troubleshoot the performance of a search and to determine which phase of the search takes the greatest amounts of time. It dissects the behavior of your searches to help you understand the execution costs of knowledge objects such as event types, tags, lookups, search commands, and other components within the search.

See View search job properties in this manual.

See also

Last modified on 15 May, 2017
Quick tips for optimization
About retrieving events

This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11

Was this documentation topic helpful?

You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters