Splunk® Enterprise

Search Manual


About optimization

When not optimized, a search often runs longer, retrieves more data from the indexes than it needs, and uses memory and network resources inefficiently. Multiply these costs by hundreds or thousands of searches and the result is a sluggish system.

Search optimization is a set of techniques for making your searches run as efficiently as possible. The optimization principles, and the techniques that implement them, are described below.

Search optimization principles:
  • Retrieve only the required data
  • Move as little data as possible
  • Parallelize as much work as possible

How to implement the principles:
  • Set appropriate time windows
  • Filter as much as possible in the initial search (see the example after this list)
  • Perform joins and lookups on only the required data
  • Perform evaluations on the minimum number of events possible
  • Move commands that bring data to the search head as late as possible in your search criteria
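
For example, filtering in the initial search and setting a tight time window both reduce the number of events that are retrieved from disk. The source type, field, and time window below are hypothetical; this is a sketch of the principle, not a prescribed search. This version filters after retrieval, so every event in the time range is pulled from disk:

sourcetype=access_* earliest=-4h | search status=404

Moving the filter into the initial search retrieves only the matching events:

sourcetype=access_* status=404 earliest=-4h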

Indexes and searches

When you run a search, the software uses the information in the index files to identify which events to retrieve from disk. The smaller the number of events to retrieve from disk, the faster the search runs.

How you construct your search has a significant impact on the number of events retrieved from disk.

When data is indexed, the data is processed into events and organized by time. The processed data consists of several types of files:

  • The raw data in compressed form (rawdata)
  • The indexes that point to the raw data (index files, also referred to as tsidx files)
  • Some metadata files


These files are written to disk and reside in sets of directories, organized by age, called buckets.
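
If you want to see how the buckets for an index are organized, the dbinspect command returns bucket-level information, such as each bucket's path, state, and event count. This is a minimal sketch; the index name main is an assumption:

| dbinspect index=main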

Use indexes effectively

One way to limit the data that is pulled from disk is to partition your data into separate indexes. If you rarely search across more than one type of data at a time, store each type of data in its own index, and then restrict your searches to the specific index. For example, store web access data in one index and firewall data in another. This approach is especially useful for sparse data, which might otherwise be buried in a large volume of unrelated events.
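
For example, if web access data resides in its own index, specifying that index keeps the search from ever touching the firewall data. The index and source type names here are hypothetical:

index=web sourcetype=access_combined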

A tale of two searches

The following set of images illustrates how optimizing just one of your searches can save system resources.

A common search

A common search pattern contains a lookup and an evaluation, followed by another search command. For example:

sourcetype=my_source | lookup my_lookup_file L OUTPUTNEW L | eval E=L/T | search A=25 L>100 E>50


The following diagram shows a simplified, visual representation of this search: a flow chart with four nodes, "search sourcetype", "lookup L", "eval E", and "search A=25 L>100 E>50".

In the following image the search accesses the index and, based on the source type, extracts 1 million events.

This image shows the first part of the search, with the criteria "search sourcetype". A sample set of events is displayed with columns A, B, C, and D. A Total Cost tracker shows that 1 million events were extracted from the index.

In the next part of the search, the lookup and eval commands run against all 1 million events. Both commands add columns to the events, as shown in the following image.

This image shows the lookup and eval parts of the search. The lookup command adds column L to the results, and the eval command adds column E. The Total Cost tracker shows that both the lookup and the eval run against all 1 million extracted events.

Finally, a second search command is run against the A, L, and E columns.

  • For the A column, the search is looking for values that are equal to 25.
  • For the L column, which was added by the lookup command, the search is looking for values greater than 100.
  • For the E column, which was added by the eval command, the search is looking for values that are greater than 50.


Events that match the criteria for A, L, and E are identified, and 50,000 matching events are returned. The following image shows the entire process and the resource costs involved in this inefficient search.

This image shows the final part of the search, with the criteria "search A=25 L>100 E>50". These criteria return events where field A is equal to 25, AND field L is greater than 100, AND field E is greater than 50. The criteria run against all 1 million results and filter them down to 50,000 events.

An optimized search

You can optimize the entire search by moving some of the components of the second search command to earlier points in the search. The following image shows the effects of these changes:

  • Moving the A criteria before the first pipe reduces the amount of data retrieved from the index. The number of events extracted drops by 700,000, to 300,000.
  • The lookup is performed on 300,000 events instead of 1 million events.
  • Moving the L criteria to immediately after the lookup reduces the number of events by another 100,000, to 200,000.
  • The eval is performed on 200,000 events instead of 1 million events.
  • The E criteria depends on the results of the eval command, so it must remain after the eval. This final filter reduces the results to 50,000 events. The rewritten search appears after the following image.


This image shows the revised search as a flow chart with five nodes: "search sourcetype & A", "lookup L", "search L", "eval E", and "search E".
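
Putting these changes together, the optimized form of the example search might look like the following. This is a sketch reconstructed from the flow chart above, not a search taken verbatim from the product documentation:

sourcetype=my_source A=25 | lookup my_lookup_file L OUTPUTNEW L | search L>100 | eval E=L/T | search E>50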



