Use summary indexing for increased search efficiency
Summary indexes enable you to efficiently search on large volumes of data. When you create a summary index, you design a scheduled search that runs in the background, extracting a precise set of statistical information from a large and varied dataset. The results of each run of the search are stored in a summary index that you designate. Searches you run against the completed summary index should complete much faster than similar searches run against the source dataset.
The summary index is "faster" because it is smaller than the original dataset and contains only data that is relevant to the search that you run against it. The summary index is also guaranteed to be statistically accurate, in part because the scheduled search that updates the summary runs on an interval that is shorter than the average time range of the searches that you run against the summary index. For example, if you want to run ad-hoc searches over the summary index that cover the past seven days, you should build and update the summary index with a search that runs hourly.
Summary indexing allows the cost of a computationally expensive report to be spread over time. For example, the hourly search to update a summary index with the previous hour's worth of data should take a fraction of a minute. Running the weekly report against the original dataset would take approximately 168 (7 days * 24 hours/day) times longer.
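The trade-off can be illustrated outside of the Splunk platform with a small Python sketch. It is not Splunk code: the generated events, the summarize_hour helper, and the in-memory summary dictionary are hypothetical stand-ins for the source dataset, the scheduled search, and the summary index.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hourly summary runs cover one week


def make_hour_of_events(n=1000):
    """Hypothetical raw data: every tenth event in an hour is a download."""
    return ["download" if i % 10 == 0 else "other" for i in range(n)]


# Stand-in for the source dataset, bucketed by hour.
raw_by_hour = {hour: make_hour_of_events() for hour in range(HOURS_PER_WEEK)}


def summarize_hour(events):
    """Stand-in for the scheduled search: count one hour's downloads."""
    return sum(1 for kind in events if kind == "download")


# Stand-in for the summary index: one small row per hour.
summary = {hour: summarize_hour(events) for hour, events in raw_by_hour.items()}

# A weekly report against the raw data has to scan every event...
raw_rows_scanned = sum(len(events) for events in raw_by_hour.values())
weekly_from_raw = sum(summarize_hour(events) for events in raw_by_hour.values())

# ...while the same report against the summary adds up 168 small rows.
weekly_from_summary = sum(summary.values())

print(raw_rows_scanned, len(summary), weekly_from_raw, weekly_from_summary)
```

Both reports return the same total, but the summary-based report reads one row per hour instead of every raw event. The expensive scan still happens, just in hourly slices as each summary row is built.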
Types of summary indexes
You can create two types of summary indexes:
- summary events indexes
- summary metrics indexes
Both types of summary indexes are built and updated with the results of transforming searches over event data. The difference is that summary events indexes store the statistical event data as events, while summary metrics indexes convert that statistical event data into metric data points as part of their summarization process.
Metrics indexes store metric data points in a way that makes searches against them notably fast, and which reduces the space they take up on disk, compared to events indexes. You may find that a summary metrics index provides faster search performance than a summary events index, even when both indexes are summarizing data from the same source dataset. Your choice of summary index type might be determined by your comfort with working with metrics data. Metric data points might be inappropriate for the data analysis you want to perform.
To learn how to create both types of summary indexes, see Create a summary index in Splunk Web.
For more information about metrics, see Overview of metrics in Metrics.
Summary indexing use cases
The following sections describe some summary indexing use case examples.
Run reports over long time ranges for large datasets more efficiently
Your instance of the Splunk platform indexes tens of millions of events per day. You want to set up a dashboard with a panel that displays the number of page views and visitors each of your websites had over the past 30 days, broken out by site.
You could run this report on your primary data volume, but its runtime would be quite long, because the Splunk software has to sort through a huge number of events that are totally unrelated to web traffic in order to extract the desired data. Additionally, the fact that the report is included in a popular dashboard means it will be run frequently. This run frequency could significantly extend its average runtime, leading to a lot of frustrated users.
To deal with this, you set up a saved search that collects website page view and visitor information into a designated summary index on a weekly, daily, or even hourly basis. You'll then run your month-end report on this smaller summary index, and the report should complete far faster than it would otherwise because it is searching on a smaller and better-focused dataset.
Building rolling reports
Say you want to run a report that shows a running count of an aggregated statistic over a long period of time, such as a running count of downloads of a file from a website you manage.
First, schedule a saved search to return the total number of downloads over a specified slice of time. Then, use summary indexing to save the results of that search into a summary index. You can then run a report any time you want on the data in the summary index to obtain the latest count of the total number of downloads.
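The idea behind a rolling report can be sketched in Python as follows. This is an illustration only, not Splunk search syntax: the summary_rows list and its dates and counts are made-up stand-ins for the per-slice results that the scheduled search would save into the summary index.

```python
from itertools import accumulate

# Hypothetical summary rows: (day, downloads counted during that day).
summary_rows = [
    ("2022-03-01", 120),
    ("2022-03-02", 95),
    ("2022-03-03", 143),
]


def running_totals(rows):
    """Return (day, cumulative downloads) pairs from per-slice counts."""
    days = [day for day, _ in rows]
    totals = accumulate(count for _, count in rows)
    return list(zip(days, totals))


report = running_totals(summary_rows)
latest_total = report[-1][1]  # the most recent running count
print(report, latest_total)
```

Because each summary row already holds the count for its slice of time, the report only has to add up a handful of small rows, no matter how many raw events each slice contained.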
Does summary indexing count against your license?
Summary indexing data volume is not counted against your license, even if you have multiple summary indexes.
All summarized data has a special default source type. Events summarized in a summary events index have a source type of stash. Metric data points summarized in a summary metrics index have a source type of mcollect_stash.
If you use commands like mcollect to change these source types to anything other than stash (for events) or mcollect_stash (for metric data points), you will incur license usage charges for those events or metric data points.
How event summary indexing works
When a scheduled search that has been enabled for summary event indexing runs on its schedule, Splunk software temporarily stores its search results in a file as follows:
MD5 hashes of search names are used to cover situations where the search name is overlong.
From the file, Splunk software uses the addinfo command to add general information about the current search and the fields you specify during configuration to each result. Splunk software then indexes the resulting event data in the summary index that you've designated for it (index=summary by default).
Use the addinfo command to add fields containing general information about the current search to the search results going into a summary index. General information added about the search helps you run reports on results you place in a summary index.
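As a rough illustration of what this step does to each result, here is a Python sketch. The add_info function, the sample results, and the timestamp and search ID values are hypothetical. The info_* field names follow those that the addinfo command is documented to add, but check the Search Reference for the authoritative list.

```python
import time


def add_info(results, earliest, latest, sid, search_time=None):
    """Return copies of the results with search-level fields attached.

    Illustrative stand-in for the addinfo step, not Splunk's implementation.
    """
    if search_time is None:
        search_time = time.time()
    info = {
        "info_min_time": earliest,        # start of the search time range
        "info_max_time": latest,          # end of the search time range
        "info_sid": sid,                  # ID of the search that produced the row
        "info_search_time": search_time,  # when the search ran
    }
    return [{**row, **info} for row in results]


# Hypothetical results from one hourly run of the scheduled search.
results = [
    {"site": "a.example.com", "count": 42},
    {"site": "b.example.com", "count": 7},
]
annotated = add_info(
    results,
    earliest=1646092800,
    latest=1646096400,
    sid="scheduler__example",
    search_time=1646096405,
)
print(annotated[0])
```

Carrying the time range and search ID on every summarized row lets a later report tell which run produced a row and which slice of time it covers.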
Choose to summarize multivalue fields as separate field-value pairs
When a scheduled search that you have enabled for summary event indexing runs on its schedule, the Splunk software runs the collect command in the background to add the results of the search to the specified summary index. By default, collect adds multivalue fields to summary indexes as intact multivalue fields. However, you can optionally configure collect to break multivalue fields up into individual field-value pairs when it adds them to the summary index.
For more information, see the Usage section of the collect command reference topic in the Search Reference.
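The difference between the two behaviors can be pictured with a short Python sketch. It is a conceptual illustration under stated assumptions: the flatten_result function and the sample result are hypothetical, and the output does not reproduce the format that collect actually writes.

```python
def flatten_result(result, split_multivalue=False):
    """Render one search result as a list of (field, value) pairs.

    With split_multivalue=False a multivalue field stays intact as one
    pair; with True it becomes one pair per value.
    """
    pairs = []
    for field, value in result.items():
        if isinstance(value, list) and split_multivalue:
            pairs.extend((field, item) for item in value)
        else:
            pairs.append((field, value))
    return pairs


# Hypothetical search result with one multivalue field.
result = {"host": "web01", "status": ["200", "404", "500"]}

intact = flatten_result(result)
separate = flatten_result(result, split_multivalue=True)
print(intact)
print(separate)
```

In the first form the status field is stored once with all three values together. In the second form each value becomes its own status pair.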
This documentation applies to the following versions of Splunk Cloud Platform™: 8.2.2201 (latest FedRAMP release), 8.2.2202, 8.2.2203