Use summary indexing for increased search efficiency

Summary indexes enable you to search large volumes of data efficiently. When you create a summary index, you design a scheduled search that runs in the background and extracts a precise set of statistical information from a large and varied dataset. The results of each run of the search are stored in a summary index that you designate. Searches that you run against the completed summary index should complete much faster than similar searches run against the source dataset.

The summary index is faster because it is smaller than the original dataset and contains only data that is relevant to the searches that you run against it. The summary index also remains statistically accurate, in part because the scheduled search that updates the summary runs on an interval that is shorter than the average time range of the searches that you run against the summary index. For example, if you want to run ad-hoc searches over the summary index that cover the past seven days, you should build and update the summary index with a search that runs hourly.

Summary indexing spreads the cost of a computationally expensive report over time. For example, the hourly search that updates a summary index with the previous hour's worth of data should take a fraction of a minute. Running the weekly report against the original dataset would take approximately 168 times longer, because it has to process 168 hours of data (7 days * 24 hours/day) in a single run.
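As a rough sketch of this pattern, a populating search scheduled to run hourly might look like the following. The index name `web`, the source type `access_combined`, the field `status`, and the saved search name used further down are hypothetical, and the sketch assumes that the saved search is enabled for summary indexing into the default `summary` index.

```spl
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| sistats count BY status
```

A seven-day report could then run against the summary index instead of the source data. This assumes the populating search was saved under the hypothetical name `summary - web status hourly`:

```spl
index=summary search_name="summary - web status hourly" earliest=-7d@h latest=@h
| stats count BY status
```

Each hourly run only processes one hour of raw events, and the weekly report only reads the small summarized results.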

Types of summary indexes

You can create two types of summary indexes:

summary events indexes
summary metrics indexes

Both types of summary indexes are built and updated with the results of transforming searches over event data. The difference is that summary events indexes store the statistical event data as events, while summary metrics indexes convert that statistical event data into metric data points as part of their summarization process.

Metrics indexes store metric data points in a way that makes searches against them notably fast and that reduces the space they take up on disk, compared to events indexes. You may find that a summary metrics index provides faster search performance than a summary events index, even when both indexes summarize data from the same source dataset. Your choice of summary index type might depend on how comfortable you are working with metrics data, and on whether metric data points suit the data analysis that you want to perform.
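To illustrate the metrics variant, here is a minimal sketch of a search that writes hourly page view counts as metric data points with `mcollect`. The index names `web` and `web_summary_metrics`, the source type, and the field names are hypothetical, and `web_summary_metrics` is assumed to be an existing metrics index. With `split=true`, the sketch treats `host` as a dimension and `page_views` as the measurement.

```spl
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| bin _time span=1h
| stats count AS page_views BY _time, host
| mcollect index=web_summary_metrics split=true host
```

A report over the resulting metric data points might then use `mstats`:

```spl
| mstats sum(page_views) AS page_views WHERE index=web_summary_metrics BY host span=1d
```

Treat this as a starting point only. Check the `mcollect` and `mstats` reference topics for the options that your version supports.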

Get started with summary indexing

Use the following topics to create both types of summary indexes.

Topic title	Helps with	Description
Create a summary index in Splunk Web	Summary events indexes, summary metrics indexes	Create summary events indexes and summary metrics indexes through Splunk Web. Design a report that can populate a summary index, schedule it, and enable it for summary indexing.
Design searches that populate summary events indexes	Summary events indexes only	Searches that populate summary events indexes require special transforming commands such as `sistats`, `sichart`, and `sitimechart`. Find out why you should use these commands. Design searches that populate summary events indexes with data in a manner that ensures that searches of those summary indexes return statistically accurate results.
Configure summary indexes	Summary events indexes only	Design summary events indexes manually through configuration files. Create summary-index-populating searches that forgo the `si*` commands in favor of `collect` and `addinfo`.

For more information about metrics, see Overview of metrics in the Metrics manual.

Summary indexing use cases

The following sections describe some summary indexing use case examples.

Run reports over long time ranges for large datasets more efficiently

Your instance of the Splunk platform indexes tens of millions of events per day. You want to set up a dashboard with a panel that displays the number of page views and visitors that each of your websites received over the past 30 days, broken out by site.

You could run this report on your primary data volume, but its runtime would be quite long, because the Splunk software has to sort through a huge number of events that are unrelated to web traffic in order to extract the desired data. Additionally, because the report is included in a popular dashboard, it will be run frequently. These frequent runs could significantly extend its average runtime, leading to a lot of frustrated users.

To deal with this, you set up a saved search that collects website page view and visitor information into a designated summary index on a weekly, daily, or even hourly basis. You then run your 30-day report on this smaller summary index, and the report should complete far faster than it would otherwise because it searches a smaller and better-focused dataset.
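A minimal sketch of this use case follows, assuming a daily populating search that is enabled for summary indexing into the default `summary` index. The index `web`, the source type `access_combined`, the fields `site` and `clientip`, and the saved search name are hypothetical.

```spl
index=web sourcetype=access_combined earliest=-1d@d latest=@d
| sistats count dc(clientip) BY site
```

The dashboard panel could then report on the last 30 days from the summary index, assuming the populating search was saved as `summary - web traffic daily`:

```spl
index=summary search_name="summary - web traffic daily" earliest=-30d@d latest=@d
| stats count dc(clientip) BY site
| rename count AS page_views, "dc(clientip)" AS visitors
```

The sketch uses `sistats` rather than `stats` in the populating search because daily distinct visitor counts cannot simply be added together to produce a 30-day distinct count. The reporting search repeats the same statistical functions with the plain `stats` command.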

Build rolling reports

Say you want to run a report that shows a running count of an aggregated statistic over a long period of time, such as a running count of downloads of a file from a website that you manage.

First, schedule a saved search to return the total number of downloads over a specified slice of time. Then, use summary indexing to save the results of that search into a summary index. You can then run a report any time you want on the data in the summary index to obtain the latest count of the total number of downloads.
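The steps above can be sketched as follows. The index, source type, `uri_path` value, and saved search name are hypothetical, and the populating search is assumed to run hourly with summary indexing enabled for the default `summary` index.

```spl
index=web sourcetype=access_combined uri_path="/downloads/installer.zip" earliest=-1h@h latest=@h
| stats count AS downloads
```

A rolling report over the summarized results might then accumulate the hourly counts into a running total, assuming the populating search was saved as `summary - installer downloads hourly`:

```spl
index=summary search_name="summary - installer downloads hourly"
| timechart span=1d sum(downloads) AS downloads
| streamstats sum(downloads) AS running_total
```

This sketch uses plain `stats` because a simple count is additive across time slices. Statistics that are not additive, such as averages or distinct counts, call for the `si*` commands instead.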

Does summary indexing count against your license?

Whether summary indexing counts against your license depends on the type of license that your Splunk platform deployment uses.

If you have a Splunk Enterprise deployment

If your Splunk Enterprise deployment uses ingest pricing, summary indexing is not counted against your license, even if you have multiple summary indexes.

If your Splunk Enterprise deployment uses workload pricing, all summary indexing workloads are counted against the amount of compute capacity you have purchased for your Splunk Enterprise deployment, as measured in virtual central processing units (vCPUs).

For more information, see Types of Splunk Enterprise licenses, in the Splunk Enterprise Admin Manual.

If you have a Splunk Cloud Platform subscription

If your Splunk Cloud Platform subscription is workload-based, all summary indexing is counted against the usage entitlement you have purchased for your deployment, as measured with Splunk Virtual Compute (SVC) units.

If your Splunk Cloud Platform subscription is ingest-based, summary indexing is not counted against your license, even if you have multiple summary indexes. However, ingest-based subscriptions typically include 90 days of retention, and summary indexing does consume that storage entitlement. If you have an ingest-based subscription, summary indexing might limit your ability to achieve 90-day retention of your data.

For more information, see the Splunk Cloud Platform Service Description.

When you use ingest pricing and you change the default source type of your summary index events

All summarized data has a special default source type. Events summarized in a summary events index have a source type of `stash`. Metric data points summarized in a summary metrics index have a source type of `mcollect_stash`.

If your Splunk platform deployment uses ingest pricing, and you use commands like `collect` or `mcollect` to change these source types to anything other than `stash` (for events) or `mcollect_stash` (for metric data points), you will incur license usage charges for those events or metric data points.
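As an illustration, the following sketch keeps the default source type, so the summarized events are written with the `stash` source type. The index names `web` and `web_summary`, the source type `access_combined`, and the field names are hypothetical.

```spl
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| stats count AS page_views BY host
| collect index=web_summary
```

By contrast, this variant overrides the source type with the hypothetical name `web_traffic_summary`. Under ingest pricing, the events that it writes would count against your license as described above.

```spl
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| stats count AS page_views BY host
| collect index=web_summary sourcetype=web_traffic_summary
```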

How event summary indexing works

When a scheduled search that has been enabled for summary event indexing runs on its schedule, Splunk software temporarily stores its search results in a file as follows:

$SPLUNK_HOME/var/spool/splunk/<MD5_hash_of_savedsearch_name>_<random-number>.stash_new

The file name uses an MD5 hash of the search name to cover situations where the search name is very long.

Splunk software then uses the `addinfo` command to add general information about the current search, along with the fields that you specify during configuration, to each result in the file. It then indexes the resulting event data in the summary index that you have designated for it (`index=summary` by default).

Use the `addinfo` command to add fields containing general information about the current search to the search results going into a summary index. This general information helps you run reports on the results that you place in a summary index.
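For example, a manually designed populating search might call `addinfo` just before `collect`, as in the sketch below. The index names, source type, and field names are hypothetical. The `addinfo` command adds fields such as `info_min_time`, `info_max_time`, `info_search_time`, and `info_sid` to each result.

```spl
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| stats count AS page_views BY host
| addinfo
| collect index=web_summary
```

A report over the summary index could then use those fields, for instance to show the time window that each summarized result covers:

```spl
index=web_summary
| eval window_start=strftime(info_min_time, "%Y-%m-%d %H:%M"), window_end=strftime(info_max_time, "%Y-%m-%d %H:%M")
| table window_start window_end host page_views
```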

Choose to summarize multivalue fields as separate field-value pairs

When a scheduled search that you have enabled for summary event indexing runs on its schedule, the Splunk software runs the `collect` command in the background to add the results of the search to the specified summary index. By default, `collect` adds multivalue fields to summary indexes as intact multivalue fields. However, you can optionally have `collect` split multivalue fields into individual field-value pairs when it adds them to the summary index, by setting `format_multivalue_collect` to `true` in limits.conf.
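A minimal sketch of that setting in a local copy of limits.conf follows. The placement under a `[collect]` stanza is an assumption that is not stated above, so confirm it against the limits.conf specification for your version before you apply it.

```ini
# Local limits.conf. The [collect] stanza name is an assumption.
[collect]
# Split multivalue fields into separate field-value pairs when collect writes to a summary index.
format_multivalue_collect = true
```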

For more information, see the Usage section of the collect command reference topic in the Search Reference.
