Increase reporting efficiency with summary indexing

This documentation does not apply to the most recent version of Splunk.

Running a report on a large dataset over a long time span can be quite time-consuming. If you only have to do this occasionally, it might not be a big problem for you. But running such reports on a regular schedule would be impractical, and the problem only grows as more users in your organization use Splunk to run similar reports.

Use summary indexing to efficiently report on large volumes of data. With summary indexing, you define a saved search that extracts the precise information that you want to report on. You schedule this search to run periodically over a time interval that is appropriate for your needs: it could be daily, or it could be every ten minutes. Each time Splunk runs the search, it saves the results covering the period since the previous run into a summary index that you designate. You can then search and run reports on this smaller summary index instead of working with the much larger dataset.

Summary indexing is useful in situations like the following.

For example, you may want to run a report at the end of each month that displays the number of page views and visitors for each of your Web sites, broken out by site. If you ran this report directly on your primary data volume, it would take a long time because Splunk has to sort through a great many events that have nothing to do with Web traffic in order to extract the desired information. When you use summary indexing, the saved search collects the page view and visitor information into a designated summary index for you on a weekly, daily, or even hourly basis. When you run your "month-end" report, it completes much faster because it searches a much smaller and better focused set of data.

Or, you may want to run a report that shows a running count of a statistic over a long period of time. For example, you may want a running count of downloads of a file from a Web site you manage. Schedule a saved search to return the total number of downloads over a specified slice of time. Use summary indexing to have Splunk save the results into a summary index. You can then run a report any time you want on the data in the summary index to obtain the latest count of the total number of downloads.
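The running-count idea can be illustrated outside of Splunk with a short Python sketch. This is not Splunk code and does not use any Splunk API. The event list, the one-hour slice length, and all function names are assumptions made for the illustration. Each call to summarize_slice stands in for one scheduled run of the saved search, and the list of rows it produces stands in for the summary index.

```python
# Hypothetical raw events as (epoch_seconds, event_type). Most events are
# unrelated to downloads, which is what makes searching raw data slow.
RAW_EVENTS = [
    (0, "download"), (10, "login"), (1800, "download"),
    (3600, "download"), (3700, "error"), (7300, "download"),
]

SLICE_SECONDS = 3600  # assumed schedule: one summarizing run per hour


def summarize_slice(events, start, end):
    """Stand-in for one scheduled run: count downloads in [start, end)."""
    count = sum(
        1 for ts, kind in events if start <= ts < end and kind == "download"
    )
    return {"slice_start": start, "slice_end": end, "download_count": count}


def build_summary(events, first_start, last_end, step=SLICE_SECONDS):
    """Run the summarizing step once per slice, back to back."""
    return [
        summarize_slice(events, s, s + step)
        for s in range(first_start, last_end, step)
    ]


def running_total(summary):
    """Report on the small summary instead of the raw events."""
    return sum(row["download_count"] for row in summary)


summary_index = build_summary(RAW_EVENTS, 0, 3 * SLICE_SECONDS)
total = running_total(summary_index)
print(total)
```

Because the slices sit back to back with no gaps or overlaps, the total computed from the three summary rows matches the number of download events in the raw data.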

For another view into the ideas behind summary indexing, you can watch this Splunk developer video about the theory and practice of summary indexing.

Note: Indexing events in a summary index counts against your license volume. We recommend that you not index more events in your summary indexes than you really need. Consult Splunk support for specific information on license volume impact.


How summary indexing works

In Splunk Web, summary indexing is an alert option for scheduled saved searches. When you run a saved search with summary indexing turned on, its search results are temporarily stored in a file ($SPLUNK_HOME/var/spool/splunk/<savedsearch_name>_<random-number>.stash). Splunk reads this file, adds general information about the current search and the fields you specify during configuration to every result (using the addinfo command), and then indexes the results as events in a summary index (index=summary by default).

Note: Use the addinfo command to add fields containing general information about the current search to the search results going into a summary index. General information added about the search helps you run reports on results you place in a summary index.

After Splunk indexes results in the summary index, search and report on them by specifying the name of the summary index in your search.

Example:

This search runs against the "summary" index and returns the most common values of the referrer field in that index.

* index=summary | top referrer


Configure summary indexing

Configure summary indexing as an alert option for a scheduled saved search via Splunk Web. Once you configure summary indexing for a saved search, you can further configure it via savedsearches.conf.

Note: You must enable summary indexing via Splunk Web before you can configure it in savedsearches.conf, unless you configure summary indexing manually.


Search commands useful to summary indexing

Summary indexing uses search commands such as addinfo and collect behind the scenes to perform its actions.

Another useful command is overlap. You can use overlap to find gaps in events or overlapping events in a summary index.


General guidelines for summary indexing

Note: Currently, indexing events in a summary index counts against your license volume. We recommend that you not index more events in your summary indexes than you really need. Consult Splunk support for specific information on license volume impact.

Use summary indexing to:

When using summary indexing:


Aggregated statistics

Be careful when building reports made of aggregated statistics. Some aggregating statistical functions (such as distinct count, mode, and median) yield incorrect results when you apply them to statistics that are already aggregated. Use one of Splunk's reporting commands to access statistical functions.
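A small Python sketch shows why re-aggregating summaries can go wrong. It is an illustration only, not Splunk code, and the visitor names, response times, and hour labels are made up. It compares a distinct count and a median computed from per-hour summaries with the same statistics computed from the underlying values.

```python
from statistics import median

# Hypothetical visitors seen in each hour; "alice" appears in both hours.
hourly_visitors = {
    "09:00": ["alice", "bob", "carol"],
    "10:00": ["alice", "dave"],
}

# Adding up per-hour distinct counts counts "alice" twice.
summed_distinct = sum(len(set(v)) for v in hourly_visitors.values())
true_distinct = len({user for v in hourly_visitors.values() for user in v})

# Hypothetical response times per hour.
hourly_resp_times = {
    "09:00": [1, 2, 100],
    "10:00": [3, 4],
}

# The median of the hourly medians is not the median of all values.
median_of_medians = median(median(v) for v in hourly_resp_times.values())
true_median = median(x for v in hourly_resp_times.values() for x in v)

print(summed_distinct, true_distinct)
print(median_of_medians, true_median)
```

In both cases the summary rows no longer carry enough information to recover the correct answer, so the statistic has to be computed from data that still contains the individual values.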

For example, if you want to build hourly, daily, or weekly reports of average response times, you might be tempted to generate the "daily average" by averaging the "hourly averages" together. However, the daily average becomes skewed if the hourly averages are not all based on the same number of events. Get the correct "daily average" by using a weighted average function.

Example:

The following expression calculates the daily average response time correctly (a weighted average) using stats and eval.

| stats sum(hourly_resp_time_sum) as resp_time_sum, sum(hourly_resp_time_count) as resp_time_count | eval daily_average= resp_time_sum/resp_time_count | .....
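The same arithmetic can be checked with a short Python sketch. It is not Splunk code. The two hourly rows are invented numbers, and the field names simply mirror the hourly_resp_time_sum and hourly_resp_time_count fields used in the search above.

```python
# Hypothetical hourly summary rows: a busy hour with fast responses and a
# quiet hour with slow responses.
hourly_rows = [
    {"hourly_resp_time_sum": 100.0, "hourly_resp_time_count": 100},  # avg 1.0
    {"hourly_resp_time_sum": 50.0, "hourly_resp_time_count": 10},    # avg 5.0
]

# Average of the hourly averages: treats both hours as equally important.
hourly_averages = [
    row["hourly_resp_time_sum"] / row["hourly_resp_time_count"]
    for row in hourly_rows
]
naive_daily_average = sum(hourly_averages) / len(hourly_averages)

# Weighted average: total response time divided by total event count,
# which is what the stats and eval expression computes.
resp_time_sum = sum(row["hourly_resp_time_sum"] for row in hourly_rows)
resp_time_count = sum(row["hourly_resp_time_count"] for row in hourly_rows)
daily_average = resp_time_sum / resp_time_count

print(naive_daily_average)
print(round(daily_average, 3))
```

The quiet hour pulls the average of averages up to 3.0, while the weighted result stays close to 1.36 because 100 of the 110 events had fast responses.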


Gaps and overlaps

Gaps

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:

Overlaps

Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the frequency of the schedule of the search, or if you run summary indexing manually (using | collect).
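The relationship between the search time range, the schedule, and the resulting gaps or overlaps can be sketched in Python. This is a simplified model and not how the overlap command is implemented: it treats each run as covering a half-open window of minutes, and the function names and the schedules used are assumptions made for the illustration.

```python
def scheduled_windows(run_times, lookback):
    """Window covered by each run: the `lookback` minutes before the run."""
    return [(t - lookback, t) for t in run_times]


def find_gaps_and_overlaps(windows):
    """Return (gaps, overlaps) for a list of half-open (start, end) windows."""
    gaps, overlaps = [], []
    if not windows:
        return gaps, overlaps
    ordered = sorted(windows)
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_until:
            # Nothing summarized this stretch of time.
            gaps.append((covered_until, start))
        elif start < covered_until:
            # This stretch of time was summarized more than once.
            overlaps.append((start, min(end, covered_until)))
        covered_until = max(covered_until, end)
    return gaps, overlaps


# Time range (90 minutes) longer than the schedule (every 60 minutes).
too_long = scheduled_windows([90, 150, 210], lookback=90)
gaps_a, overlaps_a = find_gaps_and_overlaps(too_long)

# Matching time range and schedule, but the run at minute 180 was missed.
missed_run = scheduled_windows([60, 120, 240], lookback=60)
gaps_b, overlaps_b = find_gaps_and_overlaps(missed_run)

print(gaps_a, overlaps_a)
print(gaps_b, overlaps_b)
```

In the first schedule every run re-summarizes the last 30 minutes of the previous window, so those events are counted twice. In the second, the missed run leaves an hour that never reaches the summary.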

Identify gaps and overlaps in data

Identify overlaps and gaps in a summary index using the "Summary Index Gaps and Overlaps" form search (a default saved search in the main Splunk dashboard), or by using the overlap command in your search (add | overlap at the end of the search that produces overlaps).

If you run the form search Summary Index Gaps and Overlaps, specify the time range using the form, or switch to a "text" display where you must specify the following parameters in the search bar (following | overlap):

either specify:

or:

If you identify a gap, you can run your scheduled saved search over the period of the gap and summary index the results (using | collect). If you identify overlapping events, you can manually delete the overlaps from the summary index by using the search language.

This documentation applies to the following versions of Splunk: 3.3, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.5, 3.4.6, 3.4.8, 3.4.9, 3.4.10, 3.4.11, 3.4.12, 3.4.13, 3.4.14

