Splunk® Enterprise

Knowledge Manager Manual


Use summary indexing for increased reporting efficiency

Use summary indexing to efficiently report on large volumes of data. With summary indexing, you set up a frequently running search that extracts the precise information you want. Each time this search runs, its results are saved into a summary index that you designate. You can then run searches and reports on this significantly smaller summary index, so they complete faster. What's more, these reports are statistically accurate because the index-populating search runs more frequently than the reports you run against the summary index. For example, if you want to manually run searches that cover the past seven days, you might run them on a summary index that is updated on an hourly basis.

Summary indexing spreads the cost of a computationally expensive report over time. In the example above, the hourly search that populates the summary index with the previous hour's worth of data would take a fraction of a minute. Generating the complete 7-day report without the benefit of summary indexing would take approximately 168 (7 days * 24 hours/day) times longer.
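The 168x figure is simple arithmetic. The following minimal Python sketch makes it explicit, assuming (as a simplification) that search cost grows linearly with the number of hours of raw data scanned. The function names and the 30-second hourly runtime are hypothetical.

```python
# Hypothetical cost model, assuming search cost grows linearly with the
# number of hours of raw data scanned.
HOURS_PER_DAY = 24

def hourly_slices(report_days):
    """How many one-hour summary slices a report over report_days covers."""
    return report_days * HOURS_PER_DAY

def full_report_seconds(seconds_per_hour_scanned, report_days):
    """Runtime of building the report directly from raw events, i.e.
    scanning every hour in the range in a single search."""
    return seconds_per_hour_scanned * hourly_slices(report_days)

print(hourly_slices(7))              # 168
print(full_report_seconds(30.0, 7))  # 5040.0 seconds, versus 30 per hourly run
```

With summary indexing, those 5,040 seconds of work are paid in 168 small installments as the hourly searches run, rather than all at once when someone opens the report.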

Perhaps an even more important advantage of summary indexing is its ability to amortize costs across different reports, as well as across runs of the same report over different but overlapping time ranges. The same summary data generated on a Tuesday can be used for a report of the previous 7 days run on Wednesday, Thursday, or the following Monday. It could also be used for a monthly report that needs the average response size per day.
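To see how much work overlapping reports can share, here is a minimal Python sketch that models each hour of summary data as an integer index. The day numbering and the `report_hours` helper are hypothetical. The point is only that two 7-day reports run a day apart reuse 144 of their 168 hourly slices, so the second report needs just 24 new hours of summary data.

```python
HOURS_PER_DAY = 24

def report_hours(end_day, days=7):
    """Hour indexes (counted from an arbitrary start) covered by a report
    of `days` days that ends at the start of day `end_day`."""
    start = (end_day - days) * HOURS_PER_DAY
    return set(range(start, end_day * HOURS_PER_DAY))

# Hypothetical numbering: the Wednesday report ends at the start of day 8,
# the Thursday report at the start of day 9.
wednesday = report_hours(end_day=8)
thursday = report_hours(end_day=9)
shared = wednesday & thursday

print(len(wednesday), len(thursday), len(shared))  # 168 168 144
```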

Note: Summary indexing volume is not counted against your license, even if you have multiple summary indexes. However, in the event of a license violation, summary indexing will halt like any other non-internal search behavior.

Summary indexing use cases

Example #1 - Run reports over long time ranges for large datasets more efficiently: You're using Splunk Enterprise at a company that indexes tens of millions of events--or more--per day. You want to set up a dashboard for your employees that, among other things, displays a report that shows the number of page views and visitors each of your Web sites had over the past 30 days, broken out by site.

You could run this report on your primary data volume, but its runtime would be quite long, because Splunk software has to sort through a huge number of events that are totally unrelated to web traffic in order to extract the desired data. But that's not all: because the report is included in a popular dashboard, it will be run frequently, and those frequent runs could significantly extend its average runtime, leading to a lot of frustrated users.

But if you use summary indexing, you can set up a saved search that collects website page view and visitor information into a designated summary index on a weekly, daily, or even hourly basis. You'll then run your month-end report on this smaller summary index, and the report should complete far faster than it would otherwise because it is searching on a smaller and better-focused dataset.

Example #2 - Building rolling reports: Say you want to run a report that shows a running count of an aggregated statistic over a long period of time--a running count of downloads of a file from a Web site you manage, for example.

First, schedule a saved search to return the total number of downloads over a specified slice of time. Then, use summary indexing to save the results of that search into a summary index. You can then run a report any time you want on the data in the summary index to obtain the latest count of the total number of downloads.
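A rolling report like this only has to add up the small per-slice rows in the summary index. The following Python sketch illustrates the idea with hypothetical daily download counts. It is not Splunk code, just the arithmetic that the final report performs over the summarized slices.

```python
from itertools import accumulate

# Hypothetical per-slice results, as a scheduled summary-index-populating
# search might write them: (slice label, downloads counted in that slice).
slices = [("day 1", 120), ("day 2", 95), ("day 3", 143)]

def running_totals(rows):
    """Running download count after each slice, computed only from the
    small per-slice summary rows rather than from the raw events."""
    counts = [count for _, count in rows]
    return list(accumulate(counts))

print(running_totals(slices))  # [120, 215, 358]
```

Each new scheduled run appends one more row, so the latest total costs one small addition instead of a rescan of every download event.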

For another view, you can watch this Splunk developer video about the theory and practice of summary indexing.

Use the summary indexing reporting commands

If you are new to summary indexing, use the summary indexing reporting commands (sichart, sitimechart, sistats, sitop, and sirare) when you define the search that will populate the summary index. With these commands, the populating search can use the same search string as the search that you eventually run on the summary index; the only difference is that the latter search uses the regular reporting commands (chart, timechart, stats, top, and rare).

Note: You do not have to use the si- summary index search commands if you are proficient with the "old-school" way of creating summary-index-populating searches. If you create summary indexes using those methods and they work for you there's no need to update them. In fact, they may be more efficient: there are performance impacts related to the use of the si- commands, because they create slightly larger indexes than the "manual" method does.

In most cases the impact is insignificant, but you may notice a difference if the summary indexes you are creating are themselves fairly large. You may also notice performance issues if you're setting up several searches to report against an index populated by an si- command search.

See the following section if you're interested in designing summary indexes without the help of si- search commands.

Define index-populating searches without the special commands

In previous versions of Splunk Enterprise, you had to be very careful about how you designed the searches that populate your summary index, especially if the search you wanted to run on the finished summary index involved aggregate statistics: you had to set up the "index-populating" search so that it did not produce incorrect results. For example, if you wanted to run a search on the finished summary index that gave you average response times broken out by server, you'd want to set up a summary-index-populating search that:

  • is scheduled to run on a more frequent basis than the search you plan to run against the summary index.
  • samples a larger amount of data than the search you plan to run against the summary index.
  • contains additional search commands that ensure that the index-populating search generates a weighted average (only necessary if you are looking for an average in the first place).
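The weighted-average requirement matters because averaging stored averages ignores how many events each one represents. This minimal Python sketch uses hypothetical hourly rows for one server to show the difference. A summary that keeps a count alongside each average (or a count and a sum) lets you recover the true mean.

```python
# Hypothetical per-hour summary rows for one server:
# (number of requests in the hour, average response time in ms).
hourly = [(1000, 200.0), (10, 800.0)]

def naive_average(rows):
    """Average of the stored averages: ignores how many events each covers."""
    return sum(avg for _, avg in rows) / len(rows)

def weighted_average(rows):
    """True mean: weight each stored average by its event count."""
    total_count = sum(count for count, _ in rows)
    total_sum = sum(count * avg for count, avg in rows)
    return total_sum / total_count

print(naive_average(hourly))               # 500.0
print(round(weighted_average(hourly), 2))  # 205.94
```

The quiet hour with 10 slow requests drags the naive figure to 500 ms, even though almost all requests took about 200 ms.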

The summary index reporting commands take care of the last two points for you: they automatically determine the adjustments that need to be made so that your summary index is populated with data that does not produce statistically inaccurate results. However, you should still arrange for the summary-index-populating search to run on a more frequent basis than the search that you later run against the summary index.

Interested in setting up summary indexes without the si- commands? Find out about the addinfo, collect, and overlap commands, learn how to devise searches that provide weighted averages, and review an example of summary index configuration via savedsearches.conf in the topic "Configure summary indexes," in this manual.

Summary indexing reporting command usage example

Let's say you've been running the following search, with a time range of the past year:

eventtype=firewall | top src_ip

This search gives you the top source IPs for the past year, but it takes a very long time to run because it scans across your entire index each time.

What you need to do is create a summary index that is composed of the top source IPs from the "firewall" event type. You can use the following search to build that summary index. You would schedule it to run on a daily basis, collecting the top src_ip values for only the previous 24 hours each time. The results of each daily search are added to an index named "summary":

eventtype=firewall | sitop src_ip

Note: Summary-index-populating searches are statistically more accurate if you schedule them to run and sample information on a more frequent basis than the searches you plan to run against the finished summary index. So in this example, because we plan to run searches that cover a timespan of a year, we set up a summary-index-populating search that samples information on a daily basis.
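Summarizing counts, rather than only each day's winners, is what keeps the yearly result correct. The following Python sketch is a simplified analogy, not a description of sitop's actual storage format: it uses hypothetical daily src_ip counts and compares merging every day's full per-value counts with merging only each day's top value.

```python
from collections import Counter

# Hypothetical daily src_ip counts from three days of firewall events.
daily_counts = [
    Counter({"10.0.0.1": 50, "10.0.0.2": 45, "10.0.0.3": 40}),
    Counter({"10.0.0.4": 60, "10.0.0.2": 55, "10.0.0.3": 50}),
    Counter({"10.0.0.5": 70, "10.0.0.3": 65, "10.0.0.2": 60}),
]

def top_from_full_counts(days, n=1):
    """Merge every day's per-value counts, then rank the totals."""
    total = Counter()
    for day in days:
        total.update(day)
    return total.most_common(n)

def top_from_daily_top1(days, n=1):
    """Keep only each day's single top value before merging (loses data)."""
    total = Counter()
    for day in days:
        total.update(dict(day.most_common(1)))
    return total.most_common(n)

print(top_from_full_counts(daily_counts))  # [('10.0.0.2', 160)]
print(top_from_daily_top1(daily_counts))   # [('10.0.0.5', 70)]
```

10.0.0.2 never wins a single day, yet it is the top source overall, which only the merged full counts reveal.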

Important: When you define summary-index-populating searches, do not pipe other search operators after the main summary indexing reporting command. In other words, don't include additional | eval commands and the like. Save the extra search operators for the searches you run against the summary index, not the search you use to populate it.

Important: The results from a summary-indexing optimized search are stored in a special format that cannot be modified before the final transformation is performed. This means that if you populate a summary index with ... | sistats <args>, the only valid retrieval of the data is: index=<summary> source=<saved search name> | stats <args>. The search against the summary index cannot create or modify fields before the | stats <args> command.

Now, let's say you save this search with the name "Summary - firewall top src_ip" (all saved summary-index-populating searches should have names that identify them as such). After your summary index is populated with results, search and report against that summary index using a search that specifies the summary index and the name of the search that you used to populate it. For example, this is the search you would use to get the top source IPs over the past year:

index=summary search_name="Summary - firewall top src_ip" | top src_ip

Because this search specifies the search name, it filters out other data that have been placed in the summary index by other summary indexing searches. This search should run fairly quickly, even if the time range is a year or more.

Note: If you run a search against a summary index that refers to the sourcetype field (for example, to filter or group by it), be aware that you need to use orig_sourcetype instead. So instead of running a search against a summary index like ... | timechart avg(ip) by sourcetype, use ... | timechart avg(ip) by orig_sourcetype.

Why do you have to do this? When events are gathered into a summary index, Splunk software changes their sourcetype value to "stash" and moves the original sourcetype value to orig_sourcetype.

Set up summary index searches in Splunk Web

You can set up summary index searches through Splunk Web. Summary indexing is an alert option for scheduled reports. Once you determine the report that you want to use to populate a summary index, follow these steps:

1. Navigate to Settings > Searches, Reports, and Alerts.

2. Select the name of a report (or click New to create a new report).

3. Under Schedule and alert, select Schedule this search if the report isn't already scheduled.

You must select Schedule this search to see the report scheduling options.

4. Schedule the report to run on an appropriate interval.

Searches that populate summary indexes should run on a fairly frequent basis in order to create statistically accurate final reports. If the report you run against the summary index gathers information for the past week, you should have the summary-index-populating report run on an hourly basis, collecting information for each hour. If you run reports over the past year's worth of data, you might have the populating report run daily, collecting data for the past day. For more information, see "Schedule reports" in the Reporting Manual.
Note: Be sure to schedule the report so that there are no data gaps or overlaps. For more information, see "Schedule the populating report to avoid data gaps and overlaps," below.

5. Under Alert, set Condition to always.

6. Set Alert mode to Once per search.

This ensures that the alert will be triggered each time the report runs.


7. Under Summary indexing, select Enable.

When you select Enable, the alert Condition is set to always and the Alert mode to Once per search. You won't be able to select other values for these fields without disabling summary indexing.

8. Select the name of the summary index that the report populates from the Select the summary index list.

The default summary index is named summary. The list only displays indexes to which you have permission to write.

9. (Optional) You may need to create additional summary indexes if you plan to run a variety of summary index reports.

For information about creating new indexes, see "Set up multiple indexes" in the Managing Indexers and Clusters manual. It's a good idea to create indexes that are dedicated to the collection of summary data.

10. (Optional) Under Add fields, you can add field/value pairs to the summary index definition.

These field/value pairs are added to each event that gets summary indexed, which makes those events easier to find with later searches. For example, you could add the name of the report populating the summary index (report=summary_firewall_top_src_ip) or the name of the index that the report populates (index=summary), and then search on those terms later.
Note: You can also add field/value pairs to the summary index configuration in savedsearches.conf. For more information, see "Configure summary indexes" in this manual.

For more information about saving searches as reports and alerts, see: "Create and edit reports" (in the Reporting Manual) and the Alerting Manual.

Schedule the populating report to avoid data gaps and overlaps

To minimize data gaps and overlaps, set appropriate intervals and delays in the schedules of the reports you use to populate summary indexes.

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:

  • The summary-index-populating report takes too long to run and runs past the next scheduled run time. For example, if you schedule the report that populates the summary index to run every 5 minutes when that report typically takes around 7 minutes to run, you will have problems: the report won't start a new run while a preceding run is still in progress, so the next scheduled run is skipped.
  • You have forced the summary-index-populating report to use real-time scheduling. This happens if you change the report definition in savedsearches.conf so that the realtime_schedule attribute is set to 1, enabling real-time scheduling. This setting can result in data collection gaps if you are concurrently running several reports. When you define a summary-index-populating scheduled report in Splunk Web by selecting Enable for summary indexing and saving the report, realtime_schedule is set to 0 to ensure that the report never skips a scheduled run. For more information, see "Configure the priority of scheduled reports" in the Reporting Manual.
  • splunkd goes down. If Splunk Enterprise can't index events, you will have gaps in your summary indexes.
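The first cause of gaps can be simulated. This Python sketch assumes, as described above, that a report does not start a new run while its previous run is still in progress. The `simulate_runs` helper and the minute-based timeline are hypothetical, and the model ignores any catch-up behavior of the real scheduler.

```python
def simulate_runs(interval_min, runtime_min, horizon_min):
    """Return (started, skipped) lists of scheduled start times.

    In this simplified model, a report never starts while its previous
    run is still in progress, so any scheduled time that falls inside a
    running search is skipped and its data window is not summarized.
    """
    started, skipped = [], []
    busy_until = 0
    for t in range(0, horizon_min, interval_min):
        if t < busy_until:
            skipped.append(t)
        else:
            started.append(t)
            busy_until = t + runtime_min
    return started, skipped

# Scheduled every 5 minutes, but each run takes about 7 minutes.
print(simulate_runs(5, 7, 30))  # ([0, 10, 20], [5, 15, 25])
```

Half of the scheduled runs are skipped, so half of the 5-minute windows never reach the summary index.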

Overlaps are events in a summary index (from the same report) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a report to be longer than the frequency of the schedule of the report. In other words, don't arrange for a report that runs hourly to gather data for the past 90 minutes.
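You can check a proposed schedule for overlaps (and gaps) before you enable it. This minimal Python sketch assumes each run summarizes the window [run time - lookback, run time) with no schedule delay. The `check_windows` helper is hypothetical.

```python
def check_windows(run_times, lookback):
    """Compare consecutive summary windows and report gaps or overlaps.

    run_times: scheduled run times in minutes, in ascending order.
    lookback: minutes of data each run summarizes; each run is assumed to
        cover [run_time - lookback, run_time) with no schedule delay.
    Returns (run_time, kind, minutes) tuples for every problem found.
    """
    problems = []
    for prev, cur in zip(run_times, run_times[1:]):
        prev_end = prev             # previous window ends at its run time
        cur_start = cur - lookback  # current window starts lookback earlier
        if cur_start < prev_end:
            problems.append((cur, "overlap", prev_end - cur_start))
        elif cur_start > prev_end:
            problems.append((cur, "gap", cur_start - prev_end))
    return problems

hourly = [60, 120, 180]
print(check_windows(hourly, lookback=90))  # [(120, 'overlap', 30), (180, 'overlap', 30)]
print(check_windows(hourly, lookback=60))  # []
```

An hourly report that gathers 90 minutes of data double-counts 30 minutes on every run; matching the lookback to the interval removes both overlaps and gaps in this model.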

Note: For information about detecting and fixing overlapping data and gaps in data, see "Manage summary index gaps and overlaps" in this manual.

How summary indexing works

In Splunk Web, summary indexing is an alert option for scheduled saved searches. When you run a saved search with summary indexing turned on, its search results are temporarily stored in a file ($SPLUNK_HOME/var/spool/splunk/<savedsearch_name>_<random-number>.stash). From the file, Splunk software uses the addinfo command to add general information about the current search and the fields you specify during configuration to each result. Splunk Enterprise then indexes the resulting event data in the summary index that you've designated for it (index=summary by default).
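The annotation step can be pictured as adding the same metadata to every result row. This Python sketch is a hypothetical simplification, not Splunk's implementation: it stores search metadata under info_ names (addinfo adds fields such as info_min_time, info_max_time, info_sid, and info_search_time) and then appends the field/value pairs you configure. The sample values, including the sid, are made up.

```python
def annotate(results, search_info, extra_fields):
    """Return annotated copies of each result row.

    search_info: metadata about the current search, stored as info_<key>
        (loosely modeled on the fields addinfo adds).
    extra_fields: the field/value pairs configured for summary indexing.
    """
    annotated = []
    for row in results:
        event = dict(row)  # copy so the original result is not modified
        for key, value in search_info.items():
            event[f"info_{key}"] = value
        event.update(extra_fields)
        annotated.append(event)
    return annotated

results = [{"src_ip": "10.0.0.1", "count": 42}]
search_info = {"min_time": 1700000000, "max_time": 1700086400,
               "sid": "scheduler_example_sid"}
events = annotate(results, search_info, {"report": "summary_firewall_top_src_ip"})
print(events[0]["info_sid"], events[0]["report"])
```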

Note: Use the addinfo command to add fields containing general information about the current search to the search results going into a summary index. General information added about the search helps you run reports on results you place in a summary index.

Summary indexing of data without timestamps

To set the time for summary index events, Splunk software uses the following information, in this order of precedence:

1. The _time value of the event being summarized.

2. The earliest (or minimum) time of the scheduled search that populates the summary index. For example, if the summary-index-populating search covers the two minutes preceding each launch of its search, its earliest time is -2m.

3. The current system time (in the case of an "all time" search, where no "earliest" value is specified).
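The precedence rules above can be expressed as a small function. This Python sketch is hypothetical: it takes times as epoch seconds and shows why every lookup row without an _time value receives the same timestamp, the search's earliest time.

```python
import time

def summary_timestamp(event, earliest=None, now=None):
    """Choose _time for a summary event, in order of precedence:
    1. the event's own _time value,
    2. the earliest time of the populating search,
    3. the current system time (all-time search, no earliest value).
    Times are epoch seconds; `now` can be passed in for testing.
    """
    if "_time" in event:
        return event["_time"]
    if earliest is not None:
        return earliest
    return time.time() if now is None else now

# Nightly lookup summary with a -24h to +0s time range (hypothetical rows).
search_start = 1_700_000_000
earliest = search_start - 86_400
rows = [{"asset": "web01"}, {"asset": "db01"}]
print([summary_timestamp(r, earliest) for r in rows])
# both rows get 1699913600, the search's earliest time
```

Adding an explicit _time to each row before summarizing (as the eval example below does) makes rule 1 apply instead.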

In the majority of cases, your events will have timestamps, so the first method of discerning the summary index timestamp holds. But if you are summarizing data that doesn't contain an _time field (such as data from a lookup), the resulting events will have the timestamp of the earliest time of the summary-index-populating search.

For example, if you summarize the lookup "asset_table" every night at midnight, and the asset table does not contain an _time column, tonight's summary will have an _time value equal to the earliest time of the search. If the time range of the search is set from -24h to +0s, each summarized event will have an _time value of now()-86400 (that's the start time of the search minus 86,400 seconds, or 24 hours). This means that every event without an _time field value that is found by this summary-index-populating search will be given the exact same _time value: the search's earliest time.

The best practice for summarizing data without a timestamp is to explicitly create an _time value as part of your search. Following on from the example above:

| inputlookup asset_table | eval _time=now()

Fields added to summary-indexed data by the si- summary indexing commands

Caution: Use of these fields and their encoded data by any search commands other than the si* summary indexing commands is unsupported. The format and content of these fields can change at any time without warning.

When you run searches with the si* commands in order to populate a summary index, Splunk software adds a set of special fields to the summary index data that all begin with psrsvd, such as psrsvd_ct_bytes and psrsvd_v and so on. When you run a search against the summary index with reporting commands like chart, timechart, and stats, the psrsvd* fields are used to calculate results for tables and charts that are statistically correct. psrsvd stands for "prestats reserved."

Most psrsvd types present information about a specific field in the original (pre-summary-indexing) dataset, although some psrsvd types are not scoped to a single field. The general pattern is psrsvd_[type]_[fieldname]. For example, psrsvd_ct_bytes presents count information for the bytes field.

Here is a list of the available psrsvd types:

  • ct = count
  • gc = group count (the count for a stats "grouping," not scoped to a single field)
  • nc = numerical count (number of numerical values)
  • nn = minimum numerical value
  • nx = maximum numerical value
  • rd = rdigest of values (values and the number of times they appear)
  • sm = sum
  • sn = minimum lexicographical value
  • ss = sum of squares
  • sx = maximum lexicographical value
  • v = version (not scoped to a single field)
  • vm = value map (all distinct values for the field and the number of times they appear)
  • vt = value type (contains the precision of the associated field)
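The psrsvd_[type]_[fieldname] naming pattern can be decoded mechanically. This Python sketch is for illustration only: as the caution above says, these fields are unsupported outside the si* commands and their format can change, so the `parse_psrsvd` helper is hypothetical and should not be used to read real summary data.

```python
PREFIX = "psrsvd_"

# Type codes as listed above.
TYPES = {
    "ct": "count", "gc": "group count", "nc": "numerical count",
    "nn": "minimum numerical value", "nx": "maximum numerical value",
    "rd": "rdigest of values", "sm": "sum",
    "sn": "minimum lexicographical value", "ss": "sum of squares",
    "sx": "maximum lexicographical value", "v": "version",
    "vm": "value map", "vt": "value type",
}

def parse_psrsvd(name):
    """Split a field name such as psrsvd_ct_bytes into
    (type description, original field name).

    Returns None if the name doesn't follow the pattern; the field part is
    None for types that are not scoped to a single field (e.g. psrsvd_v).
    """
    if not name.startswith(PREFIX):
        return None
    type_code, _, field = name[len(PREFIX):].partition("_")
    if type_code not in TYPES:
        return None
    return TYPES[type_code], (field or None)

print(parse_psrsvd("psrsvd_ct_bytes"))  # ('count', 'bytes')
print(parse_psrsvd("psrsvd_v"))         # ('version', None)
```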

This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.1, 6.1.1, 6.1.2, 6.1.3, 6.1.4, 6.1.5, 6.1.6, 6.1.7, 6.1.8, 6.1.9, 6.1.10, 6.2.0, 6.2.1, 6.2.2, 6.2.3, 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.2.8, 6.2.9, 6.2.10, 6.3.0, 6.3.1, 6.3.1511, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.4.0, 6.4.1


@Spammenot66: The si- commands were considered to be an improvement over the earlier method of summary indexing, which required users to carefully set up the index-populating search so that it provides statistically correct results. The si- commands make the necessary calculations for you so you do not have to. This is discussed in this topic here: http://docs.splunk.com/Documentation/Splunk/6.3.3/Knowledge/Usesummaryindexing#Define_index-populating_searches_without_the_special_commands

You'll find a link in that topic to another topic that covers the "old" way of setting up summary index searches, and if you prefer going that route, go ahead! However, I should add that "tscollect" is not a summary indexing command but rather a report acceleration command (when used with tstats). http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutsummaryindexing

Regarding your question about the video--I'll look into it and see what I can do. It's an old video (2008).

Mness, Splunker
March 16, 2016

Is there a reason, this covers only si- summary indexing? There are other methods of collecting data into the summary index which includes si-, tscollect and collect? Is there a benefit to one over the other? For me, i tend to like collect because its saving the exact output into the summary index and is easy for computing over time for aggregated reporting.

March 16, 2016

The video for "Splunk Enterprise developer video" is split into three parts but at the end of each part, it seems like its cut right in the middle of a sentence. This occurs at the end of each part of the video. Can someone put up the full version?

March 16, 2016

Summary indexing does not count against your license. A note about this has been added to the first section of this topic.

Mness, Splunker
May 28, 2015

Does index summarization affect license volume?
I have found some divergent informations in answers forum.

July 31, 2014

I've updated this topic to reflect Rsennett's concerns--the "Set up summary index searches in Splunk Web" procedure is now broken down a bit further and formatted better. It's more readable now, and better explains the steps involved. Rsennett pointed out a few changes to the way the procedure works that must have been put in place since it was last updated, such as the fact that selecting "Enable" for summary indexing automatically sets certain alerting values. Thanks for that feedback!

May 28, 2014

Set up summary index searches in Splunk Web -
Once you are about to select an index, if it hasn't been created you've got to go do that. Mentioning multiple summary indexes at the beginning of the topic will avoid having to go back and re-do the settings if that's the intention.
Also. the "Note" regarding entering an index that doesn't exist is impossible via the GUI (and the topic is "Setting up summary index... in Splunk Web" so the note is meaningless in context. Adjusting the note to mention the conf files or CLI would make more sense as that's the only way to enter an index that doesn't exist...

Rsennett splunk
May 26, 2014

"Set up summary index searches in Splunk Web" section needs a look see.
Navigation (item 1.) mentions the System menu, should be Settings - also "Searches and Reports" is now labeled "searches, reports and alerts"

This is a "step by step" so perhaps explaining that you MUST click Schedule in order to get the rest of the choices, and then since this is a "Summary" tutorial of sorts... the next step is to click "Summary Index" that will automagically set the "Alert" items. We can explain them... but it isn't necessary to have someone make the individual selections at this point... explain them, but avoid confusion by allowing Splunk to set it up for them.

Rsennett splunk
May 26, 2014

I think we should talk about how SI indexes is actually created in a workflow format. For example (don't quote me, I could be wrong):

1) Create a summary search, schedule it to an index
1a) Schedule the summary search with a shorter time range than the frequency of the search and a larger sample that uses the summary
2) SH creates $SPLUNK_HOME/var/spool/splunk/_.stash temporary
3) The SH is configured somehow to act as forwarder and sinkholes/batch the stash file to the IDXs, which indexes it to the specified summary index
4) User runs a search against the summary and the IDX reads the summary index like any other search

Skawasaki splunk
January 29, 2014
