Splunk® Enterprise

Knowledge Manager Manual

Splunk version 4.x reached its End of Life on October 1, 2013.
This documentation does not apply to the most recent version of Splunk.

Configure summary indexes

For a general overview of summary indexing and instructions for setting up summary indexing through Splunk Web, see the topic "Use summary indexing for increased reporting efficiency" in the Knowledge Manager manual.

You can't manually configure a summary index for a search in savedsearches.conf until the search is saved and scheduled, and has the Enable summary indexing alert option selected.

In addition, you need to enter the name of the summary index that the search will populate. You do this through the saved search dialog after selecting Enable summary indexing. The index named summary is the default summary index (the index that Splunk uses if you do not specify another one).

If you plan to run a variety of summary index searches, you may need to create additional summary indexes. For information about creating new indexes, see "Set up multiple indexes" in the Admin manual. It's a good idea to create indexes that are dedicated to the collection of summary data.

Note: If you enter the name of an index that does not exist, Splunk will run the search on the schedule you've defined, but its data will not get saved to a summary index.

For more information about saving, scheduling, and setting up alerts for searches, see "Save searches and share search results" and "Create an alert" in the User manual.

Note: In most cases, the search that you use to build your summary index should use the summary indexing reporting commands. These commands are prefixed with "si": sichart, sitimechart, sistats, sitop, and sirare. The populating searches you create with them should be versions of the search that you'll eventually use to query the completed summary index.

The summary index reporting commands automatically take into account the issues that are covered in "Considerations for summary index search definition" below, such as scheduling shorter time ranges for the populating search, and setting the populating search to take a larger sample. You only have to worry about these issues if the search that you are using to build your index does not include summary index reporting commands.

If you do not use the summary index reporting commands, you can use the addinfo and collect search commands to create a search that Splunk saves and schedules, and which populates a pre-created summary index. For more information about that method, see "Manually configure a search to populate a summary index" in this topic.

Customize summary indexing for a saved, scheduled search

When you use Splunk Web to enable summary indexing for a saved, scheduled search, Splunk automatically generates a stanza for that search in $SPLUNK_HOME/etc/system/local/savedsearches.conf. You can customize summary indexing for the search by editing this stanza.

If you've used Splunk Web to save and schedule a search but haven't used it to enable summary indexing for that search, you can enable summary indexing for the saved search directly in savedsearches.conf, as long as you have an index for it to populate. For more information about manual index configuration, see the topic "About managing indexes" in the Admin manual.

[<name>]
action.summary_index = 0 | 1
action.summary_index._name = <index>
action.summary_index.<field> = <value>
  • [<name>]: Splunk names the stanza based on the name of the saved and scheduled search that you enabled for summary indexing.
  • action.summary_index = 0 | 1: Set to 1 to enable summary indexing. Set to 0 to disable summary indexing.
  • action.summary_index._name = <index>: Specifies the name of the summary index that the search populates. If you've created a specific summary index for this search, enter its name in <index>. Defaults to summary, the summary index that is delivered with Splunk.
  • action.summary_index.<field> = <value>: Specify a field/value pair to add to every event that gets summary indexed by this search. You can define multiple field/value pairs for a single summary index search.

This field/value pair acts as a "tag" that makes it easier to identify the events that go into the summary index when you search among the larger population of event data. This key is optional, but we recommend that you never set up a summary index without at least one field/value pair.

For example, add the name of the saved search that is populating the summary index (action.summary_index.report = summary_firewall_top_src_ip), or the name of the index that the search populates (action.summary_index.index = search).
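To make the stanza structure concrete, here is a minimal Python sketch that renders the summary indexing keys for a saved search and reads them back. It is an illustration only: the helper names summary_index_stanza and read_stanza are hypothetical, and the sketch assumes that Python's standard configparser is close enough to Splunk's .conf format for these simple key = value lines, which is not true of .conf files in general.

```python
import configparser


def summary_index_stanza(name, index="summary", fields=None, enabled=True):
    """Render a savedsearches.conf-style stanza (hypothetical helper)."""
    lines = [
        f"[{name}]",
        f"action.summary_index = {1 if enabled else 0}",
        f"action.summary_index._name = {index}",
    ]
    # Each extra field/value pair is added to every summary-indexed event.
    for field, value in (fields or {}).items():
        lines.append(f"action.summary_index.{field} = {value}")
    return "\n".join(lines) + "\n"


def read_stanza(text, name):
    """Parse the stanza back into a dict (assumes configparser-compatible text)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(text)
    return dict(parser[name])


text = summary_index_stanza(
    "summary_firewall_top_src_ip",
    fields={"report": "summary_firewall_top_src_ip"},
)
print(text)
print(read_stanza(text, "summary_firewall_top_src_ip")["action.summary_index._name"])
```

Generating the stanza from a dict of fields makes it easy to follow the recommendation above and always attach at least one identifying field/value pair.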

Search commands useful to summary indexing

Summary indexing uses a set of specialized search commands, which you need if you are manually creating your summary indexes without the help of the Splunk Web interface or the summary indexing reporting commands.

  • addinfo: Summary indexing uses addinfo to add fields containing general information about the current search to the search results going into a summary index. Add | addinfo to any search to see what results will look like if they are indexed into a summary index.
  • collect: Summary indexing uses collect to index search results into the summary index. Use | collect to index any search results into another index (using collect command options).
  • overlap: Use overlap to identify gaps and overlaps in a summary index. overlap finds events of the same query_id in a summary index with overlapping timestamp values or identifies periods of time where there are missing events.
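The kind of bookkeeping that overlap reports can be pictured with a small Python model. This is a conceptual sketch, not Splunk's implementation: it assumes each populating run is described by the info_min_time and info_max_time fields that addinfo attaches (as epoch seconds), and it treats each run's range as a half-open interval. The function name is hypothetical.

```python
def find_gaps_and_overlaps(ranges):
    """Return (gaps, overlaps) for summarized time ranges.

    ranges: (info_min_time, info_max_time) pairs, one per populating run,
    treated as half-open intervals [start, end).
    """
    if not ranges:
        return [], []
    gaps, overlaps = [], []
    ordered = sorted(ranges)
    covered_until = ordered[0][1]  # latest end time seen so far
    for start, end in ordered[1:]:
        if start > covered_until:
            # Nothing was summarized between the earlier coverage and this run.
            gaps.append((covered_until, start))
        elif start < covered_until:
            # This run re-summarizes time that an earlier run already covered.
            overlaps.append((start, min(end, covered_until)))
        covered_until = max(covered_until, end)
    return gaps, overlaps


runs = [(0, 300), (300, 600), (900, 1200), (1100, 1500)]
print(find_gaps_and_overlaps(runs))  # ([(600, 900)], [(1100, 1200)])
```

Tracking the latest end time seen so far, rather than only comparing neighbors, keeps a long run that swallows several shorter ones from being misreported as a gap.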

Manually configure a search to populate a summary index

If you want to configure summary indexing without using the search options dialog in Splunk Web and the summary indexing reporting commands, you must first configure a summary index just like any other index, via indexes.conf. For more information about manual index configuration, see the topic "About managing indexes" in the Admin manual.

Important: You must restart Splunk for changes in indexes.conf to take effect.

1. In the Splunk Web search bar, run the search whose results you want to summarize.

  • Be sure to limit the time range of your search. The number of results that your search generates needs to fit within the maximum search result limits you have set for searching.
  • Make sure to choose a time interval that works for your data, such as 10 minutes, 2 hours, or 1 day. (For more information about using Splunk Web to schedule search intervals, see "Create an alert" in the User Manual.)

2. Use the addinfo search command. Append | addinfo to the end of your search.

  • This command adds fields with information about the search to each event. The collect command requires these fields in order to place the events into a summary index.
  • You can always add | addinfo to any search to preview what the results of a search will look like in a summary index.

3. Add the collect search command. Append | collect index=<index_name> addtime=t marker="info_search_name=\"<summary_search_name>\"" to the end of the search.

  • Replace <index_name> with the name of the summary index.
  • Replace <summary_search_name> with a key you can use to find the results of this search in the index.
  • You must set <summary_search_name> if you want to use the overlap search command on the generated events.
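Steps 1 through 3 produce a single search string. The Python sketch below assembles that string from its parts so the nested quoting in the marker option is easy to see. The function name, the sample base search, and the index and key names are hypothetical; the helper simply rejects key names that contain double quotes rather than attempting to escape them.

```python
def build_populating_search(base_search, index_name, summary_search_name):
    """Append addinfo and collect to a base search (hypothetical helper)."""
    if '"' in summary_search_name:
        raise ValueError("summary_search_name must not contain double quotes")
    # The marker value is itself quoted, so the inner quotes are backslash-escaped.
    marker = f'info_search_name=\\"{summary_search_name}\\"'
    return (
        f"{base_search} | addinfo"
        f' | collect index={index_name} addtime=t marker="{marker}"'
    )


print(build_populating_search(
    "index=firewall | top limit=100 src_ip", "summary_fw", "fw_top_src_hourly"
))
# index=firewall | top limit=100 src_ip | addinfo | collect index=summary_fw addtime=t marker="info_search_name=\"fw_top_src_hourly\""
```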

Note: In general, we recommend that you use the provided summary_index alert action. Configuring summary indexing with addinfo and collect requires some redundant steps that are not needed when you generate summary index events from scheduled searches. Manual configuration remains necessary when you backfill a summary index for time ranges that have already passed.

Considerations for summary index search definition

If for some reason you're going to set up a summary-index-populating search that does not use the summary indexing reporting commands, you should take a few moments to plan out your approach. With summary indexing, the egg comes before the chicken. Use the search that you actually want to report on to help define the search you use to populate the summary index.

Many summary searches involve aggregated statistics: for example, a report that searches for the top 10 IP addresses associated with firewall offenses over the past day, when the main index accrues millions of events per day.

If you populate the summary index with the results of the same search that you run on the summary index, you'll likely get results that are statistically inaccurate. You should follow these rules when defining the search that populates your summary index to improve the accuracy of aggregated statistics generated from summary index searches.

Schedule a shorter time range for the populating search

The search that populates your summary index should be scheduled on a shorter (and therefore more frequent) interval than the search that you eventually run against the index. Use the smallest practical time range. For example, if you need to generate a daily "top" report, the search populating the summary index should take its sample on an hourly basis.

Set the populating search to take a larger sample

The search populating the summary index should seek out a significantly larger sample than the search that you want to run on the summary index. For example, if you plan to search the summary index for the daily top 10 offending IP addresses, set up a search that populates the summary index with the hourly top 100 offending IP addresses.

This approach has two benefits. It ensures greater statistical accuracy for the top 10 report (due to the larger and more frequently taken overall sample), and it gives you some wiggle room if you later decide to report on the top 20 or 30 offending IP addresses.

The summary indexing reporting commands automatically take a sample that is larger than the search that you'll run to query the completed summary index, so the resulting summary index data is not skewed. If you do not use those commands, you can use the head command to select a larger sample for the summary-index-populating search than for the search that you run on the summary index. In other words, you would use | head 100 in the hourly summary-index-populating search, and | head 10 in the daily search of the completed summary index.
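The effect of sample size can be modeled in a few lines of Python. In this sketch, the addresses and counts are hypothetical, and Python's Counter stands in for Splunk's top and stats commands: the populating step keeps only the top keep values per hour, and the reporting step adds those hourly summaries together and takes the daily top n.

```python
from collections import Counter


def summarize(hourly_counts, keep):
    """Populating search: keep only the top `keep` values from each hour."""
    return [dict(Counter(hour).most_common(keep)) for hour in hourly_counts]


def daily_top(summaries, n):
    """Reporting search: add up the hourly summaries, then take the top n."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total.most_common(n)


hourly = [
    {"10.0.0.1": 10, "10.0.0.3": 9, "10.0.0.4": 1},
    {"10.0.0.2": 10, "10.0.0.3": 9, "10.0.0.4": 1},
]
print(daily_top(summarize(hourly, keep=1), n=1))  # misses 10.0.0.3 entirely
print(daily_top(summarize(hourly, keep=2), n=1))  # [('10.0.0.3', 18)]
```

With a sample of one value per hour, the address with the most offenses across the whole day (18) never reaches the summary; keeping two values per hour recovers it.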

Set up your search to get a weighted average

If your summary-index-populating search involves averages, and you are not using the summary indexing reporting commands, you need to set that search up to get a weighted average.

For example, say you want to build hourly, daily, or weekly reports of average response times. To do this, you'd generate the "daily average" by averaging the "hourly averages" together. Unfortunately, the daily average becomes skewed if the hourly averages are not based on the same number of events. You can get the correct "daily average" by using a weighted average.

The following expression calculates the daily average response time correctly, as a weighted average, by using the stats and eval commands in conjunction with the sum statistical aggregator. In this example, the eval command creates a daily_average field by dividing the total response time (resp_time_sum) by the total event count (resp_time_count).

| stats sum(hourly_resp_time_sum) as resp_time_sum, sum(hourly_resp_time_count) as resp_time_count | eval daily_average=resp_time_sum/resp_time_count | .....
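The arithmetic behind the weighted average is easy to check outside Splunk. The Python sketch below uses hypothetical hourly figures and mirrors the stats/eval expression above: it sums the per-hour response-time sums and counts before dividing, and compares the result with a naive average of the hourly averages.

```python
def naive_daily_average(hours):
    """Average of the hourly averages: skewed when hourly counts differ."""
    hourly_averages = [
        h["hourly_resp_time_sum"] / h["hourly_resp_time_count"] for h in hours
    ]
    return sum(hourly_averages) / len(hourly_averages)


def weighted_daily_average(hours):
    """Total response time divided by total event count, as in the stats/eval search."""
    resp_time_sum = sum(h["hourly_resp_time_sum"] for h in hours)
    resp_time_count = sum(h["hourly_resp_time_count"] for h in hours)
    return resp_time_sum / resp_time_count


hours = [
    {"hourly_resp_time_sum": 200, "hourly_resp_time_count": 2},  # hourly average 100
    {"hourly_resp_time_sum": 400, "hourly_resp_time_count": 8},  # hourly average 50
]
print(naive_daily_average(hours))     # 75.0
print(weighted_daily_average(hours))  # 60.0
```

Each of the 10 events carries equal weight in the weighted result, whereas the naive version lets the 2 events from the first hour count as much as the 8 from the second.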

Schedule the populating search to avoid data gaps and overlaps

Along with the rules above, to minimize data gaps and overlaps, you should also set appropriate intervals and delays in the schedules of the searches you use to populate summary indexes.

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:

  • splunkd goes down.
  • the scheduled saved search (the one being summary indexed) takes too long to run and runs past the next scheduled run time. For example, if you schedule the search that populates the summary index to run every 5 minutes, but that search typically takes around 7 minutes to run, you will have problems, because the search won't start again while a preceding run of it is still in progress.

Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the interval between its scheduled runs, or if you manually run summary indexing using the collect command.
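The scheduling conditions above can be expressed as a small checklist. This Python sketch (a hypothetical helper, with all durations in minutes) flags gap and overlap risks for a populating search from its schedule interval, the length of the time range each run searches, and its typical run time. It does not model splunkd outages or manual collect runs.

```python
def schedule_risks(interval, window, runtime):
    """Flag gap/overlap risks for a summary-index-populating search."""
    risks = []
    if window > interval:
        risks.append("overlap: consecutive runs search some of the same events")
    elif window < interval:
        risks.append("gap: some events fall between consecutive search windows")
    if runtime > interval:
        risks.append("gap: runs outlast the interval, so scheduled runs are skipped")
    return risks


# The example from the text: every 5 minutes, but each run takes about 7.
print(schedule_risks(interval=5, window=5, runtime=7))
print(schedule_risks(interval=60, window=60, runtime=4))  # []
```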

Example of a summary index configuration

This example shows a configuration for a summary index of Apache server statistics as it might appear in savedsearches.conf. The keys listed below enable summary indexing for the saved search "Apache Method Summary."

Note: If you set action.summary_index = 1, you don't need to include the addinfo or collect commands in the search.

# name of the saved search: Apache Method Summary
[Apache Method Summary]
# sets the search to run at each search interval
counttype = always
# enable the search schedule
enableSched = 1
# search interval in cron notation (this means "every 5 minutes")
cron_schedule = */5 * * * *
# id of user for saved search
userid = jsmith
# search string that populates the summary index
search = index=apache_raw startminutesago=30 endminutesago=25 | extract auto=false | stats count by method
# enable summary indexing
action.summary_index = 1
# name of summary index to which search results are added
action.summary_index._name = summary
# add these keys to each event
action.summary_index.report = "count by method"
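In this example, the search runs every 5 minutes and each run covers a 5-minute window that ends 25 minutes before the run starts, presumably to leave time for late-arriving events to be indexed. The Python sketch below checks that consecutive windows line up end to start. It assumes that startminutesago and endminutesago define a half-open window from 30 minutes before the run up to 25 minutes before it; the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def search_window(run_time, start_minutes_ago=30, end_minutes_ago=25):
    """Time range searched by one scheduled run (assumed half-open)."""
    return (run_time - timedelta(minutes=start_minutes_ago),
            run_time - timedelta(minutes=end_minutes_ago))


def windows_tile(run_times, **window_args):
    """True if each window starts exactly where the previous one ended."""
    windows = [search_window(t, **window_args) for t in run_times]
    return all(prev[1] == nxt[0] for prev, nxt in zip(windows, windows[1:]))


start = datetime(2012, 1, 1, 12, 0)
runs = [start + timedelta(minutes=5 * i) for i in range(4)]  # every 5 minutes
print(search_window(runs[0]))                  # window from 11:30 to 11:35
print(windows_tile(runs))                      # True
print(windows_tile(runs, end_minutes_ago=20))  # False: 10-minute windows overlap
```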

Other configuration files affected by summary indexing

In addition to the settings you configure in savedsearches.conf, there are also settings for summary indexing in indexes.conf and alert_actions.conf.

The indexes.conf file specifies the index configuration for the summary index. The alert_actions.conf file controls the alert actions (including summary indexing) associated with saved searches.

Caution: Do not edit settings in alert_actions.conf without explicit instructions from Splunk staff.


This documentation applies to the following versions of Splunk® Enterprise: 4.3, 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6, 4.3.7


