Configure summary indexes
For a general overview of summary indexing and instructions for setting up summary indexing through Splunk Web, see the topic "Use summary indexing for increased reporting efficiency" in the Knowledge Manager manual.
You can't manually configure a summary index for a search in savedsearches.conf until the search is saved, scheduled, and has the Enable summary indexing alert option selected.
In addition, you need to enter the name of the summary index that the search will populate. You do this through the saved search dialog after selecting Enable summary indexing. The index named summary is the default summary index (the index that Splunk uses if you do not indicate another one).
If you plan to run a variety of summary index searches, you may need to create additional summary indexes. For information about creating new indexes, see "Set up multiple indexes" in the Admin manual. It's a good idea to create indexes that are dedicated to the collection of summary data.
Note: If you enter the name of an index that does not exist, Splunk will run the search on the schedule you've defined, but its data will not get saved to a summary index.
Note: When you define the search that builds your summary index, you should in most cases use the summary indexing reporting commands. These commands are prefixed with "si-": sichart, sitimechart, sistats, sitop, and sirare. The searches you create with them should be versions of the search that you'll eventually use to query the completed summary index.
The summary index reporting commands automatically take into account the issues that are covered in "Considerations for summary index search definition" below, such as scheduling shorter time ranges for the populating search, and setting the populating search to take a larger sample. You only have to worry about these issues if the search that you are using to build your index does not include summary index reporting commands.
If you do not use the summary index reporting commands, you can use the addinfo and collect search commands to create a search that Splunk saves and schedules, and which populates a pre-created summary index. For more information about that method, see "Manually configure a search to populate a summary index" in this topic.
Customize summary indexing for a saved, scheduled search
When you use Splunk Web to enable summary indexing for a saved, scheduled search, Splunk automatically generates a stanza in $SPLUNK_HOME/etc/system/local/savedsearches.conf. You can customize summary indexing for the search by editing this stanza.
If you've used Splunk Web to save and schedule a search, but haven't used Splunk Web to enable the summary index for the search, you can easily enable summary indexing for the saved search through savedsearches.conf as long as you have a new index for it to populate. For more information about manual index configuration, see the topic "About managing indexes" in the Admin manual.
[ <name> ]
action.summary_index = 0 | 1
action.summary_index._name = <index>
action.summary_index.<field> = <value>
[<name>]: Splunk names the stanza based on the name of the saved and scheduled search that you enabled for summary indexing.
action.summary_index = 0 | 1: Set to 1 to enable summary indexing. Set to 0 to disable summary indexing.
action.summary_index._name = <index>: This is the name of the summary index populated by the search. If you've created a specific summary index for this search, enter its name in <index>. Defaults to summary, the summary index that is delivered with Splunk.
action.summary_index.<field> = <value>: Specify a field/value pair to add to every event that gets summary indexed by this search. You can define multiple field/value pairs for a single summary index search.
This field/value pair acts as a "tag" of sorts that makes it easier for you to identify the events that go into the summary index when you are performing searches amongst the greater population of event data. This key is optional but we recommend that you never set up a summary index without at least one field/value pair.
For example, add the name of the saved search that is populating the summary index (action.summary_index.report = summary_firewall_top_src_ip), or the name of the index that the search populates (action.summary_index.index = search).
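As a rough illustration of how these keys fit together, the following Python sketch renders a stanza from a search name, an index name, and a set of field/value pairs. The helper render_summary_stanza is hypothetical and is not part of Splunk; it only formats text using the keys described above.

```python
def render_summary_stanza(search_name, index="summary", fields=None, enabled=True):
    """Render a savedsearches.conf stanza from the keys described above.

    Hypothetical helper for illustration only; Splunk does not ship it.
    """
    lines = [
        f"[{search_name}]",
        f"action.summary_index = {1 if enabled else 0}",
        f"action.summary_index._name = {index}",
    ]
    # Each field/value pair is added to every event written to the summary index.
    for field, value in (fields or {}).items():
        lines.append(f"action.summary_index.{field} = {value}")
    return "\n".join(lines)


stanza = render_summary_stanza(
    "summary_firewall_top_src_ip",
    fields={"report": "summary_firewall_top_src_ip"},
)
print(stanza)
```

The printed text could then be compared against the stanza that Splunk Web generates before you edit savedsearches.conf by hand.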
Search commands useful to summary indexing
Summary indexing uses a set of specialized commands, which you need to use if you are manually creating your summary indexes without the help of the Splunk Web interface or the summary indexing reporting commands.
- addinfo: Summary indexing uses addinfo to add fields containing general information about the current search to the search results going into a summary index. Add | addinfo to any search to see what results will look like if they are indexed into a summary index.
- collect: Summary indexing uses collect to index search results into the summary index. Use | collect to index any search results into another index.
- overlap: Use overlap to identify gaps and overlaps in a summary index. overlap finds events of the same query_id in a summary index with overlapping timestamp values or identifies periods of time where there are missing events.
Manually configure a search to populate a summary index
If you want to configure summary indexing without using the search options dialog in Splunk Web and the summary indexing reporting commands, you must first configure a summary index just like you would any other index via indexes.conf. For more information about manual index configuration, see the topic "About managing indexes" in this manual.
Important: You must restart Splunk for changes in indexes.conf to take effect.
1. Run a search that you want to summarize results from in the Splunk Web search bar.
- Be sure to limit the time range of your search. The number of results that your search generates needs to fit within the maximum search result limits you have set for searching.
- Make sure to choose a time interval that works for your data, such as 10 minutes, 2 hours, or 1 day. (For more information about using Splunk Web to schedule search intervals, see "Create an alert" in the User Manual.)
2. Use the addinfo search command. Append | addinfo to the end of your search.
- This command adds information about the search to the events. The collect command requires this information in order to place the events into a summary index.
- You can always add | addinfo to any search to preview what the results of a search will look like in a summary index.
3. Add the collect search command. Append | collect index=<index_name> addtime=t marker="info_search_name=\"<summary_search_name>\"" to the end of the search.
- Replace index_name with the name of the summary index.
- Replace summary_search_name with a key to find the results of this search in the index.
- A summary_search_name *must* be set if you wish to use the overlap search command on the generated events.
Note: For the general case we recommend that you use the provided summary_index alert action. Configuring via addinfo and collect requires some redundant steps that are not needed when you generate summary index events from scheduled searches. Manual configuration remains necessary when you backfill a summary index for time ranges that have already passed.
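The three steps above amount to appending two commands to a base search. The Python sketch below assembles such a search string so the quoting of the marker argument is easier to see. The function build_populating_search, the base search, and the name hourly_method_count are illustrative assumptions; only the addinfo and collect syntax comes from step 3.

```python
def build_populating_search(base_search, index_name, summary_search_name):
    """Append addinfo and collect to a base search (steps 2 and 3 above).

    Hypothetical helper: it only builds the search string, it does not run it.
    """
    # The inner quotes around the search name are backslash-escaped because
    # they sit inside the quoted marker argument.
    marker = f'info_search_name=\\"{summary_search_name}\\"'
    return (
        f"{base_search} | addinfo "
        f'| collect index={index_name} addtime=t marker="{marker}"'
    )


# Assumed example values, not taken from a real deployment.
query = build_populating_search(
    "index=apache_raw | stats count by method",
    index_name="summary",
    summary_search_name="hourly_method_count",
)
print(query)
```

Printing the string before pasting it into the search bar is a simple way to confirm that the escaped quotes survived.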
Considerations for summary index search definition
If for some reason you're going to set up a summary-index-populating search that does not use the summary indexing reporting commands, you should take a few moments to plan out your approach. With summary indexing, the egg comes before the chicken. Use the search that you actually want to report on to help define the search you use to populate the summary index.
Many summary searches involve aggregated statistics--for example, a report where you are searching for the top 10 ip addresses associated with firewall offenses over the past day--when the main index accrues millions of events per day.
If you populate the summary index with the results of the same search that you run on the summary index, you'll likely get results that are statistically inaccurate. You should follow these rules when defining the search that populates your summary index to improve the accuracy of aggregated statistics generated from summary index searches.
Schedule a shorter time range for the populating search
The search that populates your summary index should be scheduled on a shorter (and therefore more frequent) interval than that of the search that you eventually run against the index. You should go for the smallest time range possible. For example, if you need to generate a daily "top" report, then the report populating the summary index should take its sample on an hourly basis.
Set the populating search to take a larger sample
The search populating the summary index should seek out a significantly larger sample than the search that you want to run on the summary index. So, for example, if you plan to search the summary index for the daily top 10 offending ip addresses, you would set up a search to populate the summary index with the hourly top 100 offending ip addresses.
This approach has two benefits--it ensures a higher amount of statistical accuracy for the top 10 report (due to the larger and more-frequently-taken overall sample) and it gives you a bit of wiggle room if you decide you'd rather report on the top 20 or 30 offending ips.
The summary indexing reporting commands automatically take a sample that is larger than the search that you'll run to query the completed summary index, thus creating summary indexes with event data that is not incorrectly skewed. If you do not use those commands, you can use the head command to select a larger sample for the summary-index-populating search than the search that you run on the summary index. In other words, you would have | head 100 for the hourly summary index populating search, and | head 10 for the daily search of the completed summary index.
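To see why the larger sample matters, consider a small Python sketch with made-up hourly counts. It keeps only the top entries from each hour, sums them into a "summary", and then reports the daily top entry. The data and the daily_top function are assumptions for illustration; they are not how Splunk implements summary indexing.

```python
from collections import Counter


def daily_top(hourly_counts, keep, n):
    """Sum the top `keep` entries of each hour, then return the overall top `n`."""
    summary = Counter()
    for counts in hourly_counts:
        for ip, count in Counter(counts).most_common(keep):
            summary[ip] += count
    return summary.most_common(n)


# Made-up data: "c" is never the hourly leader, but it has the largest daily total.
hours = [
    {"a": 10, "c": 9, "b": 1, "d": 1},
    {"b": 10, "c": 9, "a": 1, "d": 1},
    {"d": 10, "c": 9, "a": 1, "b": 1},
]

true_top = daily_top(hours, keep=4, n=1)      # every entry kept
small_sample = daily_top(hours, keep=1, n=1)  # hourly sample same size as the report
large_sample = daily_top(hours, keep=2, n=1)  # hourly sample larger than the report

print(true_top, small_sample, large_sample)
```

With the hourly sample the same size as the final report, "c" never reaches the summary, so the daily result is wrong. Keeping a larger hourly sample recovers the correct leader.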
Set up your search to get a weighted average
If your summary-index-populating search involves averages, and you are not using the summary indexing reporting commands, you need to set that search up to get a weighted average.
For example, say you want to build hourly, daily, or weekly reports of average response times. To do this, you'd generate the "daily average" by averaging the "hourly averages" together. Unfortunately, the daily average becomes skewed if there aren't the same number of events in each "hourly average". You can get the correct "daily average" by using a weighted average function.
The following expression calculates the daily average response time correctly with a weighted average by using the stats and eval commands in conjunction with the sum statistical aggregator. In this example, the eval command creates a daily_average field, which is the result of dividing the response time sum by the response time count.
| stats sum(hourly_resp_time_sum) as resp_time_sum, sum(hourly_resp_time_count) as resp_time_count | eval daily_average= resp_time_sum/resp_time_count | .....
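The arithmetic behind that search can be checked with a short Python sketch. The two hourly rows are invented numbers; the field names mirror hourly_resp_time_sum and hourly_resp_time_count from the search above.

```python
# Invented hourly summaries: a quiet hour and a busy hour.
hourly = [
    {"hourly_resp_time_sum": 20.0, "hourly_resp_time_count": 2},   # hourly average 10
    {"hourly_resp_time_sum": 160.0, "hourly_resp_time_count": 8},  # hourly average 20
]

# Averaging the hourly averages treats both hours as equally important.
average_of_averages = sum(
    h["hourly_resp_time_sum"] / h["hourly_resp_time_count"] for h in hourly
) / len(hourly)

# The weighted average divides the total sum by the total count,
# which is what the stats and eval pipeline above computes.
resp_time_sum = sum(h["hourly_resp_time_sum"] for h in hourly)
resp_time_count = sum(h["hourly_resp_time_count"] for h in hourly)
daily_average = resp_time_sum / resp_time_count

print(average_of_averages, daily_average)
```

The busy hour contributes four times as many events, so the weighted result sits closer to its average than the simple average of averages does.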
Schedule the populating search to avoid data gaps and overlaps
Along with the rules above, to minimize data gaps and overlaps you should also be sure to set appropriate intervals and delays in the schedules of searches you use to populate summary indexes.
Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:
- the scheduled saved search (the one being summary indexed) takes too long to run and runs past the next scheduled run time. For example, if you schedule the search that populates the summary to run every 5 minutes when that search typically takes around 7 minutes to run, you will have gaps, because the search won't start again while a preceding run is still in progress.
Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the frequency of the schedule of the search, or if you manually run summary indexing using the collect command.
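The idea of gaps and overlaps can be sketched in Python by comparing the time windows covered by consecutive runs of a populating search. This is only a conceptual illustration with invented windows (in minutes); it is not the overlap search command and does not read a real summary index.

```python
def find_gaps_and_overlaps(windows):
    """Compare consecutive (start, end) windows and report gaps and overlaps.

    Hypothetical helper: each window is the time range covered by one run.
    """
    gaps, overlaps = [], []
    ordered = sorted(windows)
    for (prev_start, prev_end), (cur_start, cur_end) in zip(ordered, ordered[1:]):
        if cur_start > prev_end:
            # Nothing was summarized between the two runs.
            gaps.append((prev_end, cur_start))
        elif cur_start < prev_end:
            # Both runs summarized the same stretch of time.
            overlaps.append((cur_start, min(prev_end, cur_end)))
    return gaps, overlaps


# Invented run windows, in minutes since some starting point.
runs = [(0, 5), (5, 10), (12, 15), (14, 20)]
gaps, overlaps = find_gaps_and_overlaps(runs)
print(gaps, overlaps)
```

When each run covers exactly the interval between scheduled runs, both lists come back empty, which is the situation the scheduling advice above aims for.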
Example of a summary index configuration
This example shows a configuration for a summary index of Apache server statistics as it might appear in savedsearches.conf. The keys listed below enable summary indexing for the saved search "Apache Method Summary."
Note: If you set action.summary_index=1, you don't need to have the addinfo or collect commands in the search.
# name of the saved search = Apache Method Summary
[Apache Method Summary]
# sets the search to run at each search interval
counttype = always
# enable the search schedule
enableSched = 1
# search interval in cron notation (this means "every 5 minutes")
schedule = */5 * * * *
# id of user for saved search
userid = jsmith
# search string for summary index
search = index=apache_raw startminutesago=30 endminutesago=25 | extract auto=false | stats count by method
# enable summary indexing
action.summary_index = 1
# name of summary index to which search results are added
action.summary_index._name = summary
# add these keys to each event
action.summary_index.report = "count by method"
Other configuration files affected by summary indexing
Caution: Do not edit settings in alert_actions.conf without explicit instructions from Splunk staff.