Design searches that populate summary events indexes
This topic does not apply to summary metrics indexes.
Splunk administrators typically decide to create a summary index when they have a transforming search that tends to complete slowly. This happens because it has to run over a large dataset over a long range of time, often to pick out a small slice of that data.
You fix this not by changing the search, but by changing the source of the data. Instead of running that search over a huge and varied index, you run it over a summary index that contains only those events (or metrics, if you have created a summary metrics index) that are relevant to the search.
If your intent is to create a summary events index, you need to design another search that is identical to the original search, but which replaces the ordinary transforming command in the search (such as stats, chart, or timechart) with a command from the si* family of summary indexing transforming commands, such as sistats or sitop.
Why use the si* commands? The si* commands perform a bit of extra work to ensure that the summary index returns statistically accurate results for the searches you run against it. If you decide not to use the si* commands, you need to manually calibrate the search to ensure that it is sampling the correct amount of data (and calculating weighted averages, if the search involves averages). For more information about setting up summary indexes the hard way, see Configure summary events indexes.
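The weighted-average caveat can be illustrated with a small Python sketch. It is not Splunk code and does not reflect how the si* commands are implemented. The daily counts and averages are hypothetical values chosen only to show why averaging per-day averages gives the wrong answer when the days hold different numbers of events.

```python
# Hypothetical per-day summaries: (event_count, average_bytes) for each day.
daily_summaries = [
    (1000, 200.0),  # busy day
    (10, 50.0),     # quiet day
]

def naive_average(summaries):
    """Average of the daily averages; ignores how many events each day had."""
    return sum(avg for _, avg in summaries) / len(summaries)

def weighted_average(summaries):
    """Weight each daily average by its event count to recover the true mean."""
    total_events = sum(count for count, _ in summaries)
    total_value = sum(count * avg for count, avg in summaries)
    return total_value / total_events

print(naive_average(daily_summaries))     # 125.0
print(weighted_average(daily_summaries))  # roughly 198.5
```

The naive figure treats the quiet day as equal in weight to the busy day. A manually built summary has to carry the event counts along with the averages to avoid this.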
For an overview of summary indexing, see Use summary indexing for increased search efficiency.
Example of a search for the purpose of populating a summary events index
Let's say you've been running the following search over a typical events index, with a time range of one year. Furthermore, let's say that this search completes slowly because of the wide time range and because the index is a very large and varied dataset.
eventtype=firewall | top src_ip
You need to create a summary index that is composed of the top source IPs from the "firewall" event type. You can use the following search to build that summary index. You would schedule it to run on a daily basis, collecting the top src_ip values for only the previous 24 hours each time. It adds the results of each daily search to an index named "summary_src_ip".
eventtype=firewall | sitop src_ip
Now, let's say you save this search with the name "Summary - firewall top src_ip" (all saved summary-index-populating searches should have names that identify them as such). After your summary index is populated with results, search and report against that summary index using a search that specifies the summary index and the name of the search that you used to populate it. For example, this is the search you would use to get the top src_ip values over the past year:
index=summary_src_ip search_name="Summary - firewall top src_ip" | top src_ip
Because this search specifies the search name, it filters out other data that has been placed in the summary index by other summary indexing searches. This search should complete much faster, even with a one-year time range, because it is searching over a smaller, more focused dataset.
Considerations for summary events index searches
When you create a search that will populate a summary events index with its results, there are a few things you should know.
- The search should return statistical data in a table format, and the _raw field should not be present in the results.
  - If your summary-populating search includes the _raw field in its results, the Splunk software focuses on reparsing the _raw strings and ignores other fields associated with those strings, including _time. Summarized data without _time fields is difficult to search.
  - You can base summary event indexes on searches that return events, but getting them to work correctly can be tricky.
- The search should not have other search operators after the transforming command.
  - Do not include additional commands such as eval. Save the extra search operators for the searches you plan to run against the summary index.
  - The results from a summary-indexing optimized search are stored in a special format that cannot be modified before the final transformation is performed.
  - If you populate a summary index with ... | sistats <args>, the only valid retrieval of the data is: index=<summary> source=<saved search name> | stats <args>. The search against the summary index cannot create or modify fields before the | stats <args> command.
- If you are running a search against a summary index that queries for events with a specific sourcetype value, use orig_sourcetype instead.
  - When the Splunk software gathers events into a summary events index, it changes all sourcetype values to stash. The Splunk software moves the original sourcetype values to orig_sourcetype.
  - So, instead of running a search against a summary index like ...|stats avg(ip) by sourcetype, use ...|stats avg(ip) by orig_sourcetype.
Fields added to summary-indexed data by the si* summary indexing commands
Use of these fields and their encoded data by any search commands other than the si* summary indexing commands is unsupported. The format and content of these fields can change at any time without warning.
When you run searches with the si* commands in order to populate a summary index, Splunk software adds a set of special fields to the summary index data that all begin with psrsvd, such as psrsvd_v and so on. When you run a search against the summary index with transforming commands like stats, chart, or timechart, the psrsvd* fields are used to calculate results for tables and charts that are statistically correct. psrsvd stands for "prestats reserved."
Most psrsvd types present information about a specific field in the original (pre-summary indexing) dataset, although some psrsvd types are not scoped to a single field. The general pattern is psrsvd_[type]_[fieldname]. For example, psrsvd_ct_bytes presents count information for the bytes field.
Here is a list of the available psrsvd types:
- ct = count
- gc = group count (the count for a stats "grouping," not scoped to a single field)
- nc = numerical count (number of numerical values)
- nn = minimum numerical value
- nx = maximum numerical value
- rd = rdigest of values (values and the number of times they appear)
- sn = minimum lexicographical value
- ss = sum of squares
- sx = maximum lexicographical value
- v = version (not scoped to a single field)
- vm = value map (all distinct values for the field and the number of times they appear)
- vt = value type (contains the precision of the associated field)
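To see why storing a value map per summary run is enough to produce correct "top" results later, consider the following Python sketch. It is only a conceptual illustration. The actual encoding of the psrsvd fields is unsupported and can change, and the daily counts, IP addresses, and the merge_value_maps helper are hypothetical.

```python
from collections import Counter

# Hypothetical daily value maps for src_ip: value -> number of occurrences.
day_1 = Counter({"10.0.0.1": 5, "10.0.0.2": 3})
day_2 = Counter({"10.0.0.2": 4, "10.0.0.3": 1})

def merge_value_maps(value_maps):
    """Add up the per-value counts from every daily summary."""
    total = Counter()
    for value_map in value_maps:
        total.update(value_map)  # Counter.update adds counts for matching keys
    return total

merged = merge_value_maps([day_1, day_2])
print(merged.most_common(2))  # [('10.0.0.2', 7), ('10.0.0.1', 5)]
```

Because each day keeps counts for every distinct value rather than only that day's winners, the merged result matches what a single search over both days would report.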
Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.
- Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
- Uppercase letters are sorted before lowercase letters.
- Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
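The ordering rules above can be reproduced with a short Python sketch that sorts strings by their UTF-8 bytes. This is an illustration of lexicographical order in general, not of Splunk internals, and the sample values are arbitrary.

```python
def utf8_sort(values):
    """Sort strings by their UTF-8 encoded bytes (lexicographical order)."""
    return sorted(values, key=lambda s: s.encode("utf-8"))

# Numbers stored as text are compared character by character, not by magnitude.
print(utf8_sort(["10", "9", "70", "100"]))    # ['10', '100', '70', '9']

# Digits sort before uppercase letters, which sort before lowercase letters.
print(utf8_sort(["apple", "Banana", "42"]))   # ['42', 'Banana', 'apple']
```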
Summary indexing of data without timestamps
To set the time for summary index events, Splunk software uses the following information, in this order of precedence:
- The _time value of the event being summarized.
- The earliest (or minimum) time of the scheduled search that populates the summary index. For example, if the summary-index-populating search covers the two minutes preceding each launch of its search, its earliest time is -2m.
- The current system time (in the case of an "all time" search, where no "earliest" value is specified).
In the majority of cases, your events will have timestamps, so the first method of discerning the summary index timestamp holds. But if you are summarizing data that doesn't contain an _time field (such as data from a lookup), the resulting events will have the timestamp of the earliest time of the summary-index-populating search.
For example, if you summarize the lookup "asset_table" every night at midnight, and the asset table does not contain an _time column, tonight's summary will have an _time value equal to the earliest time of the search. If you have set the time range of the search to be between -24h and +0s, each summarized event will have an _time value of now()-86400: the start time of the search minus 86,400 seconds, or 24 hours. This means that every event without an _time field value that is found by this summary-index-populating search is given the exact same _time value: the search's earliest time.
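The order of precedence described in this section can be sketched as a small Python function. This is a reading aid, not Splunk's implementation. The function name, its parameters, and the epoch value are hypothetical, and times are expressed as epoch seconds.

```python
def summary_event_time(event_time, search_earliest, system_now):
    """Pick the timestamp for a summarized event, in order of precedence."""
    if event_time is not None:        # 1. the event's own _time value
        return event_time
    if search_earliest is not None:   # 2. earliest time of the populating search
        return search_earliest
    return system_now                 # 3. current system time ("all time" search)

now = 1_700_000_000        # hypothetical search start time, in epoch seconds
earliest = now - 86400     # a time range that starts 24 hours before the search

print(summary_event_time(1_699_999_000, earliest, now))  # 1699999000
print(summary_event_time(None, earliest, now))           # 1699913600
print(summary_event_time(None, None, now))               # 1700000000
```

The second call corresponds to the lookup example: every row without a timestamp receives the same value, the search's earliest time.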
If you base a summary events index on a search that returns events instead of statistics, and if the _raw field exists in those events, the summary indexing process focuses on parsing the _raw fields and ignores other fields, including _time.
The best practice for summarizing events without a timestamp is to have your search add an _time value to each event:
|inputlookup asset_table | eval _time=now()
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2303, 8.2.2112, 8.2.2201, 8.2.2202, 8.2.2203, 9.0.2205, 9.0.2208, 9.0.2209 (latest FedRAMP release)