Roll up metrics data for faster search performance and increased storage capacity
If you have high-volume metrics that index large numbers of unique metric data points at a fast rate, you are probably concerned about issues like storage capacity for historical metrics data and the slow performance of searches across those large datasets.
A metric rollup policy can help you with these issues. You apply metric rollup policies to metric indexes with high-volume metrics. A metric rollup policy sets rules for the aggregation and summarization of the metrics on those indexes. The resulting metric rollup summaries are created in one or more target metric indexes. The rollup summaries contain metric data points that are aggregations of the raw metric data points in the source index. The summarized metrics take up less disk space and are faster to search than the original metrics.
You can create metric rollup policies through Splunk Web, by adding or updating configurations in metric_rollups.conf, and by using the catalog/metricstore/rollup REST API endpoint. See the following topics:
- Create and edit metric rollup policies with Splunk Web
- Manage metric rollup policies with configuration files
- Metrics Catalog endpoint descriptions in the REST API Reference Manual.
Index prerequisites for metric rollup policies
If you want to define a metric rollup policy, you must identify a source metrics index and one or more target metrics indexes. The source index holds the raw metrics that you want the metric rollup policy to summarize. The target index or indexes are where the rollup summaries are stored.
You can designate a source index as a target index if there is space on it for the summaries. However, colocating your source and target indexes on the same device might reduce your ability to get increased data storage benefits from the feature.
If target indexes for your metric rollup policy do not already exist, you must create them. See Create metrics indexes in Managing Indexers and Clusters of Indexers.
Using metric rollup summaries with distributed search
The background searches that populate the rollup summaries operate on the search head. This means that they require that the source index and the target indexes be discoverable on the search head. If you use distributed search, your indexes are all on the indexer tier and are not discoverable on the search head.
You can work around this by creating stand-in source and target indexes on the search head tier. As long as the stand-in indexes have the same names as the actual indexes on the indexer tier, the Splunk software applies any rollup policies you create for the stand-in indexes to the actual indexes.
If you use distributed search, you also need to have the stand-in index on the search head forward its summary data to the actual index on the indexers. You can do this by setting up a universal forwarder configuration on the search head that uses a whitelist to filter out all other indexes. This enables it to forward the metric rollup summary data to the actual target metrics index on the indexer tier. See the following topics:
- Best practice: Forward search head data to the indexer layer in Distributed Search.
- Filter data by target index in Forwarding Data.
Anatomy of a metric rollup policy
If you have a source metrics index that contains high-volume metrics, you can create a metric rollup policy for it. The source metrics index must be discoverable on a search head. See Index prerequisites for metric rollup policies.
Metric rollup policy requirements
At a minimum, a metric rollup policy determines:
- How many rollup summaries are created for the raw metrics in its source index.
- Which target indexes the summaries are stored in.
- The periods of the scheduled searches that generate the aggregated metric data points for the rollup summaries.
- The default aggregation function used for the summarization of the raw metrics.
The following table defines the required components of a metric rollup policy.
Item | Description |
---|---|
One or more rollup summary definitions | Rollup summary definitions determine where and how the search heads create rollup summaries. |
A default aggregation function | This is the default function the search head uses to aggregate metric data points from the source index when it generates rollup summaries. If you do not define an aggregation function, or if you create your metric rollup policy through Splunk Web, the search head uses the avg function. The other eligible functions are count, max, median, min, perc<int>, and sum. |
Each rollup summary definition breaks down further into two parts: a target metric index name and a timespan.
Component | Description |
---|---|
Target metric index name | This is the index that the metric rollup summary will be created on. You must create the target metric index if it does not already exist. It must be discoverable on a search head. See Index prerequisites for metric rollup policies. |
Timespan | This sets the period of the scheduled searches that generate the metric rollup summary. It must be indicated with relative time syntax, such as 1h for one hour or 20m for twenty minutes. You might run into search concurrency issues if you set the timespan below 60 seconds. |
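The timespan rule above can be sketched in a few lines of Python. This is an illustrative helper only, not part of the Splunk software: the function names and the unit table are assumptions, and the sketch handles only simple spans such as 30s, 20m, 1h, or 1d.

```python
import re

# Hypothetical unit table: covers only the simple relative-time units.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timespan_to_seconds(span: str) -> int:
    """Convert a simple span such as '1h' or '20m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", span.strip())
    if match is None:
        raise ValueError(f"unsupported timespan: {span!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def below_recommended_minimum(span: str, minimum_seconds: int = 60) -> bool:
    """Flag spans shorter than the 60-second threshold mentioned above."""
    return timespan_to_seconds(span) < minimum_seconds


print(timespan_to_seconds("1h"))
print(below_recommended_minimum("30s"))
```

A check like this can be useful before you save a policy, because very short spans are the ones that might cause search concurrency issues.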
Metric rollup policy options
A metric rollup policy can optionally include a dimension filter and one or more exception rules. The following table describes these optional components.
Item | Description |
---|---|
Dimension filter | You can indicate a set of dimensions that must be included or excluded from the rolled-up metrics in the summaries produced by the policy. Included dimensions are the only dimensions in the rolled-up metrics that come from the source metric data points. Excluded dimensions are the only dimensions from the source metric data points that do not appear in the rolled-up metrics. |
Aggregation exception rules | Create exception rules for metrics that require different aggregation functions than the majority of the metrics in the rollup policy. For example, when your default aggregation is avg, you might have specific metrics that should instead be aggregated with functions like count or perc<int>. The other eligible functions are max, median, min, and sum. |
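To make the difference between included and excluded dimension filters concrete, the following Python sketch applies each kind of filter to the dimensions of one metric data point. The function name, the mode strings, and the dictionary representation of a data point are assumptions made for illustration. They are not Splunk settings or APIs.

```python
def apply_dimension_filter(dimensions: dict, names: set, mode: str) -> dict:
    """Return the dimensions that survive a rollup dimension filter.

    mode == "included": keep only the listed dimensions.
    mode == "excluded": drop only the listed dimensions.
    """
    if mode == "included":
        return {k: v for k, v in dimensions.items() if k in names}
    if mode == "excluded":
        return {k: v for k, v in dimensions.items() if k not in names}
    raise ValueError(f"unknown filter mode: {mode!r}")


# Hypothetical data point dimensions.
point = {"ip": "10.0.0.1", "app": "web", "region": "us-west", "host": "h1"}

print(apply_dimension_filter(point, {"ip", "app"}, "included"))
print(apply_dimension_filter(point, {"ip", "app"}, "excluded"))
```

With the included filter, only ip and app remain. With the excluded filter, only region and host remain.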
How metric rollup summaries are generated
A metric rollup summary is built from the results of a single saved search that can include multiple subsearches. This search runs on a schedule determined by the timespan component of the rollup summary definition. It aggregates sets of raw metric data points using the default aggregation function, or whatever exception aggregation functions might be defined for certain metrics. The search strips out any dimensions that are not in the dimension filter, if one is defined for the summarization policy.
The summary-creating search spawns a separate subsearch for each group of metrics in the source index that have the same aggregate functions and dimension sets.
The search head gives new metric names to the aggregated metric data points produced by the summary-creating search. The new metric names follow this naming convention: <raw_metric_name>_mrollup_<aggregate_function>_<timespan_in_seconds>.
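As a quick illustration of the naming convention, the following Python sketch builds a rollup metric name from its parts. The function name is hypothetical. The trailing s on the timespan is an assumption based on example names in this topic, such as metric_A_mrollup_avg_3600s.

```python
def rollup_metric_name(raw_metric_name: str, aggregate_function: str,
                       timespan_seconds: int) -> str:
    """Build a name of the form <raw>_mrollup_<function>_<seconds>s."""
    return f"{raw_metric_name}_mrollup_{aggregate_function}_{timespan_seconds}s"


print(rollup_metric_name("metric_A", "avg", 3600))
```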
The summary-creating search also adds three new fields to each rolled up metric data point.
Field name | Description |
---|---|
rollup_source_index | The name of the source index. |
rollup_span | The period of the scheduled search that generated the rollup summary that this metric data point belongs to. |
rollup_aggregate | The function used in the creation of this aggregate data point. |
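The following Python sketch shows how the three added fields might sit alongside the dimensions of a rolled-up metric data point. It is a conceptual model only. The dictionary layout, the value key, and the representation of rollup_span as a timespan string are assumptions, not a description of how the Splunk software stores these fields.

```python
def annotate_rollup_point(value: float, dimensions: dict, source_index: str,
                          span: str, aggregate: str) -> dict:
    """Attach the three rollup fields to an aggregated data point."""
    point = dict(dimensions)  # copy so the caller's dictionary is untouched
    point.update({
        "value": value,                       # hypothetical key for the measurement
        "rollup_source_index": source_index,  # the name of the source index
        "rollup_span": span,                  # period of the scheduled search
        "rollup_aggregate": aggregate,        # function that produced the value
    })
    return point


print(annotate_rollup_point(42.5, {"ip": "10.0.0.1"}, "HomeIndex", "1h", "avg"))
```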
Metric rollup summary generation example
Say you have a metric rollup policy on a source index named HomeIndex. The details of this metric rollup policy are as follows:
- It has a rollup summary definition that names SumIndex as its target index and provides 1h as the period of its background scheduled searches.
- It was created through Splunk Web, so it uses avg as its default aggregation method.
- It has a dimension filter that includes only these three dimensions: ip, app, and region. This means that the policy only rolls up metrics in HomeIndex that include one or more of these dimensions, and that the policy strips out all other dimensions from the aggregated metric data points that it creates for the metric rollup summary.
- It has an exception rule for a metric named metric_C. This rule says that this metric is to be aggregated with the max function when the search head creates rollup metric data points for it.
After you save this policy, a summary-creating search begins running in the background on an hourly schedule. When the search runs, it spawns a subsearch for each metric on HomeIndex that has the included dimensions among its dimension sets. These subsearches produce a single aggregate metric data point each time they run. This means that if an eligible metric has 75 data points indexed over the past hour on HomeIndex, those 75 data points are aggregated into a single metric data point by the rollup search job.
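The aggregation step can be sketched in Python as follows. This is a simplified model under stated assumptions: the function and table names are hypothetical, the 75 sample values are invented for illustration, and the perc<int> function is omitted for brevity.

```python
from statistics import mean, median

# Eligible aggregation functions named in this topic (perc<int> omitted).
AGGREGATIONS = {
    "avg": mean,
    "count": len,
    "max": max,
    "median": median,
    "min": min,
    "sum": sum,
}


def roll_up(values: list, function: str = "avg") -> float:
    """Collapse one hour of raw metric values into a single aggregate value."""
    return AGGREGATIONS[function](values)


# 75 hypothetical raw values indexed over the past hour.
raw_values = [float(i) for i in range(1, 76)]

print(roll_up(raw_values))         # default avg aggregation
print(roll_up(raw_values, "max"))  # what an exception rule using max would yield
```

In this model, 75 raw values go in and one value comes out, which is why the rollup summary takes up less space than the raw metrics.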
All of these aggregate metric data points are stored on SumIndex. Each point is an aggregation of the metric data points that came in over the past hour for an eligible HomeIndex metric. The background search gives the SumIndex summary metric data points new metric names that reflect their origins, but which also clearly identify them as rollup metric data points.
To continue the example, let us say that on HomeIndex, you have three metrics: metric_A, metric_B, and metric_C. They have different combinations of dimensions, and metric_C has the exception rule which requires that its metric data points be aggregated differently than the others. The following table describes these metrics in terms of the dimensions they contain, the function used for their aggregation, and the metric_name their rolled up metric data points are given.
metric_name on source index | Includes ip dimension? | Includes app dimension? | Includes region dimension? | Aggregation function | metric_name on target index |
---|---|---|---|---|---|
metric_A | Yes | Yes | Yes | avg (default) | metric_A_mrollup_avg_3600s |
metric_B | No | No | No | n/a | Not summarized because it lacks the required dimensions. |
metric_C | Yes | No | Yes | max (exception rule) | metric_C_mrollup_max_3600s |
The data points for a metric are rolled up by the rollup summary search as long as they all share the same combination of included dimensions. In the previous example, all of the data points for metric_C get rolled up because they all have ip and region. But if some of the data points belonging to a metric have an included dimension while other data points belonging to that metric lack that included dimension, none of the data points for that metric get rolled up.
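One reading of this eligibility rule can be expressed as a short Python check. The function names and the dictionary representation of data points are assumptions for illustration. The sketch treats a metric as eligible only when every data point carries the same non-empty combination of included dimensions.

```python
def included_combination(point_dimensions: dict, included: set) -> frozenset:
    """Return which of the included dimensions a data point carries."""
    return frozenset(point_dimensions) & frozenset(included)


def metric_is_rolled_up(points: list, included: set) -> bool:
    """True when all points share one non-empty set of included dimensions."""
    combos = {included_combination(p, included) for p in points}
    return len(combos) == 1 and len(next(iter(combos))) > 0


included = {"ip", "app", "region"}

# Every point has ip and region, as with metric_C in the example.
consistent = [{"ip": "10.0.0.1", "region": "us"}, {"ip": "10.0.0.2", "region": "eu"}]
# One point lacks region, so the combinations differ.
mixed = [{"ip": "10.0.0.1", "region": "us"}, {"ip": "10.0.0.2"}]

print(metric_is_rolled_up(consistent, included))
print(metric_is_rolled_up(mixed, included))
```

The first call reports that the metric is eligible. The second reports that it is not, because one data point lacks the region dimension.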
Later, you can search SumIndex in exactly the same way that you currently search HomeIndex. You can run faster searches over longer periods of time because the searches are running across smaller sets of metric data points that only have one to three dimension fields.
You can also arrange to store the metrics in SumIndex for longer periods of time than you might store their corresponding metrics on HomeIndex because they take up less space on disk.
This documentation applies to the following versions of Splunk® Enterprise: 7.3.0