Roll up metrics data for faster search performance and increased storage capacity

If you have high-volume metrics that index large numbers of unique metric data points at a fast rate, you are probably concerned about issues like storage capacity for historical metrics data and the slow performance of searches across those large datasets.

A metric rollup policy can help with these issues. You apply metric rollup policies to metric indexes that contain high-volume metrics. A metric rollup policy sets rules for aggregating and summarizing the metrics in those indexes. The resulting metric rollup summaries are created in one or more target metric indexes. The rollup summaries contain metric data points that are aggregations of the raw metric data points in the source index. The summarized metrics take up less disk space and are faster to search than the original metrics.

You can create metric rollup policies in Splunk Web, by adding or updating configurations in metric_rollups.conf, or by using the catalog/metricstore/rollup REST API endpoint.

Certain metrics rollup feature extensions, such as the ability to define multiple default aggregation functions for a rollup policy, can only be managed through manual configuration file edits or REST API operations.


Index prerequisites for metric rollup policies

If you want to define a metric rollup policy, you must identify a source metrics index and one or more target metrics indexes. The source index holds the raw metrics that you want the metric rollup policy to summarize. The target index or indexes are where the rollup summaries are stored.

You can designate a source index as a target index if it has space for the summaries. However, colocating your source and target indexes on the same device might reduce the storage benefits you get from the feature.

If target indexes for your metric rollup policy do not already exist, you must create them. See Create metrics indexes in Managing Indexers and Clusters of Indexers.

Using metric rollup summaries with distributed search

The background searches that populate the rollup summaries run on the search head. This means that the source index and the target indexes must be discoverable on the search head. If you use distributed search, your indexes are all on the indexer tier and are not discoverable on the search head.

You can work around this by creating stand-in source and target indexes on the search head tier. As long as the stand-in indexes have the same names as the actual indexes on the indexer tier, the Splunk software applies any rollup policies you create for the stand-in indexes to the actual indexes.

If you use distributed search, you also need to have the stand-in target index on the search head forward its summary data to the actual target index on the indexers. You can do this by setting up a universal forwarder configuration on the search head that uses a whitelist to filter out all other indexes. This configuration forwards the metric rollup summary data to the actual target metrics index on the indexer tier. See the following topics:

Best practice: Forward search head data to the indexer layer in Distributed Search.
Filter data by target index in Forwarding Data.

Anatomy of a metric rollup policy

If you have a source metrics index that contains high-volume metrics, you can create a metric rollup policy for it. The source metrics index must be discoverable on a search head. See Index prerequisites for metric rollup policies.

Metric rollup policy requirements

At a minimum, a metric rollup policy determines:

How many rollup summaries are created for the raw metrics in its source index.
Which target indexes the summaries are stored in.
The periods of the scheduled searches that generate the aggregated metric data points for the rollup summaries.
The default aggregation function used for the summarization of the raw metrics.

The following table defines the required components of a metric rollup policy.

Item	Description
One or more rollup summary definitions	Rollup summary definitions determine where and how the search heads create rollup summaries.
A default aggregation function	This is the default function the search head uses to aggregate metric data points from the source index when it generates rollup summaries. If you do not define an aggregation function, or if you create your metric rollup policy through Splunk Web, the search head uses the `avg` function. The other eligible functions are `count`, `max`, `median`, `min`, `perc<int>`, and `sum`.
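To make the list of eligible functions concrete, here is a minimal Python sketch of what each function computes over one window of raw metric values. It is an illustration, not Splunk's implementation: the `aggregate` helper is hypothetical, and the nearest-rank method used for `perc<int>` is an assumption; Splunk's own percentile calculation might differ.

```python
import re
import statistics
from typing import Callable, Dict, Sequence

# Plain Python equivalents of the fixed-name functions (illustration only).
_FIXED: Dict[str, Callable[[Sequence[float]], float]] = {
    "avg": statistics.fmean,
    "count": lambda values: float(len(values)),
    "max": max,
    "median": statistics.median,
    "min": min,
    "sum": lambda values: float(sum(values)),
}

# perc<int>, for example perc95.
_PERC = re.compile(r"perc(\d+)")


def _nearest_rank_percentile(values: Sequence[float], p: int) -> float:
    """Assumed percentile method: nearest rank on the sorted values."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, never below rank 1.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def aggregate(function_name: str, values: Sequence[float]) -> float:
    """Aggregate one window of raw values with an eligible function name."""
    if not values:
        raise ValueError("cannot aggregate an empty set of data points")
    if function_name in _FIXED:
        return _FIXED[function_name](values)
    match = _PERC.fullmatch(function_name)
    if match is not None:
        p = int(match.group(1))
        if p > 100:
            raise ValueError(f"percentile out of range: {function_name!r}")
        return _nearest_rank_percentile(values, p)
    raise ValueError(f"unsupported aggregation function: {function_name!r}")
```

For example, `aggregate("avg", [1.0, 2.0, 3.0, 4.0])` returns `2.5`, and `aggregate("perc90", ...)` over the values 1 through 10 returns `9.0` under the nearest-rank assumption.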

Each rollup summary definition breaks down further into two parts: a target metric index name and a timespan.

Component	Description
Target metric index name	This is the index that the metric rollup summary will be created on. You must create the target metric index if it does not already exist. It must be discoverable on a search head. See Index prerequisites for metric rollup policies.
Timespan	This sets the period of the scheduled searches that generate the metric rollup summary. Specify it in relative time syntax, such as `1h` for one hour or `20m` for twenty minutes. You might run into search concurrency issues if you set the timespan below 60 seconds.
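The required parts of a policy can be modeled as a small data structure. The following Python sketch is hypothetical: the class and field names are not Splunk configuration keys, and it assumes that timespans use only the `s`, `m`, `h`, and `d` units. It enforces at least one rollup summary definition, defaults the aggregation function to `avg`, and flags timespans below 60 seconds.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed subset of relative time units that this sketch accepts.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_SPAN_PATTERN = re.compile(r"(\d+)([smhd])")


def span_to_seconds(span: str) -> int:
    """Convert a relative timespan such as "1h" or "20m" to seconds."""
    match = _SPAN_PATTERN.fullmatch(span)
    if match is None:
        raise ValueError(f"unsupported timespan: {span!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


@dataclass
class RollupSummary:
    """One rollup summary definition: a target index plus a timespan."""

    target_index: str
    span: str

    @property
    def span_seconds(self) -> int:
        return span_to_seconds(self.span)


@dataclass
class RollupPolicy:
    """Required components of a hypothetical metric rollup policy."""

    source_index: str
    summaries: List[RollupSummary]
    default_aggregation: str = "avg"  # used when no function is defined

    def __post_init__(self) -> None:
        if not self.summaries:
            raise ValueError("a policy needs at least one rollup summary definition")
        for summary in self.summaries:
            span_to_seconds(summary.span)  # fail early on malformed spans

    def short_span_warnings(self) -> List[str]:
        """List summaries whose timespan might cause concurrency issues."""
        return [
            f"{s.target_index}: timespan {s.span} is below 60 seconds"
            for s in self.summaries
            if s.span_seconds < 60
        ]
```

For example, `RollupPolicy("HomeIndex", [RollupSummary("SumIndex", "1h")])` describes a policy with one hourly summary on SumIndex that uses `avg`.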

Metric rollup policy options

A metric rollup policy can optionally include a dimension filter and one or more exception rules. The following table describes these optional components.

Item	Description
Dimension filter	You can specify a set of dimensions to include in or exclude from the rolled-up metrics in the summaries that the policy produces. If you include dimensions, they are the only dimensions from the source metric data points that appear in the rolled-up metrics. If you exclude dimensions, they are the only dimensions from the source metric data points that do not appear in the rolled-up metrics.
Aggregation exception rules	Create exception rules for metrics that require different aggregation functions than the majority of the metrics in the rollup policy. For example, when your default aggregation function is `avg`, you might have specific metrics that should instead be aggregated with functions like `count` or `perc<int>`. The other eligible functions are `max`, `median`, `min`, and `sum`.
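The two options can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names and the `"included"`/`"excluded"` mode strings are hypothetical, and it assumes one exception function per metric.

```python
from typing import Dict, Mapping, Set


def filter_dimensions(
    dimensions: Mapping[str, str], filter_dims: Set[str], mode: str
) -> Dict[str, str]:
    """Apply a dimension filter to one data point's dimensions.

    mode="included" keeps only the listed dimensions;
    mode="excluded" drops only the listed dimensions.
    """
    if mode == "included":
        return {k: v for k, v in dimensions.items() if k in filter_dims}
    if mode == "excluded":
        return {k: v for k, v in dimensions.items() if k not in filter_dims}
    raise ValueError(f"unknown filter mode: {mode!r}")


def aggregation_for(
    metric_name: str, default: str, exceptions: Mapping[str, str]
) -> str:
    """Return the exception function for a metric, else the default."""
    return exceptions.get(metric_name, default)
```

With dimensions `{"ip": "10.0.0.1", "app": "web", "host": "h1"}`, an included filter of `{"ip", "app"}` keeps `ip` and `app`, while an excluded filter of the same set keeps only `host`.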

How metric rollup summaries are generated

A metric rollup summary is built from the results of a single saved search that can include multiple subsearches. This search runs on a schedule determined by the timespan component of the rollup summary definition. It aggregates sets of raw metric data points using the default aggregation function, or any exception aggregation functions defined for specific metrics. If the rollup policy defines a dimension filter, the search strips out every dimension that the filter keeps out of the rolled-up metrics.

The summary-creating search spawns a separate subsearch for each group of metrics in the source index that share the same aggregation function and dimension set.
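The grouping rule can be sketched in Python. This is an illustration of the rule, not how Splunk builds its saved search: the helper `group_for_subsearches` and the example metric names are hypothetical, and the sketch assumes you already know each metric's aggregation function and the dimension set it keeps after filtering.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# In this sketch, a subsearch is keyed by (aggregation function, dimension set).
SubsearchKey = Tuple[str, FrozenSet[str]]


def group_for_subsearches(
    metrics: Dict[str, SubsearchKey],
) -> Dict[SubsearchKey, List[str]]:
    """Group metric names that share an aggregation function and dimension set.

    Each resulting group stands for one subsearch of the summary-creating search.
    """
    groups: Dict[SubsearchKey, List[str]] = defaultdict(list)
    # Sort by metric name so each group's member list is deterministic.
    for name, key in sorted(metrics.items()):
        groups[key].append(name)
    return dict(groups)
```

For example, two hypothetical metrics `cpu_util` and `mem_used` that both use `avg` over `{ip, app}` land in one group, while `error_count` using `max` over `{ip}` gets its own group, so this sketch would produce two subsearches.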

The search head gives new metric names to the aggregated metric data points produced by the summary-creating search. The new metric names follow this naming convention: <raw_metric_name>_mrollup_<aggregate_function>_<timespan_in_seconds>.
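The naming convention can be expressed as a small Python helper, together with a parser that recognizes rolled-up names when you search the target index. Both helpers are hypothetical. The trailing `s` after the number of seconds follows the example metric names later in this topic, such as `metric_A_mrollup_avg_3600s`.

```python
import re
from typing import Optional, Tuple

# <raw_metric_name>_mrollup_<aggregate_function>_<timespan_in_seconds>s
_ROLLUP_NAME = re.compile(r"(?P<raw>.+)_mrollup_(?P<func>[a-z]+\d*)_(?P<span>\d+)s")


def rollup_metric_name(raw_metric_name: str, aggregate_function: str, span_seconds: int) -> str:
    """Build the name given to a rolled-up metric data point."""
    return f"{raw_metric_name}_mrollup_{aggregate_function}_{span_seconds}s"


def parse_rollup_metric_name(name: str) -> Optional[Tuple[str, str, int]]:
    """Split a rolled-up name into (raw name, function, seconds), or return None."""
    match = _ROLLUP_NAME.fullmatch(name)
    if match is None:
        return None
    return match["raw"], match["func"], int(match["span"])
```

For example, `rollup_metric_name("metric_A", "avg", 3600)` returns `"metric_A_mrollup_avg_3600s"`, and parsing that string gives back `("metric_A", "avg", 3600)`.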

The summary-creating search also adds three new fields to each rolled-up metric data point.

Field name	Description
`rollup_source_index`	The name of the source index
`rollup_span`	The period of the scheduled search that generated the rollup summary that this metric data point belongs to
`rollup_aggregate`	The function used in the creation of this aggregate data point
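As an illustration, the following Python sketch builds one rolled-up data point as a plain dictionary that carries the three added fields. It is a simplified model, not Splunk's storage format: the helper `make_rollup_point` is hypothetical, it assumes the aggregate value has already been computed, and storing `rollup_span` as a number of seconds is an assumption.

```python
from typing import Dict, Union

FieldValue = Union[str, int, float]


def make_rollup_point(
    source_index: str,
    raw_metric_name: str,
    aggregate_function: str,
    value: float,
    span_seconds: int,
    window_start: int,
    dimensions: Dict[str, str],
) -> Dict[str, FieldValue]:
    """Model one rolled-up metric data point as a dictionary."""
    point: Dict[str, FieldValue] = dict(dimensions)  # filtered dimensions
    point.update({
        "_time": window_start,
        "metric_name": f"{raw_metric_name}_mrollup_{aggregate_function}_{span_seconds}s",
        "_value": value,
        # The three fields that the summary-creating search adds:
        "rollup_source_index": source_index,
        "rollup_span": span_seconds,  # assumption: expressed in seconds
        "rollup_aggregate": aggregate_function,
    })
    return point
```

For example, a `max` rollup of metric_C over one hour produces a point whose `rollup_source_index` is `HomeIndex` and whose `rollup_aggregate` is `max`.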

Metric rollup summary generation example

Say you have a metric rollup policy on a source index named HomeIndex. The details of this metric rollup policy are as follows:

It has a rollup summary definition that names SumIndex as its target index and provides 1h as the period of its background scheduled searches.
It was created through Splunk Web, so it uses `avg` as its default aggregation function.
It has a dimension filter that includes only these three dimensions: ip, app, and region. This means that the policy only rolls up metrics in HomeIndex that include one or more of these dimensions, and that the policy strips out all other dimensions from the aggregated metric data points that it creates for the metric rollup summary.
It has an exception rule for a metric named metric_C. This rule says that this metric is to be aggregated with the max function when the search head creates rollup metric data points for it.

After you save this policy, a summary-creating search begins running in the background on an hourly schedule. When the search runs, it spawns a subsearch for each metric on HomeIndex that has the included dimensions among its dimension sets. These subsearches produce a single aggregate metric data point each time they run. This means that if an eligible metric has 75 data points indexed over the past hour on HomeIndex, those 75 data points are aggregated into a single metric data point by the rollup search job.
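The hourly aggregation can be sketched in Python by bucketing data points into fixed windows and averaging each window. This is an illustration under assumptions: the helper `roll_up_windows` is hypothetical, and aligning windows to multiples of the span since the epoch is an assumption about scheduling, not documented Splunk behavior.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def roll_up_windows(
    points: Sequence[Tuple[float, float]], span_seconds: int = 3600
) -> Dict[int, float]:
    """Average (timestamp, value) pairs into one value per window.

    Returns {window_start_epoch_seconds: average_value}.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        # Assumed alignment: windows start at multiples of span_seconds.
        window_start = int(timestamp // span_seconds) * span_seconds
        buckets[window_start].append(value)
    return {start: fmean(values) for start, values in sorted(buckets.items())}
```

For example, 75 data points that all fall within the same hour collapse into a single entry, which matches the 75-to-1 aggregation described above.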

All of these aggregate metric data points are stored on SumIndex. Each point is an aggregation of the metric data points that came in over the past hour for an eligible HomeIndex metric. The background search gives the SumIndex summary metric data points new metric names that reflect their origins, but which also clearly identify them as rollup metric data points.

To continue the example, suppose that HomeIndex has three metrics: metric_A, metric_B, and metric_C. They have different combinations of dimensions, and metric_C has the exception rule that requires its metric data points to be aggregated differently than the others. The following table describes these metrics in terms of the dimensions they contain, the function used for their aggregation, and the metric_name given to their rolled-up metric data points.

`metric_name` on source index	Includes `ip` dimension?	Includes `app` dimension?	Includes `region` dimension?	Aggregation function	`metric_name` on target index
`metric_A`	Yes	Yes	Yes	`avg` (default)	`metric_A_mrollup_avg_3600s`
`metric_B`	No	No	No	n/a	Not summarized because it lacks the required dimensions.
`metric_C`	Yes	No	Yes	`max` (exception rule)	`metric_C_mrollup_max_3600s`
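The table can be reproduced with a short Python sketch of the example policy. It is an illustration, not Splunk's logic: the helper `summarize_example` is hypothetical, and the `host` dimension on metric_B is a made-up stand-in for "none of the included dimensions". The eligibility rule follows the policy description: a metric is rolled up only if it has one or more of the included dimensions.

```python
from typing import Dict, Optional, Set

INCLUDED = {"ip", "app", "region"}   # dimension filter from the example
DEFAULT_AGGREGATION = "avg"          # Splunk Web default
EXCEPTIONS = {"metric_C": "max"}     # exception rule from the example
SPAN_SECONDS = 3600                  # 1h timespan

SOURCE_METRICS: Dict[str, Set[str]] = {
    "metric_A": {"ip", "app", "region"},
    "metric_B": {"host"},            # hypothetical: no included dimensions
    "metric_C": {"ip", "region"},
}


def summarize_example() -> Dict[str, Optional[str]]:
    """Map each source metric to its target metric_name, or None if skipped."""
    result: Dict[str, Optional[str]] = {}
    for metric, dims in SOURCE_METRICS.items():
        if not dims & INCLUDED:
            result[metric] = None    # lacks the required dimensions
            continue
        function = EXCEPTIONS.get(metric, DEFAULT_AGGREGATION)
        result[metric] = f"{metric}_mrollup_{function}_{SPAN_SECONDS}s"
    return result
```

Calling `summarize_example()` maps metric_A to `metric_A_mrollup_avg_3600s`, metric_B to `None`, and metric_C to `metric_C_mrollup_max_3600s`, as in the table.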

The data points for a metric are rolled up by the rollup summary search as long as they all share the same combination of included dimensions. In the previous example, all of the data points for metric_C get rolled up because they all have ip and region. But if some of the data points belonging to a metric have an included dimension while other data points belonging to that metric lack it, none of the data points for that metric get rolled up.
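The consistency rule can be sketched in Python as a check over the dimension names of a metric's data points. This is a minimal illustration: the helper `metric_is_rolled_up` is hypothetical, and it also treats a metric with no included dimensions at all as ineligible, following the example policy description.

```python
from typing import FrozenSet, Iterable, Set


def metric_is_rolled_up(point_dimensions: Iterable[Set[str]], included: Set[str]) -> bool:
    """Return True if every data point has the same non-empty set of included dimensions."""
    combos: Set[FrozenSet[str]] = {frozenset(dims & included) for dims in point_dimensions}
    # Exactly one combination across all points, and it is not empty.
    return len(combos) == 1 and frozenset() not in combos
```

For example, data points with dimension sets `{ip, region, host}` and `{ip, region}` pass the check for the included set `{ip, app, region}`, while sets `{ip, region}` and `{ip}` fail it because the included combinations differ.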

Later, you can search SumIndex in exactly the same way that you currently search HomeIndex. You can run faster searches over longer periods of time because the searches are running across smaller sets of metric data points that only have one to three dimension fields.

You can also arrange to store the metrics in SumIndex for longer periods of time than you might store their corresponding metrics on HomeIndex because they take up less space on disk.
