Introduction to metrics pipeline management
Available in Enterprise Edition
Metrics pipeline management (MPM) is an evolution of the Splunk Observability Cloud metrics platform that offers you solutions to centrally manage metric cardinality.
With MPM, you have more control over how you ingest and store your metrics, so you can lower costs and improve monitoring performance without updating the configuration of your instance of the Splunk Distribution of the OpenTelemetry Collector. To remove data pre-ingest using the Collector, see Control data to ingest using the Collector.
What is metric cardinality?
Metric cardinality is the number of unique metric time series (MTS) produced by a combination of metric name and its associated dimensions. Therefore, a metric has high cardinality when it has a high number of dimension keys, and a high number of possible unique values for those dimension keys.
For example, suppose you send in data for a metric http.server.duration. If http.server.duration has only 1 dimension, endpoint, with 3 unique values, then http.server.duration generates 3 metric time series (MTS).
If you add another dimension, region, with 3 unique values, then http.server.duration generates 3 (endpoints) * 3 (regions) = 9 MTS.
Even though http.server.duration only has 2 dimensions, metric cardinality is already 9 since each dimension has multiple possible values.
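The counting in this example can be sketched in Python by enumerating every combination of dimension values. The dimension values below are hypothetical placeholders chosen only for illustration, and the sketch assumes every combination of values actually occurs.

```python
from itertools import product

# Hypothetical dimension values, chosen only for illustration.
dimensions = {
    "endpoint": ["/a", "/b", "/c"],
    "region": ["r1", "r2", "r3"],
}

def enumerate_mts(metric, dims):
    """Return every unique (metric, dimension values) combination."""
    keys = sorted(dims)
    return [
        (metric, tuple(zip(keys, combo)))
        for combo in product(*(dims[k] for k in keys))
    ]

one_dim = enumerate_mts("http.server.duration", {"endpoint": dimensions["endpoint"]})
two_dims = enumerate_mts("http.server.duration", dimensions)
print(len(one_dim))   # 3
print(len(two_dims))  # 9
```

Each additional dimension multiplies the count by its number of unique values, which is why cardinality grows quickly.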
How does metrics pipeline management work?
With MPM, for each metric you send to Splunk Observability Cloud you can control the metric volume with aggregation and data dropping rules:
Aggregation rules let you roll up your selected metric data into new metrics that take up less storage and increase computational performance. To learn more, see Aggregation rules.
Data dropping rules let you discard any metrics you don’t want to retain for monitoring. To learn more, see Data dropping rules.
By aggregating combinations of dimensions that provide useful insights while dropping a large amount of the unaggregated raw data, you can significantly reduce your organization’s data footprint.
Aggregation rollup period
A new aggregated MTS has a resolution of 10 seconds. MPM rolls up the raw data points received into one aggregated data point for each MTS associated with the metric.
If your systems emit data points over a period that’s much longer than 10 seconds, you might have difficulty reconciling your raw data with the aggregated data. To learn more about the aggregation period and how to avoid related issues, see the section MTS aggregation rollup period.
Aggregation rules
Data you send from your services to Splunk Observability Cloud can have high cardinality. Instead of changing how your services send data, aggregation lets you summarize your data in Splunk Observability Cloud based on the dimensions you consider important.
By selecting specific dimensions to keep, you can aggregate your data points into a new metric with fewer dimensions, creating a specific view of dimensions that are important. You can then obtain a more simplified and concentrated view of your data when you don’t need to view metrics across all dimensions.
When you select specific dimensions, metrics pipeline management generates a new metric. The system creates new MTS
based on the dimensions you select and rolls up data points for each MTS. By default, aggregation rules roll up the
data points into the new MTS using
You can use the new aggregated MTS in the same way as any other MTS in Observability Cloud.
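As a rough illustration of what an aggregation rule does conceptually, the following Python sketch groups the data points of one metric by the dimensions you keep and rolls up each group within a single rollup window. The function name, the data layout, and the use of sum as the rollup function are assumptions made for this sketch, not the product's actual implementation or API.

```python
from collections import defaultdict

def aggregate(points, keep_dims, rollup=sum):
    """Group data points by the kept dimensions and roll up each group.

    points: iterable of (dimensions_dict, value) pairs for one metric.
    rollup: assumed to be sum in this sketch; the actual default may differ.
    """
    groups = defaultdict(list)
    for dims, value in points:
        # The key contains only the kept dimensions; all others are dropped.
        key = tuple((k, dims[k]) for k in keep_dims)
        groups[key].append(value)
    return {key: rollup(values) for key, values in groups.items()}

# Hypothetical raw data points for a single rollup window.
raw = [
    ({"endpoint": "/a", "region": "r1", "container_id": "c1"}, 2.0),
    ({"endpoint": "/a", "region": "r1", "container_id": "c2"}, 3.0),
    ({"endpoint": "/b", "region": "r2", "container_id": "c3"}, 5.0),
]
by_region = aggregate(raw, keep_dims=["region"])
print(by_region)
```

Three raw MTS collapse into two aggregated MTS here, one per unique value of the kept dimension.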
How is this different from post-ingestion aggregation at query time?
When you configure charts or detectors, you can aggregate your data using analytic functions, such as sum, and then group your data by specific dimensions, such as sum by region. This aggregation occurs after Observability Cloud has stored your raw MTS, so you still pay for storing the data.
With metrics pipeline management, you can aggregate your MTS as you store it and retain only aggregated metrics. Since you’re storing fewer dimensions for each data point, and metrics pipeline management rolls up the metric values, you save storage costs.
You send a metric called http.server.duration for a containerized workload using Splunk Infrastructure Monitoring.
Your workload has 10 endpoints, 20 regions, 5 services, and 10,000 containers. Each of the 5 services has 10,000 containers and 10 endpoints.
Your data is coming in at the container ID level, generating 10 (endpoints) * 5 (services) * 20 (regions) * 10,000 (containers) = 10,000,000 MTS.
You can reduce your metric cardinality by aggregating one or multiple dimensions.
Aggregate using one dimension
You are only interested in the source region of your data, so you create an aggregation rule that groups your data by the region dimension.
The aggregated metric removes all other dimensions and retains only the region dimension based on your rule. There are only 20 different values for region, so Observability Cloud ingests only 20 MTS.
Aggregate using multiple dimensions
You want to continue monitoring endpoints, regions, and services for your data, but don’t need to monitor container IDs. You create an aggregation rule that groups your data by the dimensions you want to keep.
The aggregated metric removes the container_id dimension and retains the endpoint, region, and service dimensions based on your rule. Your new metric volume is: 10 (endpoints) * 20 (regions) * 5 (services) = 1,000 MTS.
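The arithmetic behind the two aggregation examples can be checked with a short Python sketch. It assumes every combination of dimension values occurs, so the results are upper bounds. The key name service is an assumed dimension name used for illustration.

```python
from math import prod

# Unique value counts per dimension, taken from the workload example.
# The key "service" is an assumed dimension name.
unique_values = {
    "endpoint": 10,
    "service": 5,
    "region": 20,
    "container_id": 10_000,
}

def mts_count(kept):
    """Upper bound on MTS when only the kept dimensions remain."""
    return prod(unique_values[d] for d in kept)

raw = mts_count(unique_values)  # all four dimensions
by_region = mts_count(["region"])
without_containers = mts_count(["endpoint", "region", "service"])

print(by_region)           # 20
print(without_containers)  # 1000
print(raw // without_containers)  # reduction factor from dropping container_id
```

Dropping the dimension with the most unique values, here container_id, gives the largest reduction in cardinality.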
Data dropping rules
When you have a new aggregated metric, you might no longer need the original unaggregated data. You can also drop a metric without adding an aggregation rule. Data dropping rules let you discard any data you don’t want to monitor, so you can save storage space and reduce cardinality.
You must be an admin to drop data.
You can drop new incoming data, but you can’t drop data that Splunk Observability Cloud has already ingested.
You can’t recover dropped data. Before you drop data, see Impact and benefits of dropping data.
Once you have new aggregated metrics created by aggregation rules, you can drop the raw unaggregated data for
MTS aggregation rollup period
If your systems send periodic data points, but the period is longer than 10 seconds, then the result of MTS aggregation might not be what you expect.
For example, suppose your systems generate data points every 5 seconds. Two successive data points have timestamps that differ by 5 seconds. If your systems immediately transmit the points to Observability Cloud, the system ingests two data points every 10 seconds. Metrics pipeline management can roll up the two data points into one aggregated data point with a resolution of 10 seconds, which is the result you expect.
If you are sending data points, but they don’t always arrive with the same frequency, Observability Cloud might receive two data points in the first 10 seconds, then twelve data points in the next 10 seconds. In both cases, metrics pipeline management rolls up the raw points into a single aggregated data point.
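The rollup behavior described here can be sketched in Python as bucketing raw data points into 10-second windows by the time they are ingested. The function name, the data layout, and the use of sum as the rollup function are assumptions for this sketch, not the actual implementation.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # resolution of an aggregated MTS, per the text

def roll_up(arrivals, rollup=sum):
    """Collapse raw points into one point per 10-second ingest window.

    arrivals: iterable of (ingest_time_seconds, value) for a single MTS.
    rollup: assumed to be sum for this sketch.
    """
    windows = defaultdict(list)
    for ingest_time, value in arrivals:
        window_start = int(ingest_time // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[window_start].append(value)
    return {start: rollup(values) for start, values in sorted(windows.items())}

# Two points arrive in the first window, twelve in the second.
arrivals = [(1, 1.0), (6, 1.0)] + [(10 + i * 0.8, 1.0) for i in range(12)]
aggregated = roll_up(arrivals)
print(aggregated)  # {0: 2.0, 10: 12.0}
```

However many raw points land in a window, the output contains exactly one aggregated point for that window, so any finer resolution in the incoming data is lost.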
Also, if you want to send data points every second and you want to keep the resolution of the incoming data points, don’t use MTS aggregation.
Potential issues
The difference between the timestamp that your systems add to a raw data point when it’s created and the time the system uses when it aggregates data points can cause one of the following issues:
The starting and ending time of aggregated MTS might shift. A data point generated by your server might come in some time after its creation time as recorded in its timestamp. In this case, the entire aggregated MTS shifts to a more recent time on the chart, indicating that the start time was more recent than the actual timestamp. This shift occurs because metrics pipeline management ignores the data point timestamp and instead uses the time it ingested the data point.
For example, if your data points have a 10:00 timestamp, but Observability Cloud doesn’t start receiving them until 10:10, the aggregated MTS seems to start at 10:10 instead of 10:00.
The aggregated MTS might appear to have an incorrect duration.
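The start-time shift described above can be sketched in Python by comparing the 10-second windows computed from the data point timestamps with the windows computed from the ingest times. The dates, the 5-second emit interval, and the constant 10-minute delay are hypothetical values for illustration.

```python
from datetime import datetime, timedelta

def window_start(t):
    """Floor a datetime to the start of its 10-second window."""
    return t - timedelta(seconds=t.second % 10, microseconds=t.microsecond)

def series_span(times):
    """Return (first window start, last window start) for a list of times."""
    starts = sorted(window_start(t) for t in times)
    return starts[0], starts[-1]

# Hypothetical points created every 5 seconds from 10:00, each delayed 10 minutes.
created = [datetime(2024, 1, 1, 10, 0) + timedelta(seconds=5 * i) for i in range(6)]
ingested = [t + timedelta(minutes=10) for t in created]

by_timestamp = series_span(created)   # span based on data point timestamps
by_ingest = series_span(ingested)     # span based on ingest time
print(by_ingest[0] - by_timestamp[0])  # 0:10:00
```

With a constant delay, the whole series shifts but keeps its length. If the delay varied from point to point, the span based on ingest time could also differ in length from the span based on timestamps.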
Avoid these aggregation issues by using the following options:
Do your own MTS aggregation before sending your data by reconfiguring the Splunk Distribution of the OpenTelemetry Collector to drop unwanted dimensions.
Aggregate data using SignalFlow when you generate charts or create detectors.
Scenario for metrics pipeline management
Create your first MPM rules
To start using metrics pipeline management, see Control your metric ingestion volume with rules.
Metrics pipeline management is not available for metrics ingested through the