Gain insights through chart analytics

Splunk Infrastructure Monitoring analytics can turn a chart of raw metric data into a tool that reveals patterns and trends, so you can monitor infrastructure, application, or service health more effectively. This section explains how to do the following.

Compare your aggregate utilization levels by service through a group‑by
Retain peaks and valleys in longer time ranges
Correlate multiple metrics by viewing them on the same chart
Compare current values with hourly, daily, weekly or other historical patterns
Use the Timeshift function to understand trends towards failure
See percentages or ratios via time series expressions
Use percentiles to see population overviews
Show Top or Bottom N lists to find simple outliers or rankings
See changes in distribution through the use of histograms
Smooth out your data to see general patterns rather than focus on temporary peaks or valleys

This section assumes you are familiar with the following topics.

Compare aggregates by service or other metadata

When you are looking at infrastructure metrics for a good-sized fleet of hosts, virtual machines or containers, it is often more instructive to look at them at an aggregate level and compare the aggregates than to look at individual instances. Many of the analytics functions allow you to group the output by metadata, which serves this purpose perfectly.

Select the metric you want to compare at an aggregate level (e.g. across services) and enter its name in the Signal field for plot A. In this example, we are plotting demo.trans.latency.

This screenshot shows how to select the metric you want to compare at an aggregate level

In the Analytics field, select the function you want to apply, such as Mean:Aggregation. The chart now displays a single plot line showing the mean value across all time series in each time interval.

This screenshot shows how to select the function you want to apply

Click on the selected function for the plot. Click the group‑by dropdown. Select the metadata you want to group by, such as service (if you are sending in a dimension named “service”), aws_availability_zone (if you are using AWS) or other metadata. In this example, we chose demo_datacenter.

This screenshot shows how to select the metadata you want to group by

Now you can see the metric aggregated across all resources (hosts, VMs, or containers) in each sub-group. As the data table shows, each plot line represents one of the two demo_datacenter values.

This screenshot shows an example of the analytics aggregated and grouped by the metadata you selected.
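
To make the effect of a group-by concrete, here is a minimal Python sketch of the same idea outside the product. It is not how Splunk computes the result; the sample values, host names, datacenter names, and the `mean_by` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample: one current value per time series, with its dimensions.
samples = [
    {"host": "host-1", "demo_datacenter": "Paris", "value": 120.0},
    {"host": "host-2", "demo_datacenter": "Paris", "value": 80.0},
    {"host": "host-3", "demo_datacenter": "Tokyo", "value": 200.0},
    {"host": "host-4", "demo_datacenter": "Tokyo", "value": 240.0},
]


def mean_by(samples, dimension):
    """Average the values of all time series that share a dimension value."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[dimension]].append(sample["value"])
    return {key: mean(values) for key, values in groups.items()}


# One aggregate per datacenter instead of one value per host.
print(mean_by(samples, "demo_datacenter"))
```

Four per-host series collapse into two aggregates, one per datacenter, which is what each plot line in the grouped chart represents.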

Retain peaks and valleys in longer time ranges

By default, Splunk Infrastructure Monitoring selects a rollup that is appropriate for the time range and chart resolution you have selected. For example, let’s assume you are sending a metric every 10 seconds to Infrastructure Monitoring, and that its metric type is gauge. If you are looking at a month’s worth of that metric in a chart, there are too many data points to display (6 data points per minute x 60 minutes per hour x 24 hours per day x 30 days per month = 259,200 data points).

In this situation, Infrastructure Monitoring applies the default visualization rollup of Average for a gauge metric. This rollup has the effect of averaging out the data, and makes peaks or valleys that are visible at the higher resolution less apparent.

This screenshot shows an example of the default visualization rollup of Average for a gauge metric

To retain the peaks or valleys, you can change the rollup to max or min, whichever is more relevant to your metric. The Y-axis value range may change from what it was in the original visualization. In this illustration, we clone plot A and change the rollup to max in plot B (and change the color in plot B to make the differences easier to see). To clone a plot line, open the plot’s Actions menu (⋯) at the far right of the plot line, then select Clone. For information on changing plot color, see Set options in the plot configuration panel.

This screenshot shows an example of changing the default visualization rollup for a gauge metric from Average to max

To make peaks and valleys even more noticeable, increase the chart display resolution. Here, we change it from the default to Very High. The differences are more visible.

This screenshot shows an example of changing the chart display resolution to very high

Choosing a shorter time frame increases visibility as well. Here, we change the time range from the past 20 days to the past week.

This screenshot shows an example of changing the time range from the past 20 days to the past week

For more information about the interactions between rollups, chart resolution, and analytics, see Data resolution and rollups in charts.
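
The difference between an average rollup and a max rollup can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Splunk's implementation; the `rollup` helper, the bucket size, and the raw values are hypothetical.

```python
from statistics import mean


def rollup(points, bucket_size, fn):
    """Collapse each run of `bucket_size` raw points into one value using `fn`."""
    return [fn(points[i:i + bucket_size]) for i in range(0, len(points), bucket_size)]


# Data points produced in 30 days by a metric reported every 10 seconds.
points_per_month = 6 * 60 * 24 * 30

# Hypothetical raw gauge values: flat at 10 with one short spike to 90.
raw = [10, 10, 10, 90, 10, 10, 10, 10, 10, 10, 10, 10]

avg_rollup = rollup(raw, 6, mean)  # the spike is averaged down
max_rollup = rollup(raw, 6, max)   # the spike is kept

print(points_per_month)
print(avg_rollup)
print(max_rollup)
```

With the average rollup the spike is diluted by the five quiet points in its bucket, while the max rollup reports the spike at its full height.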

Correlate multiple metrics

It is often useful to plot multiple metrics on the same chart so you can correlate their behavior more easily. For example, you may want to look at the number of transactions happening per second alongside the latency of the transactions. Splunk Infrastructure Monitoring lets you display as many metrics as you want on a single chart, and gives you two Y-axes in case the ranges of the metrics’ values are significantly different.

Select the metric you want to compare and enter its name in the Signal field for plot A. In this example, we are using demo.trans.latency.
Select the second metric and use it in plot B. We’ve selected demo.trans.count.

This screenshot shows an example of using demo.trans.latency and demo.trans.count for comparing correlations

In plot B, click Y-Axis and select right. To learn more, see Left and right Y-axes.

This screenshot shows how to set the Y-Axis for plot B to right

Using the visualization type option for each plot line, select different types for A and B, such as Line for A and Column for B. To learn more, see Visualization type. In this example, we also used plot configuration options to change the color of plot line B to enhance visibility. To learn more, see Plot color.

This screenshot shows how to change the plot type of B, demo.trans.count, to column to enhance visibility

View weekly, daily or hourly comparisons

If the time of day or day of the week matters for judging whether your apps or infrastructure are performing within normal bounds, or if your business sees cyclical demand (for example, weekdays and weekends differ sharply), you can create charts that highlight the change from one week, day, or hour to the next. (Splunk Infrastructure Monitoring lets you compare over any timeframe you want, not just these intervals.)

Use the first plot (plot A) to show the metric you care about, then clone A to create plot B. (To clone a plot line, open the plot’s Actions menu (⋯) at the far right of the plot line, then select Clone.) In this example, we are using memory.usage.total as our signal.
Add a Timeshift function to plot B, entering a time range over which the change matters. For example, use 5m for 5 minutes, 2d for 2 days, and 1w for 1 week.

This screenshot shows how to select Timeshift as a function

This screenshot shows a one-week timeshift applied to memory.usage.total in the example

In plot C, click on Enter Formula to enter A-B to see the difference between now and a week ago.
Use the plot configuration panel to specify an area visualization for plot C. To learn more, see Set options in the plot configuration panel.

This screenshot shows how to change the visualization for plot C to compare the differences between A and B
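
The steps above amount to subtracting a shifted copy of a series from itself. The following Python sketch illustrates that arithmetic under simplifying assumptions: one data point per day, hypothetical readings, and hypothetical `timeshift` and `difference` helpers that stand in for the Timeshift function and the A-B formula.

```python
def timeshift(series, offset):
    """Return a series where each point holds the value from `offset` steps earlier.

    Points with no earlier data are None, since there is nothing to compare against.
    """
    return [series[i - offset] if i >= offset else None for i in range(len(series))]


def difference(a, b):
    """Compute A-B point by point, skipping points where either side is missing."""
    return [None if x is None or y is None else x - y for x, y in zip(a, b)]


# Hypothetical daily memory.usage.total readings (in GB) covering two weeks.
plot_a = [40, 42, 41, 43, 44, 30, 29, 41, 43, 47, 44, 45, 31, 30]
plot_b = timeshift(plot_a, 7)        # the same metric, shifted by one week
plot_c = difference(plot_a, plot_b)  # change versus the same day a week ago

print(plot_c)
```

Because each day is compared with the same weekday one week earlier, the regular weekend dip cancels out and only the unusual change stands out.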

Use the Timeshift function to understand trends

In infrastructure and application monitoring, the trend of a metric (the rate at which it is changing) is frequently of greater interest than the absolute value of the metric itself. For example, it might not be meaningful to know that your CPU is 70% utilized, but you might care to know that the utilization has doubled consistently for the past 10 minutes, as that might indicate that the system is trending towards failure.

Use the first plot (plot A) to show the metric you care about (we used the mean for cpu.utilization), then clone A to create plot B. (To clone a plot line, open the plot’s Actions menu (⋯) at the far right of the plot line, then select Clone.)
Add a Timeshift function to plot B, entering a time range over which the change matters, e.g. 5m for 5 minutes.

This screenshot shows how to select Timeshift as a function, using cpu.utilization as an example

This screenshot shows a 5-minute timeshift applied to cpu.utilization in the example

In plot C, enter the formula (A/B-1) and add a scale:100 function to express the rate of change as a percentage.
Alt-click or option-click on the eye icon next to plot C to display only that plot, which shows you the percentage change in CPU utilization from 5 minutes prior.

This screenshot shows how to only display plot C, which is (A/B-1)

Edit the plot name for plot C, so useful information shows up when you hover over the chart or view the data table.

This screenshot shows how to change the plot name so that useful information appears when you hover over the chart
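
As a rough illustration of the arithmetic behind plot C, this Python sketch evaluates (A/B-1) scaled by 100 against a copy of the series shifted by five samples. The values, the one-sample-per-minute spacing, and the `percent_change` helper are hypothetical.

```python
def percent_change(current, previous):
    """Evaluate (A/B - 1) scaled by 100; None when B is missing or zero."""
    if previous is None or previous == 0:
        return None
    return (current / previous - 1) * 100


# Hypothetical cpu.utilization means, one sample per minute.
plot_a = [20.0, 21.0, 22.0, 24.0, 30.0, 40.0, 42.0, 48.0]
shift = 5  # a 5-minute timeshift at one sample per minute

# Plot C: percentage change of each point relative to the point 5 minutes earlier.
plot_c = [
    percent_change(plot_a[i], plot_a[i - shift]) if i >= shift else None
    for i in range(len(plot_a))
]

print(plot_c)
```

A value of 100 means the metric has doubled over the interval, which is easier to alert on than the absolute utilization.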

Use percentages or ratios

In many cases, you may want to see percentages or ratios rather than the raw metric. For example, you might want the ratio of return codes that signify failure to those that signify success, or the percentage of cache hits out of total cache accesses (hits + misses).

Use the first plot (plot A) to show one of the metrics you care about, e.g. zipper.missCount.
Use the second plot (plot B) to show the other metric you want, e.g. zipper.hitCount.

This screenshot shows plot A as zipper.missCount and plot B as zipper.hitCount

In plot C, enter formula A/(A+B) and add a scale:100 function to express the ratio as a percentage.

This screenshot shows how to add a formula and a scale function to show a percentage

Alt-click or option-click on the eye icon next to plot C to hide the other plots. You are left with a chart that shows the percentage of cache misses over time.

This screenshot shows how to only display plot C, which is A/(A+B)

Edit the plot name for plot C, so useful information shows up when you hover over the chart or view the data table.
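
The formula in plot C can be sketched in Python as follows. The counts are hypothetical, and the `miss_percentage` helper is only an illustration of A/(A+B) scaled by 100, including the case where an interval has no cache accesses at all.

```python
def miss_percentage(miss_count, hit_count):
    """Evaluate A/(A+B) scaled by 100, where A is misses and B is hits."""
    total = miss_count + hit_count
    if total == 0:
        return None  # no cache accesses in this interval, so the ratio is undefined
    return 100 * miss_count / total


# Hypothetical per-interval counts for zipper.missCount (A) and zipper.hitCount (B).
plot_a = [5, 10, 0, 25]
plot_b = [95, 90, 0, 75]
plot_c = [miss_percentage(a, b) for a, b in zip(plot_a, plot_b)]

print(plot_c)
```

Swapping the numerator to B gives the hit percentage instead; the two always add up to 100 for any interval with at least one access.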

Use percentiles to see population overviews

When you want to get a quick overview of a population, a distributed percentile chart is a good option. To construct such a chart, use non-stacked area charts. Select Show on-chart legend in the Chart Options tab (see Show on-chart legend), then set up the plots as follows.

p10. In the first plot (plot A), enter the metric and filters you want, then use the Percentile function and enter 10 as the value.
median. Clone plot A and use 50 as the value.
p90. Clone plot B and use 90 as the value.

This illustration shows what such a chart might look like:

This screenshot shows three percentile plots (p10, median, and p90) of demo.trans.latency in the example

To see specific values, hover over different points on the chart or display the data table.
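
To show what the three plots summarize, here is a small Python sketch that computes p10, the median, and p90 for one instant. It uses the nearest-rank definition of a percentile, which is an assumption made for simplicity; the Percentile function in the product may interpolate differently. The latency values are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical demo.trans.latency values (ms) from ten sources at one instant.
latencies = [120, 95, 310, 101, 99, 150, 87, 105, 98, 110]

p10 = percentile(latencies, 10)
median = percentile(latencies, 50)
p90 = percentile(latencies, 90)

print(p10, median, p90)
```

The single slow source at 310 ms sits above p90, so the three bands describe the bulk of the population without being dominated by one outlier.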

Show Top or Bottom N lists

Top or bottom N charts are great for showing simple outliers, rankings or worst performers.

Enter a metric for plot A. We chose cpu.utilization.
Select List as your chart type.
Apply the analytics function Top or Bottom, then choose either the number of values you want to see in the list or the percentage range you want to see. In this example, we chose Top 5 and specified Count.

This screenshot shows top 5 of cpu.utilization in a list chart

To reduce redundant metadata on the chart, select custom under the Display Fields option in the Chart Options tab to hide the plot name.
Sort Top N charts by Descending value, or Bottom N by Ascending value.

This screenshot shows a descending view of top 5 of cpu.utilization in a list chart

To make the chart even easier to read, use the Display Fields option to hide more fields. You can also hide Entries with missing data under the Visualization Options.

This screenshot shows the view of top 5 of cpu.utilization in a list chart that hides entries with missing data and fields except host.name, host.type and kubernetes_cluster
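
The ranking that a Top 5 list chart displays can be sketched in Python like this. The host names, values, and the `top_n` helper are hypothetical; the sketch drops entries with missing data before ranking, mirroring the option described above.

```python
def top_n(latest_values, n):
    """Return the n highest (source, value) pairs, sorted by descending value."""
    ranked = sorted(latest_values.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical current cpu.utilization per host; None marks a host with missing data.
cpu = {
    "host-a": 91.5, "host-b": 12.0, "host-c": None, "host-d": 77.2,
    "host-e": 64.8, "host-f": 98.1, "host-g": 45.3,
}

# Hide entries with missing data, then keep the five highest values.
present = {host: value for host, value in cpu.items() if value is not None}
top5 = top_n(present, 5)

for host, value in top5:
    print(host, value)
```

A Bottom N list is the same operation with the sort order reversed.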

See changes in distribution

A histogram is a good way to look at the distribution of a population at a single point in time. Splunk Infrastructure Monitoring provides histograms so you can look at the change in that distribution over time. This is useful for surfacing unexpected changes, e.g. in the latencies of requests served by a cluster.

Select a metric that is being sent from a relatively large number of sources. In this case, we chose demo.trans.latency.

This screenshot shows demo.trans.latency in a line chart view

Choose the histogram graph type.

This screenshot shows demo.trans.latency in a histogram view
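
To illustrate what a change in distribution looks like in the underlying data, the following Python sketch bins latency values from many sources at two points in time. The values, the 50 ms bin width, and the `histogram` helper are hypothetical and are not meant to reflect how the histogram chart chooses its bins.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = {}
    for value in values:
        bin_start = int(value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical demo.trans.latency values (ms) across sources at two points in time.
at_t0 = [90, 95, 102, 108, 110, 115, 97, 104]
at_t1 = [92, 99, 180, 195, 210, 112, 205, 101]

hist_t0 = histogram(at_t0, 50)
hist_t1 = histogram(at_t1, 50)

print(hist_t0)
print(hist_t1)
```

At the first instant all sources fall in two adjacent bins; at the second, half of them have moved into higher bins, a shift that a single mean line could easily hide.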

Smooth out peaks and valleys

Do you want to smooth out peaks and valleys in your data, to see general patterns from one period to the next? If you can’t tell at a glance whether a value is generally steady, rising, or falling, view the data as a moving average from one time period to the next. To do this, use the Transformation option instead of Aggregation. The Transformation option is available with the following analytics functions: Mean, Minimum / Maximum, Percentile, Sum, and Variance. For Mean, Minimum, Maximum, and Sum, you can specify either a moving window (the past number of minutes, hours, etc.) or a calendar time window (over the past day, week, month, etc.).

Determine an appropriate interval for applying a moving average.
Use the Mean analytics function, select the Mean:Transformation option, then select the appropriate time window option.
Enter your interval, e.g. 5m.

In the following illustration, values and moving averages are displayed for cpu.utilization as follows:

Plot A: Actual values
Plot B: 30-minute moving average
Plot C: 1-hour moving average

This screenshot shows an example of actual values, the 30-minute moving average, and the 1-hour moving average displayed for cpu.utilization

You can also hide plot lines to make the chart easier to read:

This screenshot shows an example of actual values and the 1-hour moving average displayed for cpu.utilization, with the 5-minute and 30-minute moving averages hidden
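
A moving average over a trailing window can be sketched in Python as follows. The values and the 3-point window are hypothetical, and the choice to average over fewer points at the start of the series is an assumption of this sketch rather than documented product behavior.

```python
def moving_average(series, window):
    """Mean over the trailing `window` points; shorter windows are used at the start."""
    result = []
    for i in range(len(series)):
        recent = series[max(0, i - window + 1): i + 1]
        result.append(sum(recent) / len(recent))
    return result


# Hypothetical cpu.utilization values, one per minute, with a brief spike.
plot_a = [20, 22, 21, 80, 23, 22, 21, 20]
plot_b = moving_average(plot_a, 3)  # 3-point moving average of plot A

print(plot_b)
```

The spike to 80 in the raw series is flattened to roughly 41 in the smoothed one, and a wider window would flatten it further at the cost of reacting more slowly to real changes.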

Next steps

For details about all available analytics functions, see the Analytics reference for Splunk Observability Cloud.

Once you have developed charts to help you proactively monitor your system, the natural next step is to receive alerts when values meet certain criteria. For information on how to do this, see Introduction to alerts and detectors in Splunk Observability Cloud.

This page was last updated on Jan 03, 2025.
