Scenario: Kai monitors detector service latency for a group of customers

Kai, a site reliability engineer at the fictitious Buttercup Games, wants to monitor a latency issue in a critical checkout workflow that runs through the cart service and its /getcart endpoint. The issue affects a specific set of customers, the ones who most frequently have problems with the service.

Kai takes the following steps to monitor latency in the cart service:

  1. Kai generates a Monitoring MetricSet (MMS) and filters by span tag

  2. Kai creates service latency detectors to track metrics

  3. Kai sets up charts, dashboards, and alerts for custom dimensions

Kai generates a Monitoring MetricSet (MMS) and filters by span tag

To generate Monitoring MetricSets (MMS) by customer, Kai first indexes version_id, a span tag that identifies each customer. Kai then generates an MMS with version_id as a dimension, sets the scope of the MMS to cartservice, and filters on the version_id tag values that represent the specific customers Kai wants to investigate.

This image shows an example MMS configuration for the cartservice endpoint /getcart and a filter by tag values for version_id:

This screenshot shows how to add a custom Monitoring MetricSet for a single service.
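The effect of scoping an MMS and filtering it by tag value can be pictured with a small Python sketch. This is only a conceptual illustration, not how Splunk APM computes MetricSets: the span record layout, the `latency_by_dimension` helper, and the `customer-a` style tag values are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical span records. The field names and tag values are
# illustrative assumptions, not the Splunk APM span schema.
SPANS = [
    {"service": "cartservice", "endpoint": "/getcart",
     "tags": {"version_id": "customer-a"}, "duration_ms": 120},
    {"service": "cartservice", "endpoint": "/getcart",
     "tags": {"version_id": "customer-a"}, "duration_ms": 480},
    {"service": "cartservice", "endpoint": "/getcart",
     "tags": {"version_id": "customer-b"}, "duration_ms": 95},
    {"service": "cartservice", "endpoint": "/getcart",
     "tags": {"version_id": "customer-c"}, "duration_ms": 60},
    {"service": "cartservice", "endpoint": "/emptycart",
     "tags": {"version_id": "customer-a"}, "duration_ms": 30},
    {"service": "checkoutservice", "endpoint": "/getcart",
     "tags": {"version_id": "customer-a"}, "duration_ms": 900},
    {"service": "cartservice", "endpoint": "/getcart",
     "tags": {}, "duration_ms": 50},
]


def latency_by_dimension(spans, service, endpoint, tag, allowed_values):
    """Summarize span latency per tag value for one service endpoint."""
    buckets = defaultdict(list)
    for span in spans:
        # Scope: keep only spans from the chosen service and endpoint.
        if span["service"] != service or span["endpoint"] != endpoint:
            continue
        # Filter: keep only the tag values under investigation.
        value = span["tags"].get(tag)
        if value not in allowed_values:
            continue
        buckets[value].append(span["duration_ms"])
    return {
        value: {
            "count": len(durations),
            "median_ms": median(durations),
            "max_ms": max(durations),
        }
        for value, durations in buckets.items()
    }


summary = latency_by_dimension(
    SPANS, "cartservice", "/getcart", "version_id",
    allowed_values={"customer-a", "customer-b"},
)
```

Spans from other services or endpoints, spans without the tag, and spans whose tag value is outside the filter list do not contribute, which mirrors the scope and filter choices Kai makes in the MMS configuration.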

Kai creates service latency detectors to track metrics

Kai uses the custom dimensionalized MMS they created to monitor the performance of this critical checkout workflow in the cart service. To do this, Kai creates a detector using the same custom indexed tag, version_id, to track error rates associated with the checkout workflow.

Kai follows the guided detector setup to create a detector based on the error rate in the service cartservice:GetCart, filtered to the custom dimension version_id.
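As a rough mental model of what an error-rate detector per custom dimension evaluates, the following Python sketch checks one evaluation window per version_id against a static threshold. The `WindowStats` type, the 5% threshold, and the minimum request volume are assumptions made for illustration; the guided setup offers its own alert conditions and settings.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request and error counts for one version_id in one window (hypothetical)."""
    version_id: str
    requests: int
    errors: int


def firing_alerts(windows, threshold=0.05, min_requests=10):
    """Return (version_id, error_rate) pairs whose rate exceeds the threshold.

    The threshold and min_requests defaults are illustrative assumptions.
    """
    alerts = []
    for window in windows:
        # Skip low-volume windows, where a few errors would skew the rate.
        if window.requests < min_requests:
            continue
        rate = window.errors / window.requests
        if rate > threshold:
            alerts.append((window.version_id, round(rate, 3)))
    return alerts


windows = [
    WindowStats("customer-a", requests=200, errors=18),
    WindowStats("customer-b", requests=150, errors=3),
    WindowStats("customer-c", requests=4, errors=4),
]
alerts = firing_alerts(windows)
```

Evaluating each version_id separately is the point of the custom dimension: one customer's elevated error rate is not averaged away by healthy traffic from the others.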

Kai uses the metric finder to find additional information on the metrics and metadata for their system. Kai applies sf_dimensionalized:true as a filter to see related metrics as shown in the following image.

This screenshot shows how to filter the MetricFinder for metrics related to custom MMS.
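A `key:value` filter such as sf_dimensionalized:true narrows a metric list to entries whose metadata carries that exact pair. The sketch below shows the idea in Python on a hypothetical in-memory catalog; the metric names and the `matching_metrics` helper are invented for illustration and are not the metric finder's API.

```python
# Hypothetical catalog mapping metric names to dimension metadata.
# The metric names are placeholders, not real Splunk APM metrics.
CATALOG = {
    "cart.request.count": {"sf_dimensionalized": "true", "sf_service": "cartservice"},
    "cart.request.duration.p99": {"sf_dimensionalized": "true", "sf_service": "cartservice"},
    "cart.cpu.utilization": {"sf_service": "cartservice"},
    "checkout.request.count": {"sf_dimensionalized": "false", "sf_service": "checkoutservice"},
}


def parse_filter(expr):
    """Split a 'key:value' filter expression into its two parts."""
    key, sep, value = expr.partition(":")
    key, value = key.strip(), value.strip()
    if not sep or not key or not value:
        raise ValueError(f"expected 'key:value', got {expr!r}")
    return key, value


def matching_metrics(catalog, expr):
    """Return the sorted names of metrics whose metadata matches the filter."""
    key, value = parse_filter(expr)
    return sorted(
        name for name, dims in catalog.items() if dims.get(key) == value
    )


dimensionalized = matching_metrics(CATALOG, "sf_dimensionalized:true")
```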

Kai sets up charts, dashboards, and alerts for custom dimensions

Kai also creates charts and dashboards that use the custom dimensions they created.

This screenshot shows how to filter the MetricFinder for metrics related to custom Monitoring MetricSets.

Summary

By generating an MMS with version_id as a custom dimension and filtering it to the customers affected by the issue, Kai set up a detector to monitor service and endpoint latency by customer. Kai also created charts and dashboards that show service and endpoint latency for specific customers over time.

Learn more