Analyze service performance with Tag Spotlight
Use Tag Spotlight to analyze the performance of your services by indexed span tag and to discover trends that contribute to high latency or error rates. You can break down a service by every indexed span tag and view metrics for each tag value. When you select specific span tag values or a specific time range, you can view representative traces to learn more about an outlying incident.
For every service, Tag Spotlight provides a RED metrics time-series chart that displays the total number of requests, errors, and root-cause errors, as well as latency, for the time range specified in the APM navigation menu. Below the chart level, Tag Spotlight displays the same metrics for every value of each indexed span tag over the same time range.
Splunk APM uses Troubleshooting MetricSets to display indexed span tag performance for a service. For every indexed span tag value, you can view the request rate, error rate, root-cause error rate, and p50, p90, and p99 latency. For more information about Troubleshooting MetricSets, see Index span tags to gain insight into service performance.
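To make the per-value breakdown concrete, the following is a minimal sketch of how request rate, error rate, root-cause error rate, and latency percentiles could be computed for each value of one span tag. It is not how Splunk APM implements Troubleshooting MetricSets. The span records, the field names (`tags`, `duration_ms`, `error`, `root_cause_error`), the `tenant` tag, and the nearest-rank percentile method are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical span records. Field names are illustrative, not a Splunk schema.
spans = [
    {"tags": {"tenant": "gold"}, "duration_ms": 100, "error": False, "root_cause_error": False},
    {"tags": {"tenant": "gold"}, "duration_ms": 200, "error": False, "root_cause_error": False},
    {"tags": {"tenant": "gold"}, "duration_ms": 300, "error": True, "root_cause_error": True},
    {"tags": {"tenant": "gold"}, "duration_ms": 400, "error": False, "root_cause_error": False},
    {"tags": {"tenant": "silver"}, "duration_ms": 50, "error": False, "root_cause_error": False},
    {"tags": {"tenant": "silver"}, "duration_ms": 150, "error": False, "root_cause_error": False},
]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list (pct is an int)."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def red_by_tag_value(spans, tag, window_seconds):
    """Group spans by the value of `tag` and compute RED-style metrics per value."""
    groups = defaultdict(list)
    for span in spans:
        value = span["tags"].get(tag)
        if value is not None:  # spans without the tag are left out of the breakdown
            groups[value].append(span)

    result = {}
    for value, members in groups.items():
        durations = sorted(s["duration_ms"] for s in members)
        errors = sum(1 for s in members if s["error"])
        root_cause = sum(1 for s in members if s["root_cause_error"])
        result[value] = {
            "requests": len(members),
            "request_rate": len(members) / window_seconds,  # requests per second
            "errors": errors,
            "error_rate": errors / len(members),
            "root_cause_error_rate": root_cause / len(members),
            "p50_ms": percentile(durations, 50),
            "p90_ms": percentile(durations, 90),
            "p99_ms": percentile(durations, 99),
        }
    return result


metrics = red_by_tag_value(spans, "tenant", window_seconds=60)
for value, row in metrics.items():
    print(value, row)
```

Each key in `metrics` corresponds to one row of a span tag card: one tag value with its own request, error, and latency figures.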
Example: Find the root cause of an incident with Tag Spotlight
yourService is generating a lot of errors. Follow these steps to learn how you can pinpoint the root cause of an incident with Tag Spotlight.
For this example, you index span tags representing these things:
Tenant
Kubernetes pod name
This example also uses the Operation span tag, which is indexed by default.
In the application, go to the APM page.
From the Troubleshooting tab, select a service you want to drill down into.
Click Tag Spotlight in the Requests and Errors service card.
Using the RED metrics chart, click and drag the cursor where there’s a spike in errors to view data for only the incident you’re investigating.
In the Operation span tag card, you see that the operation yourOperation has a lot of errors.
Hover over the operation to quickly see RED metrics for the operation.
To drill down further into the performance of the operation, click the yourOperation value in the span tag card. This shows you information about all indexed tags for only requests that include yourOperation.
There are multiple tenant values, but you see that users who belong to a single tenant are experiencing the vast majority of errors with the service.
You also see that a particular Kubernetes pod has an error spike that corresponds to the errors the operation is generating.
You infer that the incident is due to an operation running in a specific Kubernetes pod that affects people associated with a particular tenant.
From the RED metrics chart, click the peak error rate to view an exemplar trace for the incident.
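The reasoning behind the steps above can be sketched in code: narrow to the time window with the most errors, find the operation responsible for most of them, break that operation's errors down by tenant and pod, and pick one failing trace as an exemplar. This is only an illustration of the drill-down logic, not a Splunk APM API. The span records, the field names, the 60-second bucket size, and the helper functions `peak_error_bucket` and `error_share_by` are hypothetical.

```python
from collections import Counter

# Hypothetical spans for yourService. "ts" is seconds since the start of the chart.
spans = [
    {"trace_id": "t1", "ts": 10, "operation": "yourOperation", "tenant": "tenant-a", "pod": "pod-1", "error": False},
    {"trace_id": "t2", "ts": 70, "operation": "yourOperation", "tenant": "tenant-b", "pod": "pod-2", "error": True},
    {"trace_id": "t3", "ts": 75, "operation": "yourOperation", "tenant": "tenant-b", "pod": "pod-2", "error": True},
    {"trace_id": "t4", "ts": 80, "operation": "yourOperation", "tenant": "tenant-b", "pod": "pod-2", "error": True},
    {"trace_id": "t5", "ts": 85, "operation": "yourOperation", "tenant": "tenant-a", "pod": "pod-2", "error": True},
    {"trace_id": "t6", "ts": 90, "operation": "otherOperation", "tenant": "tenant-a", "pod": "pod-1", "error": True},
    {"trace_id": "t7", "ts": 130, "operation": "yourOperation", "tenant": "tenant-c", "pod": "pod-3", "error": True},
]

BUCKET_SECONDS = 60  # assumed chart resolution


def peak_error_bucket(spans, bucket_seconds):
    """Return the start time of the bucket containing the most errors."""
    counts = Counter(
        (s["ts"] // bucket_seconds) * bucket_seconds for s in spans if s["error"]
    )
    # On a tie, prefer the earliest bucket.
    start, _ = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return start


def error_share_by(spans, field):
    """Return the fraction of all errors attributed to each value of `field`."""
    counts = Counter(s[field] for s in spans if s["error"])
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


# Select the time range around the error spike.
peak_start = peak_error_bucket(spans, BUCKET_SECONDS)
incident = [s for s in spans if peak_start <= s["ts"] < peak_start + BUCKET_SECONDS]

# Find the operation with the largest share of errors, then filter to it.
by_operation = error_share_by(incident, "operation")
top_operation = max(by_operation, key=by_operation.get)
scoped = [s for s in incident if s["operation"] == top_operation]

# Break the filtered errors down by the other indexed tags.
by_tenant = error_share_by(scoped, "tenant")
by_pod = error_share_by(scoped, "pod")

# Pick one failing trace from the peak as an exemplar.
exemplar = next(s["trace_id"] for s in scoped if s["error"])

print(top_operation, by_tenant, by_pod, exemplar)
```

In this sample data, one tenant accounts for most of the errors and a single pod accounts for all of them, which mirrors the inference drawn in the example.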