Investigate the root cause of an error with Splunk APM service map
Kai, a site reliability engineer at Buttercup Games, receives tickets from multiple customers getting “Invalid request” errors when purchasing games on the Buttercup Games website. To figure out the most downstream service causing the error, Kai selects the Explore window in Splunk APM to open the service map for troubleshooting.
Kai looks through the real-time service map, which shows the services instrumented in Splunk APM as nodes and the dependencies between them as arrows. The service map highlights root cause errors in red. Kai finds that the paymentservice node has a red dot, and that the dependency arrow from the checkoutservice node to the paymentservice node is red.
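The idea of "the most downstream service causing the error" can be illustrated with a small sketch. This is not how Splunk APM computes root cause errors; it is a simplified model that assumes a plain dependency map and a set of services currently showing errors. The frontend service and the set of erroring services are hypothetical; only checkoutservice and paymentservice come from the scenario.

```python
from typing import Dict, List, Set


def most_downstream_error_services(
    dependencies: Dict[str, List[str]], erroring: Set[str]
) -> List[str]:
    """Return erroring services that have no erroring downstream dependency.

    In this simplified model, such a service is the likely origin of the
    error, because nothing it calls is failing.
    """
    result = []
    for service in sorted(erroring):
        downstream = dependencies.get(service, [])
        # If any service called by this one is also erroring, the error
        # may simply be propagating upward, so skip it.
        if not any(dep in erroring for dep in downstream):
            result.append(service)
    return result


# Hypothetical topology: "frontend" is an assumed upstream caller.
dependencies = {
    "frontend": ["checkoutservice"],
    "checkoutservice": ["paymentservice"],
    "paymentservice": [],
}
# Assumed set of services showing errors in this illustration.
erroring = {"frontend", "checkoutservice", "paymentservice"}

print(most_downstream_error_services(dependencies, erroring))
```

In this model, frontend and checkoutservice are skipped because each calls an erroring service, which leaves paymentservice as the candidate origin, matching what Kai reads off the service map.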
Kai clicks the paymentservice node and uses the Tag Spotlight sidebar to find the endpoint with the highest error rate. Kai finds that all of the errors occur in one endpoint, as shown in the following screenshot:
Kai adds a link to the endpoint’s Tag Spotlight to the customers’ tickets, along with a note identifying the endpoint as the root cause of the error. Kai then sends the tickets to the payment service owner for further troubleshooting.
For information about how to instrument your applications to send application metrics and traces to Splunk Observability Cloud, see Collect application spans and traces.