
Scenario: Alex troubleshoots slow traces using Trace Analyzer

Alex, the site reliability engineer for Buttercup Games, receives a report of a few customers who experienced slowness using Buttercup Games. To proactively improve the customer experience, Alex uses Trace Analyzer to determine how pervasive the slowness is.

These are the steps Alex takes to determine how pervasive the slowness is:

  1. Alex uses the trace duration view in Trace Analyzer and filters the time range

  2. Alex zooms in on the trace duration heatmap

  3. Alex turns off sampling

  4. Alex reviews the heatmap for a high rate of high-duration traces

  5. Alex sorts the table of traces to review high-duration traces

Alex uses the trace duration view in Trace Analyzer and filters the time range

Customer support shared that the customer reports of slowness started around 11:00 AM. So, Alex selects the trace duration view in Trace Analyzer and filters to the time range that matches the customer's report.

This gif shows the trace duration selection and the time selection in the Trace Analyzer chart

Alex zooms in on the trace duration heatmap

Alex selects the time period in the trace duration heatmap that shows a higher rate of traces with longer trace durations to further refine the traces in the table.

This gif shows the filtering to a selection in the Trace Analyzer heatmap

Alex turns off sampling

Alex selects 1:1 for the Sample Ratio so that they can view all traces that match their criteria.

This gif shows the sampling ratio selection in the Trace Analyzer chart

Alex reviews the heatmap for a high rate of high-duration traces

Alex uses the heatmap to better understand trace durations for the time period reported by the customer. Alex notes the darker area of the heatmap at 11:10 AM, which indicates a high rate of traces (between 3 and 4 traces per second) with durations of 10 seconds or more.

This screenshot shows the heatmap for 11:10 AM which shows 3-4 traces per second had durations of 10 or more seconds
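
To make the heatmap reading concrete, the following is a minimal sketch of how a trace duration heatmap could be computed from raw trace records. It is not Splunk APM's implementation: the function names, the one-minute time buckets, the duration band edges, and the sample data (including the date) are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def duration_heatmap(traces, start, bucket_seconds=60, duration_edges=(1, 5, 10)):
    """Count traces per (time bucket, duration band) cell.

    traces: iterable of (start_time, duration_in_seconds) pairs (hypothetical shape).
    The band index is the number of edges the duration meets or exceeds,
    so with edges (1, 5, 10) band 3 means "10 seconds or more".
    """
    cells = Counter()
    for started, duration_s in traces:
        time_bucket = int((started - start).total_seconds() // bucket_seconds)
        band = sum(duration_s >= edge for edge in duration_edges)
        cells[(time_bucket, band)] += 1
    return cells


def traces_per_second(cells, time_bucket, band, bucket_seconds=60):
    """Average trace rate for a single heatmap cell."""
    return cells[(time_bucket, band)] / bucket_seconds


# Hypothetical data: 210 slow traces during the 11:10 minute, plus one fast trace.
window_start = datetime(2024, 1, 1, 11, 0)
sample_traces = [
    (window_start + timedelta(minutes=10, seconds=i % 60), 12.0) for i in range(210)
]
sample_traces.append((window_start + timedelta(minutes=3), 0.4))

cells = duration_heatmap(sample_traces, window_start)
rate_at_1110 = traces_per_second(cells, time_bucket=10, band=3)
```

In this sketch, a "darker" cell corresponds to a higher count in `cells`. With the sample data, `rate_at_1110` falls in the 3 to 4 traces per second range described above.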

Alex sorts the table of traces to review high-duration traces

Alex sorts the table of traces by duration so that they can review the high-duration traces.

This gif shows sorting the trace table by duration
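
The same ordering can be reproduced on exported trace data. The sketch below assumes a hypothetical `TraceRow` record with a trace ID and a duration in seconds. It does not reflect the Trace Analyzer table's actual schema or any Splunk API.

```python
from dataclasses import dataclass


@dataclass
class TraceRow:
    # Hypothetical fields; a real trace table has more columns.
    trace_id: str
    duration_s: float


def slowest_first(rows, limit=None):
    """Return rows ordered by duration, longest first, optionally truncated."""
    ordered = sorted(rows, key=lambda row: row.duration_s, reverse=True)
    return ordered if limit is None else ordered[:limit]


rows = [TraceRow("a1", 0.8), TraceRow("b2", 12.4), TraceRow("c3", 10.1)]
top_two = slowest_first(rows, limit=2)
```

Here `top_two` holds the two longest traces, which are the ones Alex would open first.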

Summary

Using the high-resolution data provided by full-fidelity tracing, Alex quickly determined how prevalent the slowness was. Using filtering and the trace duration heatmap, Alex isolated the high-duration traces and passed them to the engineers so they can find the cause of the issue.

Learn more