
Identify problem areas using log aggregation

Aggregations group related data by one field and then perform a statistical calculation on other fields. Aggregating log records helps you visualize problems by showing averages, sums, and other statistics for related logs.

For example, suppose that you’re browsing the Raw Logs Table to learn more about the performance of your services. If you’re concerned about the response time of each service, you can group log records by service URL and calculate average response time using an aggregation. This aggregation helps you identify services that are responding slowly.

After you identify services with poor response time, you can drill down in the log records for that service to understand the problems in more detail.

Aggregate log records

To perform an aggregation, follow these steps:

  1. Find the aggregations controls in the control bar. The default aggregation is a count of all log records grouped by the value of the severity field. This default corresponds to the following aggregation controls settings:

    • COUNT

    • All(*)

    • Group by: severity

    The default shows a count of all log events grouped by severity.

  2. To change the field to group by, type the field name in the Group by text box and press Enter. The control also has these features:

    • When you click in the text box, Log Observer displays a drop-down list containing all the fields available in the log records.

• The text box filters the list as you type. To find a field, start typing its name.

    • To select a field in the list, click its name.

  3. To change the calculation you want to apply to each group, follow these steps:

    1. Select the type of statistic from the calculation control. For example, to calculate a mean value, select AVG.

2. Choose the field for the statistic by typing its name in the calculation field control text box. The text box also filters as you type, so start typing to find matching field names.

  4. To perform the aggregation, click Apply.
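Taken together, the three settings (calculation type, calculation field, and group-by field) describe a generic group-then-calculate operation. The following sketch illustrates that operation in Python with made-up records; it is not Log Observer's implementation, and the record contents are hypothetical:

```python
from collections import defaultdict

def aggregate(records, group_by, calc="COUNT", field=None):
    """Group records by one field, then apply a calculation to another.

    calc="COUNT" with field=None mirrors the default COUNT / All(*) setting.
    """
    # Collect the calculation-field values for each group.
    groups = defaultdict(list)
    for record in records:
        groups[record[group_by]].append(record.get(field))
    if calc == "COUNT":
        return {key: len(values) for key, values in groups.items()}
    if calc == "AVG":
        return {key: sum(values) / len(values) for key, values in groups.items()}
    raise ValueError(f"unsupported calculation: {calc}")

# Default behavior: a count of all records grouped by severity.
records = [{"severity": "ERROR"}, {"severity": "INFO"}, {"severity": "ERROR"}]
counts = aggregate(records, group_by="severity")
```

Here `counts` maps each severity value to its record count, which is the shape of data behind the default Timeline view.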

Example 1: Identify problems by aggregating severity by service name

One way you can discover potential problems is to find services that are generating a high number of severe errors. To find these services, group log records by service name and count all the records. Services with problems appear as groups with many records that have a severity value of ERROR.

To apply this aggregation, follow these steps:

  1. Using the calculation control, set the calculation type by selecting COUNT.

  2. Using the calculation field control, set the calculation field to All(*).

3. Using the Group by text box, set the group-by field to service.name.

4. Click Apply. The Timeline histogram displays the log count for each service as stacked columns, in which each severity value has a different color. The histogram legend maps each color to a severity.

The following screenshot shows an example of this aggregation:

[Screenshot: Example of a Log Observer aggregation]
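Conceptually, this aggregation counts records per service and splits each count by severity. A minimal sketch with hypothetical records (the service names and data are invented for illustration):

```python
from collections import Counter

# Hypothetical log records; "service.name" values are illustrative only.
logs = [
    {"service.name": "cart",     "severity": "ERROR"},
    {"service.name": "cart",     "severity": "ERROR"},
    {"service.name": "cart",     "severity": "INFO"},
    {"service.name": "frontend", "severity": "INFO"},
]

# Count records per (service, severity) pair -- the data behind each
# stacked column in the Timeline histogram.
stacks = Counter((r["service.name"], r["severity"]) for r in logs)

# Services with many ERROR records stand out as candidates for drill-down.
errors_per_service = Counter(
    r["service.name"] for r in logs if r["severity"] == "ERROR"
)
```

In this invented data set, `cart` has the most ERROR records, so it is the service to investigate first.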

Example 2: Identify problems by aggregating response time by request path

A longer-than-expected response time might indicate a problem with the service or with another part of the host it runs on. To identify services that are responding more slowly than expected, group log events by http.req.path, a field that uniquely identifies each service. For each group, calculate the mean of the response time field http.resp.took_ms.

To apply this aggregation, follow these steps:

  1. Using the calculation control, set calculation type to AVG.

2. Using the calculation field control, set the calculation field to http.resp.took_ms.

3. Using the Group by text box, set the group-by field to http.req.path.

  4. Click Apply. The Timeline histogram displays the average response time for each service.

The following screenshot shows an example of this aggregation:

[Screenshot: Example of a Log Observer aggregation]
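The calculation in this example can be sketched the same way: group records by http.req.path and average http.resp.took_ms. The records and the 250 ms threshold below are hypothetical, chosen only to illustrate the pattern:

```python
from collections import defaultdict

# Hypothetical log records; paths and timings are invented for illustration.
logs = [
    {"http.req.path": "/api/orders", "http.resp.took_ms": 480},
    {"http.req.path": "/api/orders", "http.resp.took_ms": 520},
    {"http.req.path": "/api/users",  "http.resp.took_ms": 40},
]

# Group response times by request path.
groups = defaultdict(list)
for record in logs:
    groups[record["http.req.path"]].append(record["http.resp.took_ms"])

# Average per group -- the value the AVG calculation produces.
avg_took_ms = {path: sum(v) / len(v) for path, v in groups.items()}

# Flag paths whose average exceeds an example threshold of 250 ms.
slow_paths = [path for path, avg in avg_took_ms.items() if avg > 250]
```

Paths in `slow_paths` are the ones worth drilling into, just as tall columns in the Timeline histogram are.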