
Scenario: Kai creates a detector to monitor server latency

Kai, a site reliability engineer at Buttercup Games, receives many tickets from Buttercup Games customers experiencing high latency on game servers. Kai wants a reliable way to monitor their host machines' server latency so they can quickly identify and solve high latency issues before customers experience them.

Using Splunk Observability Cloud, Kai can create a detector that alerts them whenever a server's latency crosses a threshold for a period of time.

Define the data to use for alerting

Kai opens the Alerts & Detectors page in Splunk Observability Cloud and selects New Detector to create a detector from scratch.

After naming the detector, Kai chooses Infrastructure or Custom Metrics Alert Rule.

Kai selects their desired metric, latency, and sees a preview detector that reports on the metric:

This image shows a preview of the metric that Kai's detector reports on.

Kai can apply analytics to change how the signal is reported. Kai wants to report on the average server latency over a 1-minute window, so Kai applies the Mean:Transformation analytic and enters a period of 1 minute.

The preview detector changes to reflect Kai's applied analytic:

This screenshot shows a preview reflecting the average server latency of each machine over a period of 1 minute.
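To make the effect of the 1-minute mean concrete, the following Python sketch computes a trailing average over a 60-second window. It illustrates the idea only and is not how Splunk Observability Cloud implements the Mean:Transformation analytic. The function name `rolling_mean` and the sample values are hypothetical.

```python
from collections import deque

def rolling_mean(samples, window_seconds=60):
    """Return (timestamp, mean) pairs for time-ordered (timestamp, value) samples.

    Each mean covers the samples whose timestamps fall inside the trailing
    window that ends at the current timestamp.
    """
    window = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0
    out = []
    for ts, value in samples:
        window.append((ts, value))
        total += value
        # Drop samples that have aged out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        out.append((ts, total / len(window)))
    return out

# Hypothetical latency samples in milliseconds, one every 20 seconds.
samples = [(0, 100.0), (20, 200.0), (40, 300.0), (60, 400.0)]
smoothed = rolling_mean(samples)
print(smoothed)
```

Averaging smooths out short spikes, so a detector built on the averaged signal reacts to sustained latency rather than to a single slow request.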

Choose an alert condition

Kai can choose between several options for an alert condition. Alert conditions determine the type of behavior that triggers an alert.

Kai chooses the Static threshold alert condition because they want to know when server latency exceeds a certain point for a certain duration of time. In other cases, Kai might want to choose a different alert condition. For example, Kai might choose the Sudden change condition if they want to be alerted when server latency rapidly increases.
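The difference between the two conditions can be sketched in Python. The check below is a deliberately simplified stand-in for a sudden-change rule, not the algorithm behind the built-in Sudden change condition: it flags a value that exceeds a multiple of the mean of the values just before it. The function name, the `baseline_size` and `ratio` parameters, and the sample data are all assumptions made for illustration.

```python
from statistics import mean

def sudden_increase(values, baseline_size=5, ratio=1.5):
    """Return indexes where a value exceeds `ratio` times the mean of the
    `baseline_size` values immediately before it (simplified illustration)."""
    hits = []
    for i in range(baseline_size, len(values)):
        baseline = mean(values[i - baseline_size:i])
        if values[i] > ratio * baseline:
            hits.append(i)
    return hits

# Hypothetical latency readings in milliseconds with one abrupt jump.
latency_ms = [100, 102, 98, 101, 99, 180, 105, 100]

jumps = sudden_increase(latency_ms)
over_static = [i for i, v in enumerate(latency_ms) if v > 280]
print(jumps, over_static)
```

In this sample the jump to 180 milliseconds is flagged as a sudden change even though it never crosses a static threshold of 280, which is why the choice of condition depends on the behavior Kai wants to catch.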

Customize alert settings

In the Alert Setting menu, Kai enters desired values for the following fields:

| Field | Value | Description |
| --- | --- | --- |
| Threshold | 280 | The detector alerts when latency exceeds 280 milliseconds. |
| Duration | 1 minute | The detector alerts when latency exceeds 280 milliseconds for 1 minute or more. |

The detector preview shows red arrows on the timestamps when the detector triggers an alert:

This screenshot displays red arrows on timestamps where the alert is triggered.
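The interaction between the threshold and the duration can be sketched in Python. This is a minimal illustration of the logic under stated assumptions, not the product's implementation: the function name `static_threshold_alerts`, the rule that an alert fires once per continuous breach, and the sample data are hypothetical.

```python
def static_threshold_alerts(points, threshold=280.0, duration_seconds=60):
    """Return the timestamps at which an alert would fire.

    An alert fires once the value has stayed above `threshold` continuously
    for at least `duration_seconds`. It can fire again only after the value
    has dropped back to or below the threshold.
    """
    alerts = []
    breach_start = None  # timestamp when the current breach began
    fired = False        # whether this breach has already raised an alert
    for ts, value in points:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if not fired and ts - breach_start >= duration_seconds:
                alerts.append(ts)
                fired = True
        else:
            # The breach ended, so reset the state.
            breach_start = None
            fired = False
    return alerts

# Hypothetical averaged latency in milliseconds, sampled every 30 seconds.
points = [(0, 250), (30, 290), (60, 300), (90, 310),
          (120, 270), (150, 300), (180, 260)]
alerts = static_threshold_alerts(points)
print(alerts)
```

Here latency stays above 280 milliseconds from second 30 through second 90, so the sketch raises one alert at second 90. The shorter breach at second 150 ends before a full minute passes and raises nothing.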

Set up alert messages and recipients

After creating the alert condition, Kai selects Alert Message. Kai enters the runbook URL buttercupgames.com/alerts and adds an internal tip to check the memory load and disk usage on the server:

This screenshot displays the runbook and tip that Kai enters for the alert.

The runbook and tip allow Kai to quickly view their alerts and remind Kai what to do when an alert is triggered.

Kai then selects Alert Recipients and adds their email to the list of alert recipients. After adding their email, Kai activates the alert rule.

Summary

Kai has created a detector that sends them an alert whenever the average server latency over a 1-minute window exceeds a threshold of 280 milliseconds for 1 minute. This detector allows Kai to quickly detect and resolve server latency issues that they were previously unaware of.
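Detectors in Splunk Observability Cloud can also be expressed as SignalFlow programs. As a rough sketch, the Python helper below assembles SignalFlow text that mirrors the choices Kai made in the UI. The helper name `build_signalflow`, the publish label, and the assumption that the metric is literally named `latency` are illustrative, and the generated program has not been validated against a live organization.

```python
def build_signalflow(metric="latency", threshold_ms=280,
                     window="1m", lasting="1m",
                     label="High server latency"):
    """Return SignalFlow text for a static-threshold detector (illustrative)."""
    return (
        # Average the metric over a rolling window, as in the Mean:Transformation step.
        f"A = data('{metric}').mean(over='{window}')\n"
        # Alert when the averaged signal stays above the threshold for the duration.
        f"detect(when(A > {threshold_ms}, lasting='{lasting}')).publish('{label}')"
    )

program = build_signalflow()
print(program)
```

Keeping the threshold, window, and duration as parameters makes it straightforward to generate the same detector for other metrics or thresholds.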

Learn more

For more information on how to create a detector, see Create detectors to trigger alerts.

For more information on alert conditions and how to choose the right condition, see Built-in alert conditions.

This page was last updated on May 26, 2023.