Custom threshold
Custom Threshold lets you trigger an alert by comparing one signal to another or by evaluating multiple conditions. Use Custom Threshold if you want to create a detector that maps to one of the following patterns:
You want to see an alert if one signal meets a condition based on the value of another signal
You want to specify compound conditions using AND and OR operators, based on the value of one signal
You want to specify compound conditions using AND and OR operators, based on the values of multiple signals
Compound conditions
When you are on the Alert Settings tab, you can select Add another condition to create a compound condition using AND and OR operators. You can add a total of 10 conditions.
When specifying compound conditions, AND conditions are applied before OR conditions. To ensure that the conditions are evaluated as required, you can select options from a condition’s Actions menu (⋯) to arrange them into the appropriate order. You can also remove a condition from a condition’s Actions menu (⋯).
Note
For a compound condition to trigger an alert, all the values involved in the condition must be non-null.
If you need to build more complex conditions than this alert condition supports, such as “a AND (b OR c) AND d”, or “a AND NOT b”, you can do so by using the Splunk Observability Cloud API to create the detector.
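The evaluation order described above can be modeled in a few lines of Python. This is an illustrative sketch only, not how Splunk Observability Cloud implements detectors; the function name `compound_alert`, the input layout, and the sample signals A, B, and C are assumptions made for the example. It groups AND conditions first, combines the groups with OR, enforces the limit of 10 conditions, and reports no alert when any value involved is null.

```python
from typing import Callable, Dict, Optional, Sequence

Values = Dict[str, Optional[float]]
Condition = Callable[[Values], bool]

MAX_CONDITIONS = 10  # limit stated for the Alert Settings tab


def compound_alert(
    values: Values,
    conditions: Sequence[Condition],
    operators: Sequence[str],
) -> bool:
    """Hypothetical model: conditions joined left to right by "AND"/"OR"."""
    if len(operators) != len(conditions) - 1:
        raise ValueError("need exactly one operator between each pair of conditions")
    if len(conditions) > MAX_CONDITIONS:
        raise ValueError("at most 10 conditions are allowed")
    # Every value involved in the compound condition must be non-null.
    if any(value is None for value in values.values()):
        return False
    # AND binds tighter than OR: build OR-separated groups of AND-ed conditions.
    groups = [[conditions[0]]]
    for operator, condition in zip(operators, conditions[1:]):
        if operator == "AND":
            groups[-1].append(condition)
        elif operator == "OR":
            groups.append([condition])
        else:
            raise ValueError(f"unknown operator: {operator}")
    return any(all(cond(values) for cond in group) for group in groups)


# "a OR b AND c" is evaluated as "a OR (b AND c)".
conds = [lambda v: v["A"] < 70, lambda v: v["B"] < 70, lambda v: v["C"] < 70]
ops = ["OR", "AND"]
print(compound_alert({"A": 90, "B": 60, "C": 60}, conds, ops))    # True
print(compound_alert({"A": 90, "B": 60, "C": 80}, conds, ops))    # False
print(compound_alert({"A": 60, "B": None, "C": 80}, conds, ops))  # False (null value)
```

Because the model has no parentheses or NOT, an expression such as "a AND (b OR c) AND d" cannot be expressed with it, which mirrors the limitation noted above.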
Examples: Single condition, comparing signals
You want to receive an alert when the number of `cache-misses` is higher than the number of `cache-hits` for 1 minute. In this case, use `cache-misses` as the signal to monitor, Above as the option for Alert when, `cache-hits` as the threshold, and a Duration of 1 minute as the option for Trigger sensitivity.
You have 3 signals, each of which measures maximum latency for a single AWS availability zone. You have had problems with one of the zones in the past, and you want to receive an alert when that signal is outside the range of the other 2 signals. In this case, use the troublesome zone as the signal to monitor, Out of range as the option for Alert when, and the other two signals for the lower and upper thresholds.
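As a rough illustration of the two comparisons above, the following Python sketch checks a single data point of one signal against the value of another. The function names, the sample numbers, and the choice to treat values exactly equal to a threshold as not alerting are assumptions for the example, not documented product behavior.

```python
def above(value: float, threshold: float) -> bool:
    """Alert when: Above. The threshold is the current value of another signal."""
    return value > threshold


def out_of_range(value: float, lower: float, upper: float) -> bool:
    """Alert when: Out of range. Lower and upper are values of two other signals."""
    return value < lower or value > upper


# cache-misses compared to cache-hits at one point in time (made-up numbers)
print(above(value=120, threshold=95))  # True

# Maximum latency in ms for three availability zones (made-up numbers);
# the troublesome zone is compared to the other two.
print(out_of_range(value=250, lower=120, upper=180))  # True
print(out_of_range(value=150, lower=120, upper=180))  # False
```

A single matching data point is not enough to alert in the first example: the Duration of 1 minute means the comparison must hold for the whole minute.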
Example: Compound conditions, monitoring a single signal
The following example shows how you might build compound conditions while monitoring a single signal.
You have 2 signals (A and B); A measures available memory in prod and B measures available memory in lab. You want to receive an alert:
if available memory in prod is lower than available memory in lab, or
if available memory in prod is less than 50%
In this case, monitor a single signal (A, available memory in prod) and then set the following conditions:
alert when signal A is less than B OR
signal A is less than 50
Examples: Compound conditions, monitoring multiple signals
The following examples show how you might build compound conditions while monitoring multiple signals.
You have 3 signals (A, B, and C), each of which measures available memory in a particular environment (prod, lab, or dev respectively). You want to receive an alert:
if available memory in prod is less than 70%, or
if available memory in lab and in dev are less than 70%
In this case, monitor multiple signals and then set the following conditions:
alert when signal A is less than 70 OR
signal B is less than 70 AND
signal C is less than 70
Note
AND conditions are always evaluated before OR conditions.
In your organization, one group is responsible for monitoring the health of a cluster while another monitors the health of individual nodes. You don’t want to trigger alerts for individual nodes when the cluster itself is unhealthy.
Assuming A is a metric for cluster health and B is a metric for node health, you can create two detectors:
One detector monitors signal A and triggers alerts when A is unhealthy.
Another detector monitors multiple signals, and has the following conditions:
alert when A is healthy AND
B is unhealthy
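The effect of splitting the logic across two detectors can be shown with a small truth table. In this sketch, health is reduced to a boolean for each signal, which is a simplifying assumption; in a real detector, "healthy" and "unhealthy" would each be a threshold condition on metric A or B. The function names are hypothetical.

```python
def cluster_detector(cluster_healthy: bool) -> bool:
    """First detector: alerts whenever signal A (cluster) is unhealthy."""
    return not cluster_healthy


def node_detector(cluster_healthy: bool, node_healthy: bool) -> bool:
    """Second detector: A is healthy AND B (node) is unhealthy."""
    return cluster_healthy and not node_healthy


for cluster_ok in (True, False):
    for node_ok in (True, False):
        print(
            f"cluster healthy={cluster_ok!s:5} node healthy={node_ok!s:5} "
            f"cluster alert={cluster_detector(cluster_ok)!s:5} "
            f"node alert={node_detector(cluster_ok, node_ok)}"
        )
```

The node alert appears in only one of the four combinations: the cluster is healthy and the node is not. When the cluster is unhealthy, only the cluster detector triggers.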
Settings
| Parameter | Values | Notes |
|---|---|---|
| Alert when | | none |
| Threshold, Lower threshold, Upper threshold | | |
| Trigger sensitivity | | |
| Duration | Integer >= 1, followed by time indicator (s, m, h, d, w), e.g. 30s, 10m, 2h, 5d, 1w | The amount of time the signal must meet the threshold condition. Longer time periods result in lower sensitivity and potentially fewer alerts. |
| Percent of duration | Percentage: Integer between 1 and 100; Duration: Integer >= 1, followed by time indicator (s, m, h, d, w), e.g. 30s, 10m, 2h, 5d, 1w | The percentage of times the threshold was met during the specified duration. |
Duration to trigger an alert
Choosing Immediately for Trigger Sensitivity means that an alert is triggered as soon as the signal meets the threshold. This option is the most sensitive (might trigger the most alerts) of the three trigger sensitivity options.
Depending on the nature of your signal, triggering alerts immediately can lead to flapping, where alerts trigger and clear in rapid succession. In these cases, you can choose one of the other options, Duration or Percent of duration.
The Duration option triggers when the signal meets and remains at the threshold condition for a specified period, such as 10 minutes. Therefore, this option is less sensitive (might trigger fewer alerts) than the Immediately option. If you use this option, an alert isn't triggered if any data points are delayed or do not arrive at all during that time range, even if all the data points that are received do meet the threshold. For more information about delayed or missing data points, see Handle delayed or missing data points.
If you want an option that triggers even if some data points do not arrive on time, use Percent of duration (with a percentage less than 100).
The Percent of duration option triggers alerts based on the number of data points that met the threshold during the window, compared to how many data points were expected to arrive. Because this option triggers an alert based on the percentage of data points that met the threshold, it can sometimes trigger an alert even if some data points didn't arrive on time. Therefore, using this option with a percentage less than 100 is more sensitive (might trigger more alerts) than the Duration option.
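The difference between the three options can be sketched in Python by modeling a window as one entry per expected data point: `True` if the point met the threshold, `False` if it did not, and `None` if it was delayed or never arrived. The function names, this window representation, and rounding the required count up to a whole data point are assumptions for illustration, not a description of the product's internals.

```python
from typing import Optional, Sequence

Window = Sequence[Optional[bool]]  # one entry per expected data point


def immediately(window: Window) -> bool:
    """Triggers as soon as the latest data point meets the threshold."""
    return len(window) > 0 and window[-1] is True


def duration(window: Window) -> bool:
    """Triggers only if every expected point arrived and met the threshold."""
    return len(window) > 0 and all(point is True for point in window)


def percent_of_duration(window: Window, percent: int) -> bool:
    """Triggers if enough of the expected points met the threshold."""
    required = (len(window) * percent + 99) // 100  # integer ceiling
    met = sum(1 for point in window if point is True)
    return len(window) > 0 and met >= required


complete = [True] * 36              # all 36 points arrived and met the threshold
with_gaps = [True] * 30 + [None] * 6  # 6 points missing, the rest met it

print(duration(complete))                  # True
print(duration(with_gaps))                 # False: missing points block Duration
print(percent_of_duration(with_gaps, 75))  # True: 30 >= 27 required
```

With the same gappy window, Duration stays silent while Percent of duration at 75% triggers, which is the sensitivity difference the text describes.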
The following examples illustrate how alerts are triggered in various situations.
Example 1
Option you specify for Trigger Sensitivity: Duration = 3 minutes
Resolution of the signal: 5 seconds
Number of data points expected in 3 minutes: 12 per minute * 3 minutes (36)
Number of anomalous data points (how many times the threshold must be met) to trigger alert: 36
| Total data points expected | Total data points received | Anomalous data points required | Anomalous data points received | Alert is triggered? |
|---|---|---|---|---|
| 36 | 36 | 36 | 36 | Yes |
| 36 | 36 | 36 | 35 or fewer | No |
| 36 | 35 | 36 | 35 or fewer | No |
Example 2
Option you specify for Trigger Sensitivity: Percent of Duration = 75% of 3 minutes
Resolution of the signal: 5 seconds
Number of data points expected in 3 minutes: 12 per minute * 3 minutes (36)
Number of anomalous data points (how many times the threshold must be met) to trigger alert: 75% of 36 (27)
| Total data points expected | Total data points received | Anomalous data points required | Anomalous data points received | Alert is triggered? |
|---|---|---|---|---|
| 36 | 36 | 27 | 27-36 | Yes |
| 36 | 30 | 27 | 27-30 | Yes |
| 36 | 30 | 27 | 26 or fewer | No |
In the last row of Example 2, even if 26 anomalous data points arrive, and 26/30 (about 87%) is greater than the 75% you specified, the required number of anomalous data points (27) did not arrive. Therefore, the alert isn't triggered. The percent you specify represents percent of expected data points, not percent of received data points.
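The arithmetic behind both examples can be reproduced with a short Python sketch. The helper names are hypothetical, and rounding the required count up to a whole data point is an assumption (it does not matter here, because 75% of 36 is exactly 27).

```python
def expected_points(duration_seconds: int, resolution_seconds: int) -> int:
    """Number of data points expected in the window."""
    return duration_seconds // resolution_seconds


def required_points(expected: int, percent: int) -> int:
    """Anomalous points needed, as a percent of EXPECTED points (rounded up)."""
    return (expected * percent + 99) // 100


def triggers(anomalous_received: int, expected: int, percent: int) -> bool:
    return anomalous_received >= required_points(expected, percent)


expected = expected_points(duration_seconds=180, resolution_seconds=5)
print(expected)                       # 36
print(required_points(expected, 75))  # 27

# Example 2, last two rows: 30 points received, 27 vs. 26 of them anomalous.
print(triggers(27, expected, 75))     # True
print(triggers(26, expected, 75))     # False
print(round(26 / 30 * 100, 1))        # 86.7 (percent of received, not used)

# Example 1: Duration behaves like 100% of the expected points.
print(triggers(36, expected, 100), triggers(35, expected, 100))  # True False
```

The number of data points received never enters the calculation; only the expected count and the anomalous count do.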