
Measure and track your service health metrics with service level objectives (SLOs)

For each service that you use to indicate system health in Splunk Observability Cloud, you can define an SLO and how to measure it.

Create an SLO

Follow these steps to create an SLO.

  1. From the landing page of Splunk Observability Cloud, go to Detectors & SLOs.

  2. Select the SLOs tab.

  3. Select Create SLO.

  4. Configure the service level indicator (SLI) for your SLO. You can use a service or any metric of your choice as the system health indicator.

    To use a service as the system health indicator for your SLI configuration, configure the following fields:

    • Metric type: Select Service & endpoint from the dropdown menu.

    • Environment: Open the dropdown menu and check the boxes for the environments where you want to apply this SLO.

    • Service:endpoint: Search for the service you want to create an SLO for. Optionally, add an endpoint for the selected service.

    • Indicator type: Select either success rate or latency as the measurement for your SLO target:

      • Request success: Measures the proportion of requests that result in a successful response over the duration of the compliance window.

      • Request latency: Measures the proportion of requests that load within the specified latency over the duration of the compliance window.

    • Filters: Enter any additional dimension names and values you want to apply this SLO to. Alternatively, use the NOT filter, represented by an exclamation point ( ! ), to exclude any dimension values from this SLO configuration.
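    For example, with the Request success indicator type, an SLI of 99.5% over the compliance window means that 995 of every 1,000 requests in that window received a successful response. In the Filters field, you might enter a hypothetical dimension such as customer_tier with the value internal to restrict the SLO to that traffic, or apply the NOT filter ( ! ) to the same value to exclude internal traffic instead.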

    To use a metric of your choice as the system health indicator for your SLI configuration, follow these steps:

    1. For the Metric type field, select Custom metric from the dropdown menu. The SignalFlow editor appears.

    2. In the SignalFlow editor, you can see the following code sample:

      G = data('good.metric', filter=filter('sf_error', 'false'))
      T = data('total.metric')
      
      • Line 1 defines G as a data stream of good.metric metric time series (MTS). The SignalFlow filter() function queries for a collection of MTS with value false for the sf_error dimension. The filter distinguishes successful requests from total requests, making G the good events variable.

      • Line 2 defines T as a data stream of total.metric MTS. T is the total events variable.

      Replace the code sample with your own SignalFlow program. You can define the good events and total events variables using any metric and any supported SignalFlow function. For more information, see Analyze data using SignalFlow in the Splunk Observability Cloud Developer Guide. For an example of a custom program, see the sketch after the note at the end of this step.

    3. In the Good events (numerator) and Total events (denominator) dropdown menus, select the variables from your SignalFlow program that represent good events and total events.

    Note

    A custom metric SLO calculates the percentage of good events out of total events over a given compliance window. This calculation is designed for counter and histogram metrics. Gauge metrics are not suitable for custom metric SLOs, so selecting a gauge metric in your configuration might produce confusing results.
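    As an illustration only, the following sketch shows one way you might define the good events and total events variables for a custom metric SLI. The metric name checkout.requests.count and the dimension http.status_class are hypothetical; substitute metrics and dimensions that exist in your own environment.

      # Hypothetical counter metric emitted once per checkout request.
      # G: requests whose hypothetical http.status_class dimension is '2xx' (good events).
      G = data('checkout.requests.count', filter=filter('http.status_class', '2xx'))
      # T: all checkout requests, with no filter applied (total events).
      T = data('checkout.requests.count')

    With a program like this, you would then select G in the Good events (numerator) dropdown menu and T in the Total events (denominator) dropdown menu.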

  5. Define your SLO and how to measure it.

    • Target (%): Enter the target you want to set for this SLO.

    • Latency (ms): Available and required only for the request latency SLI type. Enter the target loading time, in milliseconds, for your service requests.

    • Compliance window: Select a compliance window for this SLO from the dropdown menu.
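    For example, a target of 99.9% means that up to 0.1% of requests in the compliance window can fail, or exceed the latency threshold, before the SLO is breached. If the service receives 1,000,000 requests during a 30-day compliance window, that target leaves an error budget of 1,000 requests.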

  6. Set up alerting for your SLO. You can subscribe to get notifications for the following alerts.

    • Breach event: Alerts when the service level indicator (SLI) doesn't meet the target over the specified compliance window. Note: Breach event alerting is selected by default and always runs in the background.

    • Error budget: Alerts when the remaining error budget is less than 10% of the estimated error budget for the compliance window.

    • Burn rate: Alerts when the rate of consumption of your SLO error budget exceeds a healthy threshold for the specified compliance window. To learn more, see Burn rate alerts.
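    As a rough illustration of burn rate, a burn rate of 1 means the error budget is being consumed at exactly the rate that would exhaust it at the end of the compliance window, while a sustained burn rate of 2 would exhaust a 30-day error budget in about 15 days. See Burn rate alerts for the exact definition used by Splunk Observability Cloud.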

  7. Splunk Observability Cloud automatically generates a name for your SLO. You can change this auto-generated name, as long as the SLO name is unique.

  8. Select Create to create the SLO.
