Set up alerts for Edge Processor metrics
As an Edge Processor administrator, you can set up alerts that trigger when Edge Processor metrics meet certain criteria, so that you can monitor the health and status of your Edge Processors and troubleshoot potential issues. You set up these alerts from the Splunk Cloud Platform deployment in your cloud tenant.
This table lists the searches that you can use to set up alerts for Edge Processor metrics, along with actions you can take when each alert condition occurs. You create these searches and alerts using Splunk Cloud Platform functionality. For more information on how to configure alerts in Splunk Cloud Platform, see Getting started with alerts in the Splunk Cloud Platform Alerting Manual.
Metric | Alert trigger condition | Example search | Action item |
---|---|---|---|
Edge Processor queue size | If the queue size is above a certain threshold, for example 70%. This indicates that you need to increase your queue size. | SPL query to see the latest queue size for each instance: `\| mstats latest(exporter_queue_size) as current_queue_size where index=_metrics by exporter` | Increase your queue size to process more data. See these topics for more information: |
Destination connection | If the Edge Processor fails to connect to a destination. This indicates that your destination configuration might be incorrect or the destination might be offline. | SPL query to see connectivity failures per dataset: `\| mstats sum(egress_heartbeat_error_total) as heartbeat_failures_total where index=_metrics by dataset_name` | Verify that the destination information is correct for Edge Processors by checking the edge.log file. See View logs for the Edge Processor solution for more information. |
Destination data send failure | If the Edge Processor fails to send data to a destination and the resulting errors exceed a certain threshold. This indicates that your destination configuration might be incorrect or the destination might be offline. | SPL query to see total send errors per dataset: `\| mstats sum(write_to_sink_errors_total) as export_failures_total where index=_metrics by dataset_name` | Verify that the destination information is correct for Edge Processors by checking the edge.log file. See View logs for the Edge Processor solution for more information. |
CPU usage | If the CPU usage of your host is above a certain threshold. This indicates that the host CPU can't handle the required workload. | SPL query to see the CPU usage by state for each host: `\| mstats sum(system.cpu.time) where index=_metrics by host,state` | Determine what is causing the high CPU usage and take action accordingly. Increase CPU specifications or create an additional host to manage traffic. See An Edge Processor instance is in the "Warning" status for more information. |
Memory usage | If the memory usage of your host is above a certain threshold. This indicates that the host memory can't handle the required workload. | SPL query to see memory usage in bytes per host: `\| mstats latest(system.memory.usage) where index=_metrics by host` | Determine what is causing the high memory usage and take action accordingly, such as by increasing memory specifications. See An Edge Processor instance is in the "Warning" status for more information. |
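
The example searches in the table return current metric values but do not apply a threshold. As a minimal sketch of how one of them might be turned into an alert search, the following query appends a threshold filter to the queue size search. The field names assumed_queue_capacity and threshold_pct, and the values 1000 and 70, are hypothetical placeholders and are not taken from the Edge Processor documentation. The sketch also assumes that the 70% example is measured against the queue capacity that you configured, and that the metric and the capacity are expressed in the same unit. Replace the placeholders with values that match your own configuration.

```spl
| mstats latest(exporter_queue_size) as current_queue_size where index=_metrics by exporter
| eval assumed_queue_capacity=1000, threshold_pct=70
| where current_queue_size > assumed_queue_capacity * threshold_pct / 100
```

If you save a search like this as an alert and set it to trigger when the number of results is greater than 0, the alert triggers when at least one exporter exceeds the threshold. A similar where clause could be appended to the other searches in the table, for example to compare heartbeat_failures_total or export_failures_total against an error count that you choose.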
This documentation applies to the following versions of Splunk Cloud Platform™: 9.0.2209, 9.0.2303, 9.0.2305, 9.1.2308, 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)