Internal metrics of the Collector
Find the complete list of the Collector’s internal metrics and what to use them for.
Use internal metrics to monitor your Collector instance
You can use the Collector’s internal metrics to monitor the behavior of the Collector and identify performance issues.
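By default, the Collector exposes its own metrics in Prometheus format on port 8888. The following is a minimal sketch of how you might raise the internal telemetry level so that more of the metrics listed below are emitted; the exact keys and defaults can vary between Collector versions, so treat the values as placeholders.

```yaml
# Collector configuration sketch: expose internal metrics for scraping.
# Keys and defaults may differ between Collector versions.
service:
  telemetry:
    metrics:
      level: detailed          # basic, normal, or detailed
      address: 0.0.0.0:8888    # endpoint serving the Collector's own metrics
```

You can then point your monitoring system, for example a Prometheus scrape job, at that endpoint to collect the metrics described in this topic.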
Monitor data flow and detect data loss
To ensure data is flowing correctly, use the otelcol_receiver_accepted_spans, otelcol_receiver_accepted_metric_points, and otelcol_receiver_accepted_log_records metrics for information about the data ingested by the Collector, and otelcol_exporter_sent_spans, otelcol_exporter_sent_metric_points, and otelcol_exporter_sent_log_records for information about exported data.

Use otelcol_processor_dropped_spans, otelcol_processor_dropped_metric_points, and otelcol_processor_dropped_log_records to detect data loss. Small losses shouldn't be considered outages, so depending on your requirements, set a minimum time window before alerting.
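For example, if you scrape the Collector's internal metrics with Prometheus, an alerting-rule sketch like the following fires only after drops have persisted for a sustained window. The group name, alert name, threshold, and windows are illustrative, and depending on your Collector version the counters may be exposed with a _total suffix.

```yaml
groups:
  - name: collector-data-loss          # example group name
    rules:
      - alert: CollectorDroppingSpans  # example alert name
        # Any nonzero drop rate over the last 5 minutes...
        expr: rate(otelcol_processor_dropped_spans[5m]) > 0
        # ...but only alert if it persists for 10 minutes, so small,
        # transient losses are not treated as outages.
        for: 10m
        labels:
          severity: warning
```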
Detect receive failures
Sustained rates of otelcol_receiver_refused_spans, otelcol_receiver_refused_metric_points, and otelcol_receiver_refused_log_records indicate that too many errors are being returned to clients. Depending on the deployment and the clients' resilience, this may indicate data loss at the clients.

Sustained rates of otelcol_exporter_send_failed_spans, otelcol_exporter_send_failed_metric_points, and otelcol_exporter_send_failed_log_records indicate that the Collector is not able to export data as expected. These failures don't necessarily imply data loss, since the data may be retried, but a high rate of failures could indicate issues with the network or with the back end receiving the data.
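As a sketch, again assuming Prometheus-style alerting on the scraped internal metrics, you could alert on sustained refusal and export-failure rates; the names, windows, and thresholds below are placeholders, and the same pattern applies to the metric-point and log-record variants.

```yaml
groups:
  - name: collector-errors               # example group name
    rules:
      - alert: CollectorRefusingData     # receivers returning errors to clients
        expr: rate(otelcol_receiver_refused_spans[5m]) > 0
        for: 15m
      - alert: CollectorExportFailures   # exporter unable to send; check network and back end
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 15m
```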
Control queue length
Use the queue-retry mechanism, available in most exporters, as the Collector's retry mechanism. To check whether your queue capacity is sufficient, compare otelcol_exporter_queue_capacity, which indicates the capacity of the retry queue in batches, with otelcol_exporter_queue_size, which indicates the current size of the retry queue, as in the sketch below.
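A minimal sketch of that comparison as a Prometheus alert, assuming both gauges are scraped from the Collector; the 80% threshold and the rule names are examples only.

```yaml
groups:
  - name: collector-queue                # example group name
    rules:
      - alert: CollectorQueueAlmostFull  # example alert name
        # Retry queue is above 80% of its configured capacity.
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.8
        for: 5m
```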
otelcol_exporter_enqueue_failed_spans, otelcol_exporter_enqueue_failed_metric_points, and otelcol_exporter_enqueue_failed_log_records indicate the number of spans, metric points, and log records that failed to be added to the sending queue. If your queue is full, decrease your sending rate or scale the Collector horizontally.

The queue-retry mechanism also supports logging for monitoring. Check your logs for messages like “Dropping data because sending_queue is full”.
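If the queue keeps filling up, you can also tune the queue and retry settings on the exporter itself. The following is a sketch for an OTLP exporter; the endpoint and the specific values are placeholders, and exact defaults depend on the exporter and Collector version.

```yaml
exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder endpoint
    sending_queue:
      enabled: true
      num_consumers: 10       # parallel senders draining the queue
      queue_size: 5000        # capacity of the queue, in batches
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s  # give up (and drop) after this long
```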
List of internal metrics of the Collector
These are the Collector’s internal metrics.
| Metric name | Metric description |
|---|---|
| otelcol_exporter_enqueue_failed_log_records | Number of log records that failed to be added to the sending queue |
| otelcol_exporter_enqueue_failed_metric_points | Number of metric points that failed to be added to the sending queue |
| otelcol_exporter_enqueue_failed_spans | Number of spans that failed to be added to the sending queue |
| otelcol_exporter_queue_capacity | Capacity of the exporter retry queue, in batches |
| otelcol_exporter_queue_size | Current size of the retry queue, in batches |
| otelcol_exporter_send_failed_log_records | Number of log records that failed to be sent to the destination |
| otelcol_exporter_send_failed_metric_points | Number of metric points that failed to be sent to the destination |
| otelcol_exporter_sent_log_records | Number of log records successfully sent to the destination |
| otelcol_exporter_sent_metric_points | Number of metric points successfully sent to the destination |
| otelcol_exporter_sent_spans | Number of spans successfully sent to the destination |
| otelcol_otelsvc_k8s_namespace_added | Number of namespace add events received |
| otelcol_otelsvc_k8s_namespace_updated | Number of namespace update events received |
| otelcol_otelsvc_k8s_pod_added | Number of pod add events received |
| otelcol_otelsvc_k8s_pod_deleted | Number of pod delete events received |
| otelcol_otelsvc_k8s_pod_table_size | Size of the table containing pod info |
| otelcol_process_cpu_seconds | Total CPU user and system time, in seconds |
| otelcol_process_memory_rss | Total physical memory (resident set size) |
| otelcol_process_runtime_heap_alloc_bytes | Bytes of allocated heap objects |
| otelcol_process_runtime_total_alloc_bytes | Cumulative bytes allocated for heap objects |
| otelcol_process_runtime_total_sys_memory_bytes | Total bytes of memory obtained from the operating system |
| otelcol_process_uptime | Uptime of the process |
| otelcol_processor_accepted_log_records | Number of log records successfully pushed into the next component in the pipeline |
| otelcol_processor_accepted_metric_points | Number of metric points successfully pushed into the next component in the pipeline |
| otelcol_processor_accepted_spans | Number of spans successfully pushed into the next component in the pipeline |
| otelcol_processor_batch_batch_send_size | Number of units in the batch |
| otelcol_processor_batch_batch_send_size_bucket | Number of units in the batch (histogram buckets) |
| otelcol_processor_batch_batch_send_size_count | Number of units in the batch (histogram count) |
| otelcol_processor_batch_batch_send_size_sum | Number of units in the batch (histogram sum) |
| otelcol_processor_batch_timeout_trigger_send | Number of times the batch was sent due to a timeout trigger |
| otelcol_processor_dropped_log_records | Number of log records that were dropped |
| otelcol_processor_dropped_metric_points | Number of metric points that were dropped |
| otelcol_processor_dropped_spans | Number of spans that were dropped |
| otelcol_processor_groupbyattrs_log_groups | Distribution of groups extracted for logs |
| otelcol_processor_groupbyattrs_log_groups_bucket | Distribution of groups extracted for logs (histogram buckets) |
| otelcol_processor_groupbyattrs_log_groups_count | Distribution of groups extracted for logs (histogram count) |
| otelcol_processor_groupbyattrs_log_groups_sum | Distribution of groups extracted for logs (histogram sum) |
| otelcol_processor_refused_log_records | Number of refused log records |
| otelcol_processor_refused_metric_points | Number of refused metric points |
| otelcol_processor_refused_spans | Number of refused spans |
| otelcol_receiver_accepted_log_records | Number of log records successfully pushed into the pipeline |
| otelcol_receiver_accepted_metric_points | Number of metric points successfully pushed into the pipeline |
| otelcol_receiver_accepted_spans | Number of spans successfully pushed into the pipeline |
| otelcol_receiver_refused_log_records | Number of log records that could not be pushed into the pipeline |
| otelcol_receiver_refused_metric_points | Number of metric points that could not be pushed into the pipeline |
| otelcol_receiver_refused_spans | Number of spans that could not be pushed into the pipeline |
| otelcol_scraper_errored_metric_points | Number of metric points that couldn't be scraped |
| otelcol_scraper_scraped_metric_points | Number of metric points successfully scraped |