Troubleshoot missing metrics
The Splunk Collector for Kubernetes is missing metrics starting with k8s.pod.* and k8s.node.*
After you deploy chart version 0.87.0 or higher of the Splunk Distribution of the OpenTelemetry Collector for Kubernetes, as either a new install or an upgrade, the following pod and node metrics are not collected:
k8s.(pod/node).cpu.time
k8s.(pod/node).cpu.utilization
k8s.(pod/node).filesystem.available
k8s.(pod/node).filesystem.capacity
k8s.(pod/node).filesystem.usage
k8s.(pod/node).memory.available
k8s.(pod/node).memory.major_page_faults
k8s.(pod/node).memory.page_faults
k8s.(pod/node).memory.rss
k8s.(pod/node).memory.usage
k8s.(pod/node).memory.working_set
k8s.(pod/node).network.errors
k8s.(pod/node).network.io
Confirm the metrics are missing
To confirm that these metrics are missing, perform the following steps:
Run the following Splunk Search Processing Language (SPL) search, replacing the index name with the metrics index used in your environment:
| mstats count(_value) as "Val" where index="otel_metrics_0_93_3" AND metric_name IN (k8s.pod.*, k8s.node.*) by metric_name
Check the Collector’s pod logs from the CLI of the Kubernetes node with this command:
kubectl -n {namespace} logs {collector-agent-pod-name}
Note: Update namespace and collector-agent-pod-name based on your environment.
You will see a “tls: failed to verify certificate” error similar to the one below in the agent pod logs:
2024-02-28T01:11:24.614Z error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://10.202.38.255:10250/stats/summary\": tls: failed to verify certificate: x509: cannot validate certificate for 10.202.38.255 because it doesn't contain any IP SANs", "scraper": "kubeletstats"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
go.opentelemetry.io/collector/receiver@v0.93.0/scraperhelper/scrapercontroller.go:200
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
go.opentelemetry.io/collector/receiver@v0.93.0/scraperhelper/scrapercontroller.go:176
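On a busy agent pod, the scrape error can be buried among other log lines. The following is a minimal sketch of a filter that keeps only kubeletstats scrape failures caused by certificate verification. The function name find_kubelet_tls_errors is hypothetical and not part of the Collector or kubectl; it only matches the error text shown in the sample log above.

```shell
# Hypothetical helper (not part of kubectl or the Collector):
# reads Collector agent logs on stdin and prints only the lines that
# mention both the kubeletstats receiver and the TLS verification error.
find_kubelet_tls_errors() {
  grep 'kubeletstats' | grep 'tls: failed to verify certificate'
}

# Usage (replace the placeholders with values from your environment):
#   kubectl -n {namespace} logs {collector-agent-pod-name} | find_kubelet_tls_errors
```

If the filter prints nothing, the missing metrics might have a different cause than the certificate check described in this section.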
Resolution
The Kubelet stats receiver collects k8s.pod.* and k8s.node.* metrics from the Kubernetes endpoint /stats/summary. As of version 0.87.0 of the Splunk OTel Collector, the kubelet certificate is verified during this process to confirm that it's valid. If you are using a self-signed or invalid certificate, the Kubelet stats receiver cannot collect the metrics.
You have two alternatives to resolve this error:
Add a valid certificate to your Kubernetes cluster. For instructions, see Configure the Collector for Kubernetes with Helm. After updating the values.yaml file, use the Helm upgrade command to upgrade your Collector deployment.
Disable certificate verification in the OTel agent Kubelet stats receiver by setting insecure_skip_verify: true for the Kubelet stats receiver in the agent.config section of the values.yaml file.
For example, use the configuration below to disable certificate verification:
agent:
  config:
    receivers:
      kubeletstats:
        insecure_skip_verify: true
Caution
Keep in mind your security requirements before disabling certificate verification.
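With either alternative, the change takes effect only after the updated values.yaml file is applied with a Helm upgrade. The following is a rough sketch of that step. The wrapper name upgrade_collector is hypothetical, and the chart reference splunk-otel-collector-chart/splunk-otel-collector assumes the Helm repository was added under the name splunk-otel-collector-chart; adjust both to match your installation.

```shell
# Hypothetical wrapper around `helm upgrade`.
# Arguments: release name, namespace, path to the values file.
# Assumption: the chart repository was added locally as
# "splunk-otel-collector-chart"; change the chart reference if yours differs.
upgrade_collector() {
  helm upgrade "$1" splunk-otel-collector-chart/splunk-otel-collector \
    --namespace "$2" \
    --values "$3"
}

# Usage (replace the placeholders with values from your environment):
#   upgrade_collector {release-name} {namespace} values.yaml
```

After the upgrade completes and the agent pods restart, rerun the SPL search from the confirmation steps to check that the k8s.pod.* and k8s.node.* metrics are arriving.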