Advanced configuration for Kubernetes

See the following advanced configuration options for the Collector for Kubernetes.

For basic Helm chart configuration, see Configure the Collector for Kubernetes with Helm. For log configuration, see Collect logs and events with the Collector for Kubernetes.

Note

The values.yaml file lists all supported configurable parameters for the Helm chart, along with a detailed explanation of each parameter. Review it to understand how to configure this chart.

The Helm chart can also be configured to support different use cases, such as trace sampling and sending data through a proxy server. See Examples of chart configuration for more information.

Override the default configuration

You can override the default configuration to use your own. To do this, include a custom configuration using the agent.config, clusterReceiver.config, or gateway.config parameter in the values.yaml file. Find examples at values.yaml, agent, cluster receiver, and gateway.

For example:

agent:
  config:
    processors:
      # Exclude logs from pods named 'podNameX'
      filter/exclude_logs_from_pod:
        logs:
          exclude:
            match_type: regexp
            resource_attributes:
              - key: k8s.pod.name
                value: '^(podNameX)$'
    # Define the logs pipeline with the default values as well as your new processor component
    service:
      pipelines:
        logs:
          processors:
            - memory_limiter
            - k8sattributes
            - filter/logs
            - batch
            - resourcedetection
            - resource
            - resource/logs
            - filter/exclude_logs_from_pod

This custom configuration is merged into the default agent configuration.

Caution

Because the custom configuration is merged with the default one, you need to fully redefine any part of the configuration that you change, for example service, pipelines, logs, and processors.

Configure control plane metrics

Control plane metrics are available for the following components: coredns, etcd, kube-controller-manager, kubernetes-apiserver, kubernetes-proxy, and kubernetes-scheduler. You can use the Collector agent deployed by the Helm chart to obtain control plane metrics from a specific component by setting agent.controlPlaneMetrics.{otel_component} to true.
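
For example, the following values.yaml sketch activates control plane metrics for etcd on the agent. The enabled sub-key shown here is an assumption about the chart's values.yaml layout; check values.yaml for the exact parameter name in your chart version:

agent:
  controlPlaneMetrics:
    etcd:
      enabled: true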

The Helm chart configures the Collector on each node to use the receiver creator to instantiate control plane receivers at runtime. The receiver creator has a set of discovery rules that determine which control plane receivers to create. The default discovery rules can vary depending on the Kubernetes distribution and version. See Receiver creator receiver for more information.

If your control plane is using non-standard specifications, then you can provide a custom configuration to allow the Collector to successfully connect to it.

Supported versions

The Collector relies on pod-level network access to collect metrics from the control plane pods. Since most cloud Kubernetes as a service distributions don’t expose the control plane pods to the end user, collecting metrics from these distributions is not supported.

The following table shows which Kubernetes distributions support control plane metrics collection:

Supported:

  • Kubernetes

  • OpenShift

Unsupported:

  • AKS

  • EKS

  • EKS/Fargate

  • GKE

  • GKE/Autopilot

See the agent template for the default configurations for the control plane receivers.

Availability

The following components provide control plane metrics: coredns, etcd, kube-controller-manager, kubernetes-apiserver, kubernetes-proxy, and kubernetes-scheduler.

Use custom configurations for non-standard control plane components

You can override the default configuration values used to connect to the control plane. If your control plane uses nonstandard ports or custom TLS settings, you need to override the default configurations.

The following example shows how to connect to a nonstandard API server that uses port 3443 for metrics and custom TLS certs stored in the /etc/myapiserver/ directory.

agent:
  config:
    receivers:
      receiver_creator:
        receivers:
          # Template for overriding the discovery rule and configuration.
          # smartagent/{control_plane_receiver}:
          #   rule: {rule_value}
          #   config:
          #     {config_value}
          smartagent/kubernetes-apiserver:
            rule: type == "port" && port == 3443 && pod.labels["k8s-app"] == "kube-apiserver"
            config:
              clientCertPath: /etc/myapiserver/clients-ca.crt
              clientKeyPath: /etc/myapiserver/clients-ca.key
              skipVerify: true
              useHTTPS: true
              useServiceAccount: false

Activate Kubernetes control plane metrics with the Prometheus receiver

To activate control plane metrics with the OpenTelemetry Prometheus receiver instead, use the feature flag useControlPlaneMetricsHistogramData:

featureGates:
  useControlPlaneMetricsHistogramData: true

Note

Out-of-the-box dashboards and navigators for control plane metrics with the Prometheus receiver aren’t supported yet, but are planned for a future release.

To learn more, see Prometheus receiver.

Known issues

There is a known limitation for the Kubernetes proxy control plane receiver. When you use a Kubernetes cluster created with kops, a network connectivity issue prevents proxy metrics from being collected. To address this limitation, update the kubeProxy metrics bind address in the kops cluster specification, as shown in the sketch after the following steps:

  1. Set kubeProxy.metricsBindAddress: 0.0.0.0 in the kops cluster specification.

  2. Run kops update cluster {cluster_name} and kops rolling-update cluster {cluster_name} to deploy the change.
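
The following sketch shows the corresponding part of the kops cluster specification, assuming the standard kubeProxy block in the kops ClusterSpec:

spec:
  kubeProxy:
    # Expose kube-proxy metrics on all interfaces so the Collector can scrape them
    metricsBindAddress: 0.0.0.0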

Run the container in non-root user mode

Collecting logs often requires reading log files that are owned by the root user. By default, the container runs with securityContext.runAsUser = 0, which runs it as the root user so it has permission to read those files.

To run the container in non-root user mode, use agent.securityContext to adjust log data permissions to match the securityContext configurations. For instance:

agent:
  securityContext:
    runAsUser: 20000
    runAsGroup: 20000

Note

Running the Collector agent for log collection in non-root mode is currently not supported in CRI-O and OpenShift environments. For more details, see the related GitHub feature request issue.

Configure custom TLS certificates

If your organization requires custom TLS certificates for secure communication with the Collector, follow these steps:

1. Create a Kubernetes secret containing the Root CA certificate, TLS certificate, and private key files

Store your custom CA certificate, key, and certificate files in a Kubernetes secret in the same namespace as your Splunk Helm chart.

For example, you can run this command:

kubectl create secret generic my-custom-tls --from-file=ca.crt=/path/to/custom_ca.crt --from-file=apiserver.key=/path/to/custom_key.key --from-file=apiserver.crt=/path/to/custom_cert.crt -n <namespace>

Note

You are responsible for externally managing this secret, which is not part of the Splunk Helm chart deployment.

2. Mount the secret in the Splunk Helm Chart

Apply this configuration to the agent, clusterReceiver, or gateway using the following Helm values:

  • agent.extraVolumes, agent.extraVolumeMounts

  • clusterReceiver.extraVolumes, clusterReceiver.extraVolumeMounts

  • gateway.extraVolumes, gateway.extraVolumeMounts

Learn more about Helm components at Helm chart architecture and components.

For example:

agent:
  extraVolumes:
    - name: custom-tls
      secret:
        secretName: my-custom-tls
  extraVolumeMounts:
    - name: custom-tls
      mountPath: /etc/ssl/certs/
      readOnly: true

clusterReceiver:
  extraVolumes:
    - name: custom-tls
      secret:
        secretName: my-custom-tls
  extraVolumeMounts:
    - name: custom-tls
      mountPath: /etc/ssl/certs/
      readOnly: true

gateway:
  extraVolumes:
    - name: custom-tls
      secret:
        secretName: my-custom-tls
  extraVolumeMounts:
    - name: custom-tls
      mountPath: /etc/ssl/certs/
      readOnly: true

3. Override your TLS configuration

Update the TLS configuration for specific Collector components, such as the agent's kubeletstats receiver, to use the mounted certificate, key, and CA files.

For example:

agent:
  config:
    receivers:
      kubeletstats:
        auth_type: "tls"
        ca_file: "/etc/ssl/certs/custom_ca.crt"
        key_file: "/etc/ssl/certs/custom_key.key"
        cert_file: "/etc/ssl/certs/custom_cert.crt"
        insecure_skip_verify: true

Note

To skip certificate checks, you can deactivate secure TLS verification per component. This option is not recommended for production environments due to security concerns.

Collect network telemetry using eBPF

You can collect network metrics and analyze them in Network Explorer using the OpenTelemetry eBPF Helm chart. See Introduction to Network Explorer for more information. To install and configure the eBPF Helm chart, see Install the eBPF Helm chart.

Note

Starting from version 0.88 of the Helm chart, the networkExplorer setting of the Splunk OpenTelemetry Collector Helm chart is deprecated. If you wish to continue using Network Explorer to see data in Splunk Observability Cloud, point the upstream eBPF Helm chart to the OpenTelemetry Collector running as a gateway as explained in Migrate from networkExplorer to eBPF Helm chart.

While Splunk Observability Cloud fully supports the Network Explorer navigator, the upstream OpenTelemetry eBPF Helm chart is not covered under official Splunk support. Any feature updates, security fixes, or bug fixes to it are not bound by any SLAs.

Prerequisites

The OpenTelemetry eBPF Helm chart requires:

  • Kubernetes 1.24 or higher

  • Helm 3.9 or higher

Network metrics collection is only supported in the following Kubernetes-based environments on Linux hosts:

  • Red Hat Linux 7.6 or higher

  • Ubuntu 16.04 or higher

  • Debian Stretch or higher

  • Amazon Linux 2

  • Google COS

Modify the reducer footprint

The reducer is a single pod per Kubernetes cluster. If your cluster contains a large number of pods, nodes, and services, you can increase the resources allocated to it.

The reducer processes telemetry in multiple stages, with each stage partitioned into one or more shards, where each shard is a separate thread. Increasing the number of shards in each stage expands the capacity of the reducer. There are three stages: ingest, matching, and aggregation. You can set from 1 to 32 shards for each stage. By default, there is one shard per reducer stage.

The following example sets the reducer to use 4 shards per stage:

reducer:
  ingestShards: 4
  matchingShards: 4
  aggregationShards: 4

Customize network telemetry generated by eBPF

You can deactivate metrics through the Helm chart configuration, either individually or by entire categories. See the values.yaml for a complete list of categories and metrics.

To deactivate an entire category, give the category name, followed by .all:

reducer:
  disableMetrics:
    - tcp.all

Deactivate individual metrics by their names:

reducer:
  disableMetrics:
    - tcp.bytes

You can mix categories and names. For example, to turn off all HTTP metrics and the udp.bytes metric, use:

reducer:
  disableMetrics:
    - http.all
    - udp.bytes

Reactivate metrics

To activate metrics you previously deactivated, use enableMetrics.

The disableMetrics flag is evaluated before enableMetrics, so you can deactivate an entire category, then reactivate individual metrics in that category that you are interested in.

For example, to deactivate all internal and http metrics but keep ebpf_net.collector_health, use:

reducer:
  disableMetrics:
    - http.all
    - ebpf_net.all
  enableMetrics:
    - ebpf_net.collector_health

Configure features using gates

Use the agent.featureGates, clusterReceiver.featureGates, and gateway.featureGates configurations to activate or deactivate features of the Collector agent, cluster receiver, and gateway, respectively. These configurations populate the --feature-gates startup argument of the otelcol binary.

For example, to activate feature1 in the agent, activate feature2 in the clusterReceiver, and deactivate feature2 in the gateway, run:

helm install {name} --set agent.featureGates=+feature1 --set clusterReceiver.featureGates=feature2 --set gateway.featureGates=-feature2 {other_flags}
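
Because --set flags map directly to values.yaml keys, you can express the same example in values.yaml. This sketch uses the same placeholder feature names as the command above:

agent:
  featureGates: +feature1
clusterReceiver:
  featureGates: feature2
gateway:
  featureGates: -feature2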

Set the pod security policy manually

Support for Pod Security Policies (PSP) was removed in Kubernetes 1.25. If you still rely on PSPs in an older cluster, you can add the PSP manually:

  1. Run the following command to install the PSP. Don’t forget to add the --namespace kubectl argument if needed:

cat <<EOF | kubectl apply -f -
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: splunk-otel-collector-psp
  labels:
    app: splunk-otel-collector-psp
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName:  'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false
  hostNetwork: true
  hostIPC: false
  hostPID: false
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'hostPath'
  - 'secret'
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
EOF

  2. Add the following custom ClusterRole rule in your values.yaml file, along with all other required fields like clusterName, splunkObservability, or splunkPlatform:

rbac:
  customRules:
    - apiGroups:     [extensions]
      resources:     [podsecuritypolicies]
      verbs:         [use]
      resourceNames: [splunk-otel-collector-psp]

  3. Install the Helm chart:

helm install my-splunk-otel-collector -f my_values.yaml splunk-otel-collector-chart/splunk-otel-collector

Configure data persistence queues

Without any configuration, data is queued in memory only. When data can’t be sent, it’s retried a few times for up to 5 minutes by default, and then dropped. If, for any reason, the Collector is restarted in this period, the queued data is discarded.

If you want the queue to persist on disk when the Collector restarts, set splunkPlatform.sendingQueue.persistentQueue.enabled=true to enable support for logs, metrics, and traces.

By default, data is persisted in the /var/addon/splunk/exporter_queue directory. To override this path, use the splunkPlatform.sendingQueue.persistentQueue.storagePath option.
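
For example, the following values.yaml sketch turns on the persistent queue and overrides the storage path; the path shown is only an illustration:

splunkPlatform:
  sendingQueue:
    persistentQueue:
      enabled: true
      # Example override of the default /var/addon/splunk/exporter_queue location
      storagePath: /var/splunk/exporter_queue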

See Data persistence in the OpenTelemetry Collector for a detailed explanation.

Note

Data can only be persisted for agent daemonsets.

Config examples

Use the following in values.yaml to disable data persistence for logs, metrics, or traces:

Logs

agent:
  config:
    exporters:
      splunk_hec/platform_logs:
        sending_queue:
          storage: null

Metrics

agent:
  config:
    exporters:
      splunk_hec/platform_metrics:
        sending_queue:
          storage: null

Traces

agent:
  config:
    exporters:
      splunk_hec/platform_traces:
        sending_queue:
          storage: null

Support for persistent queue

The following support is offered:

Support for GKE/Autopilot and EKS/Fargate

Persistent buffering is not supported for GKE/Autopilot and EKS/Fargate, since the queue directory needs to be mounted through hostPath.

GKE/Autopilot and EKS/Fargate don't allow such volume mounts, because the cloud provider manages the underlying infrastructure.

Refer to aws/fargate and gke/autopilot for more information.

Gateway support

The filestorage extension acquires an exclusive lock for the queue directory.

It's not possible to run persistent buffering if there are multiple replicas of a pod. Even if support could be provided, only one of the pods would be able to acquire the lock and run, while the others would be blocked and unable to operate.

Cluster Receiver support

The Cluster receiver is a 1-replica deployment of the OpenTelemetry Collector. Because the Kubernetes control plane can select any available node to run the cluster receiver pod (unless clusterReceiver.nodeSelector is explicitly set to pin the pod to a specific node), hostPath or local volume mounts don’t work for such environments.
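
For example, the following sketch pins the cluster receiver pod to a specific node using the standard kubernetes.io/hostname node label; replace <node-name> with the name of an actual node:

clusterReceiver:
  nodeSelector:
    kubernetes.io/hostname: <node-name>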

Data persistence is currently not applicable to Kubernetes cluster metrics and Kubernetes events.

Monitor OpenShift infrastructure nodes

By default, the Splunk Distribution of OpenTelemetry Collector for Kubernetes doesn’t collect data from OpenShift infrastructure nodes.

You can customize the Collector Helm Chart file to activate data collection from OpenShift infrastructure nodes. To do so, complete the following steps:

  1. Open your values.yaml file for the Helm Chart.

  2. Copy and paste the following YAML snippet into the values.yaml file:

    tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        effect: NoSchedule
        operator: Exists
    
  3. Install the Collector using the Helm Chart:

    helm install my-splunk-otel-collector --values values.yaml splunk-otel-collector-chart/splunk-otel-collector
    

Note

Monitoring OpenShift infrastructure nodes might pose a security risk depending on which method you used to create the Kubernetes environment.
