Apache Spark

Note

If you’re using the Splunk Distribution of the OpenTelemetry Collector and want to collect Apache Spark cluster metrics, use the native OTel component Apache Spark receiver.

The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the Apache Spark monitor type to monitor Apache Spark clusters. It does not support fetching metrics from Spark Structured Streaming.

For the following cluster modes, the integration only supports HTTP endpoints:

Standalone
Mesos
Hadoop YARN

This collectd plugin is not compatible with Kubernetes cluster mode. You need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, set isMaster to true. When you run Apache Spark on Hadoop YARN, this integration can only report application metrics from the primary node.

This integration is only available on Linux.

Benefits

After you configure the integration, you can access these features:

View metrics. You can create your own custom dashboards, and most monitors provide built-in dashboards as well. For information about dashboards, see View dashboards in Splunk Observability Cloud.
View a data-driven visualization of the physical servers, virtual machines, AWS instances, and other resources in your environment that are visible to Infrastructure Monitoring. For information about navigators, see Use navigators in Splunk Infrastructure Monitoring.
Access the Metric Finder and search for metrics sent by the monitor. For information, see Search the Metric Finder and Metadata Catalog.

Installation

Follow these steps to deploy this integration:

Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform:
- Install on Kubernetes
- Install on Linux
Configure the monitor, as described in the Configuration section.
Restart the Splunk Distribution of the OpenTelemetry Collector.

Configuration

To use this Smart Agent monitor with the Collector:

Include the Smart Agent receiver in your configuration file.
Add the monitor type to the Collector configuration, in both the receivers and pipelines sections.

See how to Use Smart Agent monitors with the Collector.
See how to set up the Smart Agent receiver.
For a list of common configuration options, refer to Common configuration settings for monitors.
Learn more about the Collector at Get started: Understand and use the Collector.

Example

To activate this integration, add one of the following to your Collector configuration:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ...  # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...  # Additional config

Next, add the monitor to the service.pipelines.metrics.receivers section of your configuration file:

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]

Note: The names collectd_spark_master and collectd_spark_worker are for identification purposes only and don’t affect functionality. You can use either name in your configuration, but you need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, see the isMaster field in the configuration settings section.
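As a minimal sketch of the note above, the following fragment defines distinct primary and worker configurations side by side. The host and port values are assumptions, not part of the original example: they presume a standalone cluster whose master and worker web UIs listen on the Spark defaults (8080 and 8081) on the same machine as the Collector. In a typical deployment each host runs only one of the two processes, so you would keep only the matching receiver. Exporters and processors are omitted; merge the fragment into your existing metrics pipeline.

```yaml
receivers:
  # Primary process: isMaster marks this as the primary configuration.
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # assumption: Collector runs on the Spark master host
    port: 8080               # assumption: default standalone master web UI port
    clusterType: Standalone
    isMaster: true

  # Worker process: isMaster stays at its default of false.
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost          # assumption: Collector runs on the Spark worker host
    port: 8081               # assumption: default standalone worker web UI port
    clusterType: Standalone
    isMaster: false

service:
  pipelines:
    metrics:
      # Reference each receiver by the same name used under "receivers".
      receivers: [smartagent/collectd_spark_master, smartagent/collectd_spark_worker]
```

Because the receiver names are only identifiers, the isMaster setting is what distinguishes the primary configuration from the worker configuration.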

Configuration settings

The following table shows the configuration options for this integration:

Option	Required	Type	Description
`pythonBinary`	no	`string`	This option specifies the path to a Python binary that executes the Python code. If you don’t set this option, the system uses a built-in runtime. You can also include arguments to the binary.
`host`	yes	`string`
`port`	yes	`integer`
`isMaster`	no	`bool`	Set this option to `true` when you want to monitor a primary Spark node. The default is `false`.
`clusterType`	yes	`string`	Set this option to the type of cluster you’re monitoring. The allowed values are `Standalone`, `Mesos`, or `Yarn`. The system doesn’t collect cluster metrics for Yarn. To gain insight into the health of a Yarn cluster, use the collectd/hadoop monitor.
`collectApplicationMetrics`	no	`bool`	The default is `false`.
`enhancedMetrics`	no	`bool`	The default is `false`.
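To illustrate how the options in the table fit together, here is a hedged sketch of a primary-node receiver that sets every option except `pythonBinary`. The host and port values are assumptions (a standalone master on the local machine using the default web UI port), and the two boolean options are turned on only to show their placement; both default to `false`.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost                   # required; value is an assumption
    port: 8080                        # required; value is an assumption
    clusterType: Standalone           # required; Standalone, Mesos, or Yarn
    isMaster: true                    # optional; default is false
    collectApplicationMetrics: true   # optional; default is false
    enhancedMetrics: true             # optional; default is false
```

Leaving `pythonBinary` unset means the monitor uses its built-in runtime, as described in the table.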

Metrics

These are the metrics available for this integration:

Notes

To learn more about the metric types available in Splunk Observability Cloud, see Metric types.
In host-based subscription plans, default metrics are those metrics included in host-based subscriptions in Splunk Observability Cloud, such as host, container, or bundled metrics. Custom metrics are not provided by default and might be subject to charges. See Metric categories for more information.
In MTS-based subscription plans, all metrics are custom.
To add more metrics, see how to configure extraMetrics in Add additional metrics.

Troubleshooting

If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.

Available to Splunk Observability Cloud customers

Submit a case in the Splunk Support Portal.
Contact Splunk Support.

Available to prospective customers and free trial users

Ask a question and get answers through community support at Splunk Answers.
Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups in the Get Started with Splunk Community manual.

This page was last updated on Feb 11, 2025.
