
Apache Spark

Note

If you're using the Splunk Distribution of the OpenTelemetry Collector and want to collect Apache Spark cluster metrics, use the native OTel component Apache Spark receiver.

The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the Apache Spark monitor type to monitor Apache Spark clusters. It does not support fetching metrics from Spark Structured Streaming.

For the following cluster modes, the integration only supports HTTP endpoints:

  • Standalone

  • Mesos

  • Hadoop YARN

This collectd plugin is not compatible with Kubernetes cluster mode. You need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, set isMaster to true. When you run Apache Spark on Hadoop YARN, this integration can only report application metrics from the primary node.

This integration is only available on Linux.
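
For example, the only structural difference between a primary-node and a worker-node monitor configuration is the isMaster setting. The following is a minimal sketch; the host and port values are placeholders for your own Spark endpoints, and Standalone is an assumed value for the required clusterType option:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # Placeholder: Spark primary host
    port: 8080               # Placeholder: Spark primary web UI port
    clusterType: Standalone  # Assumed cluster mode
    isMaster: true           # Required for the primary configuration
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost          # Placeholder: Spark worker host
    port: 8081               # Placeholder: Spark worker web UI port
    clusterType: Standalone  # isMaster defaults to false, so workers can omit it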

Benefits

After you configure the integration, you can access these features:

Installation

Follow these steps to deploy this integration:

  1. Deploy the Splunk Distribution of OpenTelemetry Collector to your host or container platform.

  2. Configure the monitor, as described in the Configuration section.

  3. Restart the Splunk Distribution of OpenTelemetry Collector.

Configuration

To use this integration of a Smart Agent monitor with the Collector:

  1. Include the Smart Agent receiver in your configuration file.

  2. Add the monitor type to the Collector configuration, both in the receivers section and the pipelines section.

Example

To activate this integration, add one of the following to your Collector configuration:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ...  # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...  # Additional config

Next, add the monitor to the service.pipelines.metrics.receivers section of your configuration file:

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]
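
For reference, here is a minimal end-to-end sketch that combines both receivers in a single metrics pipeline. The host, port, and clusterType values are placeholder assumptions, and you still need to add an exporter according to your deployment:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # Placeholder
    port: 8080               # Placeholder
    clusterType: Standalone  # Assumed cluster mode
    isMaster: true
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost          # Placeholder
    port: 8081               # Placeholder
    clusterType: Standalone
service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master, smartagent/collectd_spark_worker]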

Note: The names collectd_spark_master and collectd_spark_worker are for identification purposes only and don't affect functionality. You can use either name in your configuration, but you need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, see the isMaster field in the configuration settings section.

Configuration settings

The following table shows the configuration options for this integration:

| Option | Required | Type | Description |
| --- | --- | --- | --- |
| pythonBinary | no | string | Path to a Python binary that executes the Python code. If you don't set this option, the system uses a built-in runtime. You can also include arguments to the binary. |
| host | yes | string | |
| port | yes | integer | |
| isMaster | no | bool | Set this option to true when you want to monitor a primary Spark node. The default is false. |
| clusterType | yes | string | Set this option to the type of cluster you're monitoring. The allowed values are Standalone, Mesos, or Yarn. The system doesn't collect cluster metrics for Yarn. Use the collectd/hadoop monitor to gain insight into your cluster's health. |
| collectApplicationMetrics | no | bool | The default is false. |
| enhancedMetrics | no | bool | The default is false. |
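
As an illustration of these settings together, the following sketch configures a primary-node monitor with every optional field spelled out. The host, port, and pythonBinary values are placeholders, not recommendations:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost                  # Placeholder
    port: 8080                       # Placeholder
    clusterType: Standalone          # Standalone, Mesos, or Yarn
    isMaster: true                   # Monitor a primary Spark node
    collectApplicationMetrics: true  # Defaults to false
    enhancedMetrics: false           # Defaults to false
    pythonBinary: /usr/bin/python3   # Placeholder; omit to use the built-in runtime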

Metrics

These are the metrics available for this integration:

Notes

  • To learn more about the metric types available in Splunk Observability Cloud, see Metric types.

  • In host-based subscription plans, default metrics are those metrics included in host-based subscriptions in Splunk Observability Cloud, such as host, container, or bundled metrics. Custom metrics are not provided by default and might be subject to charges. See Metric categories for more information.

  • In MTS-based subscription plans, all metrics are custom.

  • To add additional metrics, see how to configure extraMetrics in Add additional metrics. A minimal sketch follows these notes.
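
For example, a sketch that requests non-default metrics through extraMetrics could look like the following. The glob is a placeholder; replace it with the specific metric names you need:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # Placeholder
    port: 8080               # Placeholder
    clusterType: Standalone
    isMaster: true
    extraMetrics:
      - "*"                  # Placeholder glob that requests all non-default metrics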

Troubleshooting

If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.

Available to Splunk Observability Cloud customers

Available to prospective customers and free trial users

  • Ask a question and get answers through community support at Splunk Answers.

  • Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups in the Get Started with Splunk Community manual.

This page was last updated on Oct 04, 2024.