Apache Spark 🔗

Description 🔗

The Splunk Distribution of OpenTelemetry Collector provides this integration as the Apache Spark monitor by using the SignalFx Smart Agent receiver.

The integration monitors Apache Spark clusters.

Note: This integration does not support fetching metrics from Spark Structured Streaming.

For the following cluster modes, the integration only supports HTTP endpoints:

  • Standalone

  • Mesos

  • Hadoop YARN

You need to select distinct monitor configurations and discovery rules for master and worker processes. For the master configuration, set isMaster to true.

When you run Apache Spark on Hadoop YARN, this integration can only report application metrics from the master node.
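For example, the master and worker monitor configurations can be identical except for the isMaster setting, as in this minimal sketch (full configuration examples follow in the Configuration section):

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    isMaster: true   # monitor a master Spark node
    ...              # host, port, and other settings
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...              # isMaster defaults to false for worker monitors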

Benefits 🔗

After you configure the integration, you can access these features:

Installation 🔗

Follow these steps to deploy this integration:

  1. Deploy the Splunk Distribution of OpenTelemetry Collector to your host or container platform.

  2. Configure the monitor, as described in the Configuration section.

  3. Restart the Splunk Distribution of OpenTelemetry Collector.

Configuration 🔗

This monitor type is available in the Smart Agent Receiver, which is part of the Splunk Distribution of OpenTelemetry Collector. You can use existing Smart Agent monitors as OpenTelemetry Collector metric receivers with the Smart Agent Receiver.

This monitor type requires a properly configured environment on your system in which you’ve installed a functional Smart Agent release bundle. The Collector provides this bundle in the installation paths for x86_64/amd64.

To activate this monitor in the Splunk Distribution of OpenTelemetry Collector, add one of the following to your agent configuration (YAML) file:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ...  # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...  # Additional config

To complete the integration, include the monitor in a metrics pipeline. Add the monitor item to the service/pipelines/metrics/receivers section of your configuration file. For example:

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]

Note: The monitor names collectd_spark_master and collectd_spark_worker are for identification purposes only and don’t affect functionality. You can use either name in your configuration, but you need to select distinct monitor configurations and discovery rules for master and worker processes. For the master configuration, see the isMaster field in the Configuration settings section.
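Putting the pieces together, the following sketch shows a single Collector configuration that activates both monitors and adds them to the metrics pipeline. The host, port, and clusterType values are placeholders for a standalone cluster, not defaults from this documentation; the available settings are described in the next section.

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # placeholder: host of the Spark master web UI
    port: 8080               # placeholder: port of the Spark master web UI
    clusterType: Standalone
    isMaster: true
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost          # placeholder: host of the Spark worker web UI
    port: 8081               # placeholder: port of the Spark worker web UI
    clusterType: Standalone

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master, smartagent/collectd_spark_worker]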

Configuration settings 🔗

The following configuration options are available for this monitor:

  • pythonBinary (optional, string): Path to a Python binary that executes the Python code. If you don’t set this option, the system uses a built-in runtime. You can also include arguments to the binary.

  • host (required, string)

  • port (required, integer)

  • isMaster (optional, bool): Set this option to true when you monitor a master Spark node. The default is false.

  • clusterType (required, string): The type of cluster you’re monitoring. The allowed values are Standalone, Mesos, or Yarn. The system doesn’t collect cluster metrics for Yarn. Use the collectd/hadoop monitor to gain insight into your cluster’s health.

  • collectApplicationMetrics (optional, bool): The default is false.

  • enhancedMetrics (optional, bool): The default is false.
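As an illustration, the following sketch sets the optional fields on a master monitor. The host, port, and pythonBinary values are placeholders for your deployment, not values taken from this documentation:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: spark-master.example.com    # placeholder hostname
    port: 8080                        # placeholder port
    clusterType: Standalone
    isMaster: true
    collectApplicationMetrics: true
    enhancedMetrics: true
    # pythonBinary: /usr/bin/python3  # optional; omit to use the built-in runtime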

Metrics 🔗

These are the metrics available for this integration:

Get help 🔗

If you are not able to see your data in Splunk Observability Cloud, try these tips:

To learn about even more support options, see Splunk Customer Success.