Apache Spark

Description

The Splunk Distribution of OpenTelemetry Collector provides this integration as the Apache Spark monitor via the Smart Agent receiver.

The integration monitors Apache Spark clusters.

The following cluster modes are supported only through HTTP endpoints:

  • Standalone

  • Mesos

  • Hadoop YARN

You need to specify distinct monitor configurations and discovery rules for master and worker processes. For the master configuration, set isMaster to true.
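
As a minimal sketch of the master and worker split described above, the following Smart Agent style configuration defines one monitor per process type. The hostnames are hypothetical placeholders, and ports 8080 and 8081 assume the default web UI ports of a Spark standalone master and worker, so adjust both to your deployment. Static host and port values are used here instead of discovery rules to keep the example short.

```yaml
monitors:
  # Master process: isMaster must be set to true
  - type: collectd/spark
    host: spark-master.example.com    # hypothetical hostname
    port: 8080                        # assumed default standalone master web UI port
    clusterType: Standalone
    isMaster: true
  # Worker process: isMaster is left at its default of false
  - type: collectd/spark
    host: spark-worker-1.example.com  # hypothetical hostname
    port: 8081                        # assumed default standalone worker web UI port
    clusterType: Standalone
    isMaster: false
```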

When running Spark on Hadoop YARN, the integration can report only application metrics from the master node.
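
A YARN setup might therefore look like the following sketch. The hostname is hypothetical, port 8088 assumes the default YARN ResourceManager web UI port, and collectApplicationMetrics is assumed, from its name, to be the option that turns on application metrics.

```yaml
monitors:
  - type: collectd/spark
    host: yarn-rm.example.com        # hypothetical hostname
    port: 8088                       # assumed default ResourceManager web UI port
    clusterType: Yarn
    isMaster: true                   # only the master node reports on YARN
    collectApplicationMetrics: true  # assumed to enable application metrics
```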

Installation

This monitor is available in the SignalFx Smart Agent Receiver, which is part of the Splunk Distribution of OpenTelemetry Collector.

To install this integration:

  1. Deploy the Splunk Distribution of OpenTelemetry Collector to your host or container platform.

  2. Configure the monitor, as described in the next section.

Configuration

The Splunk Distribution of OpenTelemetry Collector allows embedding a Smart Agent monitor configuration in an associated Smart Agent Receiver instance.

Note: To use this integration, you must provide an Apache Spark monitor entry in your Smart Agent or Collector configuration. Use the appropriate form for your agent type.

Smart Agent

To activate this monitor in the Smart Agent, add the following to your agent configuration:

monitors:  # All monitor config goes under this key
  - type: collectd/spark
    ...  # Additional config

See Smart Agent example configuration for an autogenerated example of a YAML configuration file, with default values where applicable.

Splunk Distribution of OpenTelemetry Collector

To activate this monitor in the Splunk Distribution of OpenTelemetry Collector, add the following to your Collector configuration:

receivers:
  smartagent/spark:
    type: collectd/spark
    ...  # Additional config

To complete the monitor activation, you must also include the smartagent/spark receiver item in a metrics pipeline. To do this, add the receiver item to the service > pipelines > metrics > receivers section of your configuration file.
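
Putting the two steps together, a Collector configuration could be sketched as follows. The hostname and port are hypothetical, and the signalfx exporter is an assumption: it stands in for whichever metrics exporter your configuration file already defines.

```yaml
receivers:
  smartagent/spark:
    type: collectd/spark
    host: spark-master.example.com  # hypothetical hostname
    port: 8080                      # assumed default standalone master web UI port
    clusterType: Standalone
    isMaster: true

service:
  pipelines:
    metrics:
      # The receiver name must match the key under receivers
      receivers: [smartagent/spark]
      # Assumed exporter, expected to be defined under exporters elsewhere in the file
      exporters: [signalfx]
```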

See configuration examples for specific use cases that show how the Splunk OpenTelemetry Collector can integrate and complement existing environments.

The following configuration options are available for this monitor:

| Option | Required | Type | Description |
|---|---|---|---|
| pythonBinary | no | string | Path to a Python binary used to execute the Python code. If not set, a built-in runtime is used. Can include arguments to the binary. |
| host | yes | string | |
| port | yes | integer | |
| isMaster | no | bool | Set to true when monitoring a master Spark node (default: false). |
| clusterType | yes | string | One of Standalone, Mesos, or Yarn. Cluster metrics are not collected on Yarn; use the collectd/hadoop monitor to gain insight into your cluster's health. |
| collectApplicationMetrics | no | bool | (default: false) |
| enhancedMetrics | no | bool | (default: false) |

Metrics

The following metrics are available for this integration:

Non-default metrics (version 4.7.0+)

To emit metrics that are not emitted by default, add them to the generic monitor-level extraMetrics configuration option. Metrics that are derived from specific configuration options, and that do not appear in the list of metrics above, do not need to be added to extraMetrics.
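
For illustration, extraMetrics takes a list of metric names under the monitor entry. In this sketch the hostname and the metric name are hypothetical placeholders. Replace the metric name with an actual non-default metric for this monitor.

```yaml
monitors:
  - type: collectd/spark
    host: spark-master.example.com  # hypothetical hostname
    port: 8080                      # assumed default standalone master web UI port
    clusterType: Standalone
    isMaster: true
    extraMetrics:
      - example.metric.name         # hypothetical placeholder, not a real metric name
```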

To see the list of metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.

Troubleshooting

If you are not able to see your data in Splunk Observability Cloud: