
Apache Spark receiver

The Apache Spark receiver monitors Apache Spark clusters and the applications running on them by collecting performance metrics such as memory utilization, CPU utilization, and shuffle operations. The supported pipeline type is metrics. See Process your data with pipelines for more information.

Note

Out-of-the-box dashboards and navigators aren't supported for the Apache Spark receiver yet, but are planned for a future release.

The receiver retrieves metrics through the Apache Spark REST API using the following endpoints: /metrics/json, /api/v1/applications/[app-id]/stages, /api/v1/applications/[app-id]/executors, and /api/v1/applications/[app-id]/jobs.

Prerequisites

This receiver supports Apache Spark versions 3.3.2 or higher.

Get started

Follow these steps to configure and activate the component:

  1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.

  2. Configure the receiver as described in the next section.

  3. Restart the Collector.

Sample configuration

To activate the Apache Spark receiver, add apachespark to the receivers section of your configuration file:

receivers:
  apachespark:
    collection_interval: 60s
    endpoint: http://localhost:4040
    application_names:
    - PythonStatusAPIDemo
    - PythonLR

To complete the configuration, include the receiver in the metrics pipeline of the service section of your configuration file:

service:
  pipelines:
    metrics:
      receivers: [apachespark]
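
For context, the two fragments above can be combined into one file. The following is a minimal sketch, not a complete Collector configuration: it assumes metrics are sent through the SignalFx exporter, and that the SPLUNK_ACCESS_TOKEN and SPLUNK_REALM environment variables are set on the host. Adjust the exporter to match your own deployment.

```yaml
receivers:
  apachespark:
    collection_interval: 60s
    endpoint: http://localhost:4040

exporters:
  # Assumption: the SignalFx exporter is the metrics destination.
  # The token and realm are read from environment variables.
  signalfx:
    access_token: "${SPLUNK_ACCESS_TOKEN}"
    realm: "${SPLUNK_REALM}"

service:
  pipelines:
    metrics:
      receivers: [apachespark]
      exporters: [signalfx]
```

If your configuration file already defines a metrics pipeline, add apachespark to its existing receivers list rather than creating a second pipeline with the same name.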

Configuration options

The following settings are optional:

  • collection_interval. 60s by default. Sets how often this receiver collects metrics.

    • This value must be a string readable by Golang's time.ParseDuration. Learn more at Go's official documentation for the ParseDuration function.

    • Valid time units are ns, us (or µs), ms, s, m, h.

  • initial_delay. 1s by default. Determines how long this receiver waits before collecting metrics for the first time.

  • endpoint. http://localhost:4040 by default. Apache Spark endpoint to connect to in the form of [http][://]{host}[:{port}].

  • application_names. An array of Spark application names for which metrics are collected. If no application names are specified, metrics are collected for all Spark applications running on the cluster at the specified endpoint.
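
As an illustration, the options above could be set together as in the following sketch. The host name spark-driver.example.com and the chosen interval values are hypothetical; only the option names come from the list above.

```yaml
receivers:
  apachespark:
    # Scrape every 30 seconds; any string accepted by Go's
    # time.ParseDuration works here (for example 500ms, 2m, 1h).
    collection_interval: 30s
    # Wait 5 seconds after startup before the first scrape.
    initial_delay: 5s
    # Hypothetical host; replace with your Spark endpoint.
    endpoint: http://spark-driver.example.com:4040
    # Restrict collection to one application. Omit this key to
    # collect metrics for every application at the endpoint.
    application_names:
      - PythonLR
```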

Settings

The full list of settings exposed for this receiver is documented in the Apache Spark receiver config repo in GitHub.

Metrics

The following metrics, resource attributes, and attributes are available.

Note

The SignalFx exporter excludes some available metrics by default. Learn more about default metric filters in List of metrics excluded by default.

Activate or deactivate specific metrics

You can activate or deactivate specific metrics by setting the enabled field in the metrics section for each metric. For example:

receivers:
  samplereceiver:
    metrics:
      metric-one:
        enabled: true
      metric-two:
        enabled: false

The following example shows a host metrics receiver configuration with one metric activated:

receivers:
  hostmetrics:
    scrapers:
      process:
        metrics:
          process.cpu.utilization:
            enabled: true
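
The same pattern should apply to the Apache Spark receiver. The following is a sketch only: the metric names spark.executor.memory.usage and spark.stage.shuffle.write_time are assumed here for illustration, so check them against the receiver's metric list before using them.

```yaml
receivers:
  apachespark:
    endpoint: http://localhost:4040
    metrics:
      # Assumed metric name; verify it in the receiver's metric list.
      spark.executor.memory.usage:
        enabled: true
      # Assumed metric name; deactivated metrics are not collected.
      spark.stage.shuffle.write_time:
        enabled: false
```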

Note

Deactivated metrics aren't sent to Splunk Observability Cloud.

Billing

  • If you're in an MTS-based subscription, all metrics count towards metrics usage.

  • If you're in a host-based plan, metrics listed as active (Active: Yes) in this document are considered default and are included free of charge.

Learn more at Infrastructure Monitoring subscription usage (Host and metric plans).

Troubleshooting

If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.

Available to Splunk Observability Cloud customers

Available to prospective customers and free trial users

  • Ask a question and get answers through community support at Splunk Answers.

  • Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups in the Get Started with Splunk Community manual.

This page was last updated on Nov 13, 2024.