Migration process from the Smart Agent to the Splunk Distribution of the OpenTelemetry Collector

Note

This content assumes that you’re running the SignalFx Smart Agent in a Kubernetes, Linux, or Windows environment and want to migrate to the Splunk Distribution of the OpenTelemetry Collector to collect telemetry data. You can't run both agents on the same host at the same time. For details, see the section Conflicting semantics in Mapping service and migration impact report.

Complete the following steps to migrate from the Smart Agent to the Collector:

  1. Deploy the Collector in a non-production environment

  2. Validate the deployment of the Collector

  3. Locate your existing Smart Agent configuration file

  4. Estimate resource utilization (sizing) for the production environment

  5. Deploy the Collector to the non-production environment using the updated configuration file

  6. Deploy the Collector to a production host using the updated configuration file

1. Deploy the Collector in a non-production environment

Deploy the Collector in a non-production environment, such as a development host or VM, or a Kubernetes cluster in staging. The environment should be identical to, or a copy of, your production environment.

Navigate to your instance of Splunk Observability Cloud and select Data Management > Available integrations in the navigation bar. Choose the platform you want to deploy the Collector to.

Follow the guided setup for your platform to deploy the Collector.

Note

See the tooltips within the guided setup for guidance about the initial configuration.

2. Validate the deployment of the Collector

Validate the deployment of the Collector using the following approaches, in the order described.

Validate using dashboards

Start by looking at the built-in dashboard for the Collector, where you can visualize:

  • Process metrics such as memory and CPU usage

  • Dropped, failure, and success metrics for telemetry processing (metrics, spans, and logs)

Select Dashboards in the navigation bar.

Search for OpenTelemetry Collector to access the built-in dashboard group.

Navigate to the Critical Monitoring section and check whether any telemetry data is being dropped, which would mean data loss. You should see a chart each for metrics, spans, and logs.

If any of the charts indicate a value above zero, then data is being dropped and you need to investigate why. To diagnose further, see Validate using logs.
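The dashboard charts are based on the Collector's own internal telemetry. The default configuration that ships with the Splunk Distribution typically already collects it. If you maintain a custom configuration and the charts stay empty, the following is a minimal sketch of one way to scrape and export those metrics. It assumes a Collector version that supports the service::telemetry::metrics::address setting, the conventional internal metrics port 8888, and a signalfx exporter that's already defined in the same file.

```yaml
receivers:
  # Scrapes the Prometheus endpoint where the Collector exposes its own metrics.
  prometheus/internal:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 10s
          static_configs:
            - targets: ["localhost:8888"]

service:
  telemetry:
    metrics:
      # Where the Collector serves its internal metrics (assumed port 8888).
      address: localhost:8888
  pipelines:
    # Separate pipeline so internal metrics don't mix with application data.
    metrics/internal:
      receivers: [prometheus/internal]
      exporters: [signalfx]
```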

Validate using zPages

To ensure that the Collector is configured correctly, activate the zPages extension.

By default, zPages is exposed locally on port 55679 and gives an overview of the following:

  • Service, build, and runtime information (http://localhost:<port>/debug/servicez)

  • Running pipelines (http://localhost:<port>/debug/pipelinez)

  • Extensions (http://localhost:<port>/debug/extensionz)

  • Feature gates (http://localhost:<port>/debug/featurez)

  • Spans and error samples (http://localhost:<port>/debug/tracez)

  • RPC statistics (http://localhost:<port>/debug/servicez/rpcz)

In containerized environments, you might want to expose this port on all network interfaces instead of only on localhost. To do so, add the following lines to the configuration. The extension is only active if it's also listed under service::extensions:

extensions:
   zpages:
      endpoint: 0.0.0.0:55679

service:
   extensions: [zpages]

Validate using the Metric Finder

Use the Metric Finder to ensure that metrics are coming in from a specific integration. Select Metric Finder in the navigation bar.

Find the integration in the list. For example, if you deployed the Collector on Kubernetes, scroll to the Containers category and select Kubernetes. The search results show all metrics that the Kubernetes integration collects by default, along with associated metadata that you can use to filter or exclude results.

Select a specific metric, for example, container_cpu_utilization.

You can now view the metric as a chart that displays the time series data across the duration you select.

If you can't find any metrics from an integration that you configured, either in the search results or as recent data points in the chart, see Validate using logs.

Note

If a metric appears in the Metric Finder but has no data points in the chart for the selected time range, it was reported at some point in the past. Widen the time range to find when data was last reported, then look at the logs around that timestamp.

Validate using logs

You can use logs to validate the Collector deployment. Use the following commands based on your environment:

For Docker:

docker logs my-container > my-container.log 2>&1

For Journald:

journalctl -u my-service >my-service.log

For Kubernetes:

kubectl describe pod my-pod
kubectl logs my-pod otel-collector > my-pod-otel.log
kubectl logs my-pod fluentd > my-pod-fluentd.log

Check for the following errors:

  • Port conflicts: You might see a "bind: address already in use" error message. If you see this message, modify the configuration to use another port.

  • HTTP error codes indicating specific use cases:

    • 401 (UNAUTHORIZED): Configured access token or realm is incorrect

    • 404 (NOT FOUND): A configuration parameter such as the endpoint or path (for example, /v1/log) is likely wrong, or there is a network, firewall, or port issue.

    • 429 (TOO MANY REQUESTS): Your organization is not provisioned for the amount of traffic being sent. Reduce traffic or request a capacity increase.

    • 503 (SERVICE UNAVAILABLE): If you're using Log Observer, this is the same as 429, because that is how HECv1 responds. Otherwise, check the status page.
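A 401 or 404 response usually traces back to the signalfx exporter settings. The following fragment is a sketch of where those values live. It assumes the access token and realm come from the SPLUNK_ACCESS_TOKEN and SPLUNK_REALM environment variables, and the commented-out URLs mirror the ingestUrl and apiUrl values in the Smart Agent example on this page.

```yaml
exporters:
  signalfx:
    # Check these two values first when you see 401 (UNAUTHORIZED).
    access_token: "${SPLUNK_ACCESS_TOKEN}"
    realm: "${SPLUNK_REALM}"
    # Optional explicit endpoints. When set, they take precedence over the
    # URLs derived from realm, so check them when you see 404 (NOT FOUND).
    # ingest_url: https://ingest.us1.signalfx.com
    # api_url: https://api.us1.signalfx.com
```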

To confirm that a specific receiver is fetching metrics exposed by an application, update the configuration file, as shown in the following example.

Set the logging level to debug:

service:
   telemetry:
      logs:
         level: debug

Set log_data_points to true in the SignalFx exporter:

exporters:
   signalfx:
      ...
      log_data_points: true
      ...
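As an alternative that doesn't depend on the SignalFx exporter, you can temporarily add the debug exporter to the pipeline you're troubleshooting, so that received telemetry is printed in the Collector's own log output. This is a sketch that assumes a Collector version that includes the debug exporter (older versions used the logging exporter instead) and that the hostmetrics receiver, the memory_limiter and batch processors, and the signalfx exporter are already defined. Substitute the receiver you're checking for hostmetrics.

```yaml
exporters:
  # Writes every item that reaches it to the Collector log.
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      # Keep signalfx so data still reaches Splunk Observability Cloud.
      exporters: [signalfx, debug]
```

Detailed output can be large, so remove the debug exporter once you've confirmed the receiver is producing data.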

After updating the configuration, restart the Collector. Check the logs for your environment to validate the deployment.

If you are unable to determine the issue from logs, see Splunk Observability Cloud support. Gather as much information as possible related to the environment, platforms, configuration, and logs.

3. Locate your existing Smart Agent configuration file

The Smart Agent can be configured by editing the agent.yaml file. By default, the configuration is installed at /etc/signalfx/agent.yaml on Linux and \ProgramData\SignalFxAgent\agent.yaml on Windows. If you override the location while installing the Smart Agent using the -config command line flag, the configuration file is stored at the location that you specify.

The following is an example agent.yaml configuration file:

signalFxAccessToken: {"#from": "env:SIGNALFX_ACCESS_TOKEN"}
ingestUrl: https://ingest.us1.signalfx.com
apiUrl: https://api.us1.signalfx.com

bundleDir: /opt/my-smart-agent-bundle

procPath: /my_custom_proc
etcPath: /my_custom_etc
varPath: /my_custom_var
runPath: /my_custom_run
sysPath: /my_custom_sys

observers:
   - type: k8s-api

collectd:
   readThreads: 10
   writeQueueLimitHigh: 1000000
   writeQueueLimitLow: 600000
   configDir: "/tmp/signalfx-agent/collectd"

monitors:
   - type: collectd/activemq
     discoveryRule: container_image =~ "activemq" && private_port == 1099
     extraDimensions:
        my_dimension: my_dimension_value
   - type: collectd/apache
     discoveryRule: container_image =~ "apache" && private_port == 80
   - type: postgresql
     discoveryRule: container_image =~ "postgresql" && private_port == 5432
     extraDimensions:
        my_other_dimension: my_other_dimension_value
   - type: processlist

4. Estimate resource utilization (sizing) for the production environment

Size the Collector, and the VM or host it runs on, based on the volume of telemetry it collects. The Collector requires 1 CPU core per:

  • 15,000 spans per second

  • 20,000 data points per second

  • 10,000 log records per second

The Smart Agent has an internal-metrics monitor that emits metrics about the agent's internal state. It's useful for debugging Smart Agent performance issues and for checking that the agent isn't overloaded, and it also reports how much data the agent sends. Add the following to your Smart Agent configuration file:

monitors:
   - type: internal-metrics

This addition to your Smart Agent configuration file is only needed to measure the data sent through the Smart Agent. The Smart Agent configuration file is no longer used once you uninstall the Smart Agent and deploy the Collector to a production host using the updated configuration file.

After the configuration file is updated, restart the Smart Agent.

You can then use the sfxagent.datapoints_sent and sfxagent.trace_spans_sent metrics to estimate the number of data points and spans, respectively, that are sent to Splunk Observability Cloud. Plot them on a dashboard and filter by dimensions to determine the total per cluster or host.

Note

The sizing recommendation for logs also accounts for td-agent (Fluentd), which you can activate with the Collector.

If a Collector handles both trace and metric data, account for both when sizing. For example, 7,500 spans per second plus 10,000 data points per second requires 1 CPU core.

Use a ratio of 1 CPU to 2 GB of memory. By default, the Collector is configured to use 512 MB of memory.
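The per-core rates and the memory ratio above can be combined into a single rough estimate. In the following worked example, S, D, and L are the expected spans, data points, and log records per second (for example, per-second rates derived from sfxagent.trace_spans_sent and sfxagent.datapoints_sent). The result is rounded up to whole cores, and the throughput figures on the second line are hypothetical.

```latex
\begin{aligned}
\text{cores} &= \left\lceil \frac{S}{15\,000} + \frac{D}{20\,000} + \frac{L}{10\,000} \right\rceil,
  \qquad \text{memory} \approx 2\ \text{GB} \times \text{cores} \\[4pt]
S = 22\,500,\ D = 30\,000,\ L = 5\,000:\quad
\text{cores} &= \lceil 1.5 + 1.5 + 0.5 \rceil = \lceil 3.5 \rceil = 4,
  \qquad \text{memory} \approx 8\ \text{GB}
\end{aligned}
```

With the example from this page (S = 7,500, D = 10,000, L = 0), the formula gives exactly 1 core and about 2 GB of memory.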

Configure the memory_limiter processor on every Collector instance, as shown in the following example:

processors:
   memory_limiter:
      check_interval:
      limit_mib:
      spike_limit_mib:

Note

Define the memory_limiter processor as the first processor in the pipeline, immediately after the receivers.
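The following is a sketch of a filled-in memory_limiter for a hypothetical host or container with about 2 GB (2048 MiB) of memory, which matches one CPU core under the 1:2 ratio above. The numbers are illustrative assumptions rather than Splunk recommendations, and the hostmetrics receiver and signalfx exporter are assumed to be defined elsewhere in the file.

```yaml
processors:
  memory_limiter:
    # How often memory usage is checked.
    check_interval: 2s
    # Hard limit, leaving headroom below the assumed 2048 MiB.
    limit_mib: 1800
    # Soft limit is limit_mib - spike_limit_mib (1440 MiB here); above it,
    # the processor starts refusing data.
    spike_limit_mib: 360
  batch:

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # memory_limiter comes first, directly after the receivers.
      processors: [memory_limiter, batch]
      exporters: [signalfx]
```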

5. Deploy the Collector to the non-production environment using the updated configuration file

Translate your Smart Agent configuration into a Collector configuration file, complete any other necessary updates, and then restart the Collector in the non-production environment using the updated file.
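As an illustration of the translation, the following sketch shows how the collectd/apache monitor from the earlier Smart Agent example might look as a smartagent receiver, which accepts Smart Agent monitor options inline. It assumes Apache listens on localhost port 80 and that the memory_limiter processor and signalfx exporter are defined elsewhere in the file. The discoveryRule setting has no direct equivalent here; on Kubernetes, rule-based discovery is usually handled with the receiver_creator receiver and an observer, which this sketch doesn't cover.

```yaml
receivers:
  # Hypothetical translation of the "collectd/apache" monitor in agent.yaml.
  # The part after the slash is a name you choose.
  smartagent/apache:
    type: collectd/apache
    # Static host and port replace the Smart Agent discoveryRule.
    host: localhost
    port: 80

service:
  pipelines:
    metrics:
      receivers: [smartagent/apache]
      processors: [memory_limiter]
      exporters: [signalfx]
```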

Restart the Collector

On Linux:

sudo systemctl restart splunk-otel-collector

On Windows:

Stop-Service splunk-otel-collector
Start-Service splunk-otel-collector

On Kubernetes:

helm upgrade my-splunk-otel-collector --values my_values.yaml splunk-otel-collector-chart/splunk-otel-collector
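On Kubernetes, the updated configuration goes into the values file passed with --values. The following is a minimal sketch of what my_values.yaml might contain. The cluster name, realm, and token are placeholders, and the sketch assumes the chart's agent.config key, which the chart merges with its default agent configuration. The override shown turns on log_data_points for troubleshooting.

```yaml
# Hypothetical my_values.yaml for the splunk-otel-collector chart.
clusterName: my-cluster
splunkObservability:
  realm: us1
  accessToken: YOUR_ACCESS_TOKEN
agent:
  config:
    # Merged into the chart's default agent configuration.
    exporters:
      signalfx:
        log_data_points: true
```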

After the Collector is restarted successfully, validate the deployment to make sure data is being collected and that there are no errors with the updated configuration file.

6. Deploy the Collector to a production host using the updated configuration file

After you've deployed the Collector to a non-production environment and verified that data arrives in Splunk Observability Cloud as expected, begin the migration by stopping and uninstalling the Smart Agent from a single production host or VM. Use the following commands for your environment:

On Linux:

For Debian-based distributions, including Ubuntu, run the following command:

sudo dpkg --remove signalfx-agent

For Red Hat, CentOS, and other RPM-based installs, run the following command:

sudo rpm -e signalfx-agent

On Windows (installer):

Uninstall the Smart Agent from Programs and Features in the Control Panel.

On Windows (ZIP file):

Run the following PowerShell commands to stop and uninstall the signalfx-agent service:

SignalFxAgent\bin\signalfx-agent.exe -service "stop"
SignalFxAgent\bin\signalfx-agent.exe -service "uninstall"

After uninstalling the Smart Agent, deploy the Collector to a production host using the updated configuration file and then validate the deployment of the Collector.

After verifying with one host, deploy the Collector with the same configuration to the rest of the hosts.

If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.

Available to Splunk Observability Cloud customers

Available to prospective customers and free trial users

  • Ask a question and get answers through community support at Splunk Answers.

  • Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups in the Get Started with Splunk Community manual.

This page was last updated on May 29, 2024.