Troubleshoot Java instrumentation for Splunk Observability Cloud
When you instrument a Java application using the Splunk Distribution of OpenTelemetry Java and you don't see your data in Observability Cloud, follow these troubleshooting steps.
Steps for troubleshooting Java OpenTelemetry issues
The following steps can help you troubleshoot Java agent issues:
Activate debug logging
Debug logging is a special execution mode that outputs more information about the Java agent of the Splunk Distribution of OpenTelemetry Java. This can help you troubleshoot Java instrumentation issues.
To activate debug logging for the Java agent, select one of the following options:
Pass the following argument when running your application:
-Dotel.javaagent.debug=true
Set the OTEL_JAVAAGENT_DEBUG environment variable to true before running your application.
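For example, to start a hypothetical application packaged as myapp.jar with debug logging activated, either of the following might work, assuming the agent JAR sits in the working directory:
# Option 1: pass the system property on the java command line
java -javaagent:./splunk-otel-javaagent.jar -Dotel.javaagent.debug=true -jar myapp.jar
# Option 2: set the environment variable before starting the application
export OTEL_JAVAAGENT_DEBUG=true
java -javaagent:./splunk-otel-javaagent.jar -jar myapp.jar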
When you run the agent with debug logging activated, debug information is sent to the console as stderr. Debug log entries look like the following example:
...
[opentelemetry.auto.trace 2021-10-10 10:57:05:814 +0200] [main] DEBUG io.opencensus.tags.Tags - <Could not load lite implementation for TagsComponent, now using default implementation for TagsComponent.3>
[opentelemetry.auto.trace 2021-10-10 10:57:05:722 +0200] [main] DEBUG io.grpc.netty.shaded.io.netty.util.internal.PlatformDependent0 - direct buffer constructor: unavailable
...
While not all debug entries are relevant to the issue affecting your Java instrumentation, the root cause is likely to appear in your debug log.
Note
Activate debug logging only when needed. Debug mode requires more resources.
Check the status of the runtime
Run the jps -lvm command to verify that the Java runtime has started. The output is a list of all the Java Virtual Machines (JVMs) currently running. Make sure the JVM you instrumented appears among them.
In the following example, the first entry shows a JVM running the agent with the -javaagent option:
37602 target/spring-petclinic-2.4.5.jar -javaagent:./splunk-otel-javaagent.jar -Dotel.resource.attributes=service.name=pet-store-demo,deployment.environment=prod,service.version=1.2.0 -Dotel.javaagent.debug=true
38262 jdk.jcmd/sun.tools.jps.Jps -lvm -Dapplication.home=/usr/lib/jvm/java-16-openjdk-amd64 -Xms8m -Djdk.module.main=jdk.jcmd
If the instrumented JVM doesn't appear in the list, check the JVM or application logs to find the cause of the problem. Also check that the additional startup parameters are correctly passed to the runtime. See Instrument a Java application for Splunk Observability Cloud to learn more about startup parameters.
Library instrumentation issues
If you find or suspect an issue with the instrumentation of a specific library, deactivating that instrumentation can help you troubleshoot the Java agent.
To deactivate a specific library instrumentation, add the following argument:
-Dotel.instrumentation.<name>.enabled=false
Replace <name> with the name of the instrumentation you want to deactivate. For the list of instrumentation names, see the OpenTelemetry Java documentation at https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#suppressing-specific-auto-instrumentation.
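For example, to rule out the JDBC instrumentation as the source of a problem, you might start a hypothetical myapp.jar with that instrumentation deactivated; jdbc is one name from the suppression list linked above, so substitute the library you want to test:
# Deactivate a single library instrumentation (jdbc shown as an example)
java -javaagent:./splunk-otel-javaagent.jar \
  -Dotel.instrumentation.jdbc.enabled=false \
  -jar myapp.jar
If the issue disappears with the instrumentation deactivated, you've narrowed the problem down to that library.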
Class instrumentation issues
You can prevent specific classes from being instrumented. Excluded classes don't send spans, which is useful for muting specific classes or packages.
To deactivate instrumentation for a class, set the otel.javaagent.exclude-classes system property or the OTEL_JAVAAGENT_EXCLUDE_CLASSES environment variable to the name of the class or classes. You can enter multiple classes separated by commas, for example my.package.MyClass,my.package2.*.
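For illustration, the following command excludes one class and an entire package using placeholder names; replace them with your own classes:
java -javaagent:./splunk-otel-javaagent.jar \
  -Dotel.javaagent.exclude-classes="my.package.MyClass,my.package2.*" \
  -jar myapp.jar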
Caution
Deactivating instrumentation for specific classes can have unintended side effects. Use this feature with caution.
Trace exporter issues
By default, the Splunk Distribution of OpenTelemetry Java uses the OTLP exporter. Any issue affecting the export of traces produces an error in the debug logs.
OTLP can't export spans
The following error in the logs means that the agent can't send trace data to the OpenTelemetry Collector:
[BatchSpanProcessor_WorkerThread-1] ERROR io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE: io exception
To troubleshoot the lack of connectivity between the OTLP exporter and the OTel Collector, try the following steps:
Make sure that otel.exporter.otlp.endpoint points to the correct OpenTelemetry Collector instance host, as in the example after this list.
Check that your OTel Collector instance is configured and running. See Troubleshoot the Splunk OpenTelemetry Collector.
Check that the OTLP gRPC receiver is activated in the OTel Collector and plugged into the traces pipeline.
Check that the OTel Collector listens at the following address: http://<host>:4317, and verify that your URL is correct.
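For example, if your Collector runs on a placeholder host named collector.example.com, the endpoint setting might look like either of the following:
# System property
-Dotel.exporter.otlp.endpoint=http://collector.example.com:4317
# Environment variable
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.example.com:4317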
Channel pipeline error
If you see the following error in your logs, it might mean that the Java agent is trying to send trace data to the Splunk ingest API endpoint, which does not yet support OTLP:
[grpc-default-executor-1] ERROR io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE: io exception
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
To solve this issue, use the Jaeger exporter instead. See Exporters configuration.
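As a sketch, switching the agent to the Jaeger exporter and pointing it at the Splunk ingest endpoint might look like the following; the exporter name and ingest URL can vary by agent version and realm, so confirm both in Exporters configuration before relying on them:
# Switch trace export from OTLP to Jaeger (names and URL are assumptions to verify)
java -javaagent:./splunk-otel-javaagent.jar \
  -Dotel.traces.exporter=jaeger-thrift-splunk \
  -Dotel.exporter.jaeger.endpoint=https://ingest.<realm>.signalfx.com/v2/trace \
  -jar myapp.jar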
Jaeger can't export spans
The following warnings in your logs mean that the Java agent can't send trace data to the OTel Collector, the Smart Agent (now deprecated), or Splunk Cloud Platform using the Jaeger exporter:
[BatchSpanProcessor_WorkerThread-1] WARN io.opentelemetry.exporter.jaeger.thrift.JaegerThriftSpanExporter - Failed to export spans
io.jaegertracing.internal.exceptions.SenderException: Could not send 8 spans
at io.jaegertracing.thrift.internal.senders.HttpSender.send(HttpSender.java:69)
...
Caused by: java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:9080
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:265)
...
Caused by: java.net.ConnectException: Connection refused (Connection refused)
...
To troubleshoot the lack of connectivity between Jaeger and Splunk Observability Cloud, try the following steps:
Make sure that otel.exporter.jaeger.endpoint points to an OpenTelemetry Collector or Smart Agent instance, or to the Splunk Ingest URL. See Send data measurements in the Splunk Developer documentation.
Check that the OpenTelemetry Collector or Smart Agent instance is configured and running.
Check that the Jaeger Thrift HTTP receiver is activated and plugged into the traces pipeline. See Exposed ports and endpoints.
Check that the endpoint is correct. The OpenTelemetry Collector and the Smart Agent use different ports and paths by default. For the Jaeger receiver, the OTel Collector uses http://<host>:14268/api/traces, while the Smart Agent uses http://<host>:9080/v1/trace, as shown in the example after this list.
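For reference, the endpoint setting differs depending on the target; the host names below are placeholders:
# OTel Collector (Jaeger Thrift HTTP receiver)
-Dotel.exporter.jaeger.endpoint=http://collector.example.com:14268/api/traces
# Smart Agent (deprecated)
-Dotel.exporter.jaeger.endpoint=http://agent.example.com:9080/v1/trace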
401 error when sending spans
If you send traces directly to Observability Cloud and receive a 401 error code, the authentication token specified in SPLUNK_ACCESS_TOKEN is invalid. The following are possible reasons:
The value is null.
The value is not a well-formed token.
The token is not an access token that has authScope set to ingest.
Make sure that you're using a valid Splunk access token when sending data directly to your Splunk platform instance. See Retrieve and manage user API access tokens using Splunk Observability Cloud.
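For example, when sending data directly to Splunk Observability Cloud, you might set the token and realm as environment variables before starting the application; both values below are placeholders:
export SPLUNK_ACCESS_TOKEN=<your-ingest-token>
export SPLUNK_REALM=<your-realm>
java -javaagent:./splunk-otel-javaagent.jar -jar myapp.jar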
Metrics exporter issues
If you see the following warning in your logs, it means that the Java agent can't send metrics to your OTel Collector, Smart Agent (now deprecated), or to the Splunk platform endpoints:
[signalfx-metrics-publisher] WARN com.splunk.javaagent.shaded.io.micrometer.signalfx.SignalFxMeterRegistry - failed to send metrics: Unable to send data points
To troubleshoot connectivity issues affecting application metrics, try the following steps:
Make sure that splunk.metrics.endpoint points to the correct host.
Check that the OpenTelemetry Collector or Smart Agent instance is configured and running.
Check that the OpenTelemetry Collector or Smart Agent is using the correct ports for the SignalFx receiver. The Collector uses http://<host>:9943, and the Smart Agent uses http://<host>:9080/v2/datapoint, as shown in the example after this list.
Make sure that you're using a valid Splunk access token when sending data directly to your Splunk platform instance. See Retrieve and manage user API access tokens using Splunk Observability Cloud.
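For example, the metrics endpoint might be set in either of the following ways, using placeholder host names:
# Through an OTel Collector SignalFx receiver
-Dsplunk.metrics.endpoint=http://collector.example.com:9943
# Through a Smart Agent (deprecated)
-Dsplunk.metrics.endpoint=http://agent.example.com:9080/v2/datapoint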
Note
Metric collection for Java using OpenTelemetry instrumentation is still experimental.
Troubleshoot AlwaysOn Profiling for Java
Follow these steps to troubleshoot issues with AlwaysOn Profiling:
Check that AlwaysOn Profiling is activated
The Java agent logs the string JFR profiler is active at startup using an INFO message. To check whether AlwaysOn Profiling is activated, search your logs for strings similar to the following:
[otel.javaagent 2021-09-28 18:17:04:246 +0000] [main] INFO com.splunk.opentelemetry.profiler.JfrActivator - JFR profiler is active.
If the string does not appear, make sure that you've activated the profiler by setting the splunk.profiler.enabled system property or the SPLUNK_PROFILER_ENABLED environment variable. See Java settings for AlwaysOn Profiling.
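For example, either of the following might activate the profiler for a hypothetical myapp.jar:
# System property
java -javaagent:./splunk-otel-javaagent.jar -Dsplunk.profiler.enabled=true -jar myapp.jar
# Environment variable
export SPLUNK_PROFILER_ENABLED=true
java -javaagent:./splunk-otel-javaagent.jar -jar myapp.jar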
Check the AlwaysOn Profiling configuration
If AlwaysOn Profiling is not working as intended, check the configuration settings. The Java agent logs AlwaysOn Profiling settings using INFO messages at startup. Search for the string com.splunk.opentelemetry.profiler.ConfigurationLogger to see entries like the following:
[otel.javaagent 2021-09-28 18:17:04:237 +0000] [main] INFO <snip> - -----------------------
[otel.javaagent 2021-09-28 18:17:04:237 +0000] [main] INFO <snip> - Profiler configuration:
[otel.javaagent 2021-09-28 18:17:04:238 +0000] [main] INFO <snip> - splunk.profiler.enabled : true
[otel.javaagent 2021-09-28 18:17:04:239 +0000] [main] INFO <snip> - splunk.profiler.directory : .
[otel.javaagent 2021-09-28 18:17:04:244 +0000] [main] INFO <snip> - splunk.profiler.recording.duration : 20s
[otel.javaagent 2021-09-28 18:17:04:244 +0000] [main] INFO <snip> - splunk.profiler.keep-files : false
[otel.javaagent 2021-09-28 18:17:04:245 +0000] [main] INFO <snip> - splunk.profiler.logs-endpoint : null
[otel.javaagent 2021-09-28 18:17:04:245 +0000] [main] INFO <snip> - otel.exporter.otlp.endpoint : http://collector:4317
[otel.javaagent 2021-09-28 18:17:04:245 +0000] [main] INFO <snip> - splunk.profiler.tlab.enabled : false
[otel.javaagent 2021-09-28 18:17:04:246 +0000] [main] INFO <snip> - splunk.profiler.period.jdk.threaddump : null
[otel.javaagent 2021-09-28 18:17:04:246 +0000] [main] INFO <snip> - -----------------------
JFR is not available error
If your Java Virtual Machine does not support Java Flight Recorder (JFR), the profiler logs a warning at startup with the message Java Flight Recorder (JFR) is not available in this JVM. Profiling is disabled.
To use the profiler, upgrade your JVM version to 8u262 or higher. See Java agent compatibility and requirements.
Access denied error
If your Java runtime has Java Security Manager (JSM) activated, the following error might appear:
java.security.AccessControlException: access denied ("java.util.PropertyPermission" "otel.javaagent.debug" "read")
To fix this, deactivate JSM or add the following block to the JSM policy file:
grant codeBase "file:<path to splunk-otel-java.jar>" {
permission java.security.AllPermission;
};
AlwaysOn Profiling data and logs don't appear in Observability Cloud
Collector configuration issues might prevent AlwaysOn Profiling data and logs from appearing in Splunk Observability Cloud.
To solve this issue, do the following:
Find the value of splunk.profiler.logs-endpoint and otel.exporter.otlp.endpoint in the startup log messages. Check that a collector is running using that endpoint and that the application host or container can resolve any host names and connect to the OTLP port.
Make sure that you're running the Splunk Distribution of OpenTelemetry Collector and that the version is 0.34 or higher. Other collector distributions might not be able to route the log data that contains profiling data.
A custom configuration might override settings that let the collector handle profiling data. Make sure to configure an otlp receiver and a splunk_hec exporter with correct token and endpoint fields. The profiling pipeline must use the OTLP receiver and Splunk HEC exporter you've configured. See Splunk HEC exporter for more information.
The following snippet contains a sample profiling pipeline:
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  splunk_hec:
    token: "${SFX_TOKEN}"
    endpoint: "https://ingest.${SFX_REALM}.signalfx.com/v1/log"
  logging/info:
    loglevel: info
processors:
  batch:
service:
  pipelines:
    profiling:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging/info, splunk_hec]
Loss of profiling data or gaps in profiling data
If there are less than 100 megabytes of space available for the Java Virtual Machine, AlwaysOn Profiling activates the recording escape hatch, which appears in the logs as com.splunk.opentelemetry.profiler.RecordingEscapeHatch. The escape hatch drops all logs with profiling data until more space is available.
To avoid the loss of profiling data due to the escape hatch, provide enough resources to the JVM.
Deactivate all Java agent logs
By default, the Splunk Java agent outputs logs to the console. In certain situations you might want to silence the output of the agent so as not to clutter the system logs.
To run the Java agent in silent mode, add the following argument:
-Dio.opentelemetry.javaagent.slf4j.simpleLogger.defaultLogLevel=off
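For example, a full command starting a hypothetical myapp.jar in silent mode might look like this:
java -javaagent:./splunk-otel-javaagent.jar \
  -Dio.opentelemetry.javaagent.slf4j.simpleLogger.defaultLogLevel=off \
  -jar myapp.jar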
Report an issue
Before you create an issue or open a support request, try gathering the following information:
What happened and the impact of the issue.
All the steps you've followed until the issue appeared.
The expected outcome.
Your attempts to solve the issue, including workarounds.
The operating system, runtime or compiler version, libraries, frameworks, and application servers of your environment, including your instrumentation settings.
Debug logs and other logs that might help troubleshoot the issue.
To get help, see Splunk Observability Cloud support.