Troubleshoot issues with Splunk Connect for Kafka

Use the following examples to diagnose and resolve issues with Splunk Connect for Kafka.

No events are arriving in Splunk

If no events arrive in your Splunk platform deployment, perform the following steps:

Navigate to your HTTP Event Collector (HEC) token configurations.
Verify that the indexer acknowledgment setting used in the REST call (splunk.hec.ack.enabled) matches the setting defined for the target HEC token.
If the target HEC token has no default index, make sure that the REST call provides a valid index (splunk.indexes).
Save any changes.
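The following Python sketch shows one way to build the JSON body for creating the connector through the Kafka Connect REST API (POST /connectors) while checking the two settings above before you submit it. The `token_settings` dictionary is a hypothetical stand-in for your HEC token's configuration; its `useACK` and `index` keys are assumed to mirror the token's indexer acknowledgment and default index settings. The connector class name is the one published by the Splunk Connect for Kafka project; verify it against your installed version.

```python
import json

# Connector class published by the Splunk Connect for Kafka project (verify for your version).
CONNECTOR_CLASS = "com.splunk.kafka.connect.SplunkSinkConnector"


def build_connector_request(name, hec_uri, token, topics, ack_enabled,
                            indexes=None, token_settings=None):
    """Return the JSON body for POST /connectors, after checking it against
    a hypothetical description of the target HEC token.

    token_settings (hypothetical): {"useACK": bool, "index": str}.
    """
    token_settings = token_settings or {}

    # Check 1: splunk.hec.ack.enabled must match the token's acknowledgment setting.
    if "useACK" in token_settings and token_settings["useACK"] != ack_enabled:
        raise ValueError("splunk.hec.ack.enabled does not match the HEC token's "
                         "indexer acknowledgment setting")

    # Check 2: without a default index on the token, splunk.indexes is required.
    if not indexes and not token_settings.get("index"):
        raise ValueError("the HEC token has no default index; set splunk.indexes")

    config = {
        "connector.class": CONNECTOR_CLASS,
        "topics": ",".join(topics),
        "splunk.hec.uri": hec_uri,
        "splunk.hec.token": token,
        "splunk.hec.ack.enabled": "true" if ack_enabled else "false",
    }
    if indexes:
        config["splunk.indexes"] = indexes
    return json.dumps({"name": name, "config": config})
```

Failing early on a mismatch is cheaper than discovering it later as missing events in Splunk.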

Enable verbose logging

If you need to enable more verbose logging for Splunk Connect for Kafka, perform the following steps:

On your Kafka deployment, navigate to the config/connect-log4j.properties file.
Add the line log4j.logger.com.splunk=DEBUG, or change the existing log4j.logger.com.splunk line to that value.
Save your changes.
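If you prefer to script this change, the following Python sketch updates an existing `log4j.logger.com.splunk` entry or appends one if none exists. It works on a list of lines so that you can test it without touching the real file; the function name is illustrative and is not part of Kafka or the connector.

```python
def set_logger_level(lines, logger="log4j.logger.com.splunk", level="DEBUG"):
    """Return a copy of log4j .properties lines with `logger` set to `level`.

    An existing uncommented entry for `logger` is replaced; otherwise a new
    entry is appended. Commented-out lines (starting with '#' or '!') never
    match, because the comment character becomes part of the parsed key.
    """
    target = f"{logger}={level}"
    result, found = [], False
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key == logger:
            result.append(target)
            found = True
        else:
            result.append(line)
    if not found:
        result.append(target)
    return result

# Example usage (adjust the path to your Kafka installation):
# from pathlib import Path
# path = Path("config/connect-log4j.properties")
# path.write_text("\n".join(set_logger_level(path.read_text().splitlines())) + "\n")
```

Because log4j reads this file at startup, you will likely need to restart Kafka Connect for the file change to take effect.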

Can't see any connector information on a third-party UI

If Splunk Connect for Kafka is not showing on Confluent Control Center, perform the following steps:

Enable cross-origin access for Kafka Connect.
Depending on your deployment, navigate to connect-distributed.properties or connect-distributed-quickstart.properties.

Append the following two lines to the configuration file:

access.control.allow.origin=*
access.control.allow.methods=GET,OPTIONS,HEAD,POST,PUT,DELETE

Restart Kafka Connect.
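To confirm that the worker file contains both settings, the following Python sketch parses a .properties file and reports which of the two CORS entries are missing or different. The parser is deliberately minimal (it handles `key=value` lines and `#`/`!` comments only, not line continuations or `:` separators), so treat it as a quick check rather than a full .properties implementation.

```python
# The two CORS settings described above.
REQUIRED_CORS = {
    "access.control.allow.origin": "*",
    "access.control.allow.methods": "GET,OPTIONS,HEAD,POST,PUT,DELETE",
}


def parse_properties(text):
    """Minimal .properties parser: key=value lines, '#' or '!' comments.
    Line continuations and ':' separators are not supported."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_cors_settings(text):
    """Return the CORS settings that are absent or differ from REQUIRED_CORS."""
    props = parse_properties(text)
    return {k: v for k, v in REQUIRED_CORS.items() if props.get(k) != v}
```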

Malformed data

If the raw data of a Kafka record is a JSON object that cannot be marshaled, or if the raw data is in bytes but is not UTF-8 encodable, Splunk Connect for Kafka considers the record malformed. The connector logs the exception, with Kafka-specific information for the record, in the console, and the malformed records are still indexed in Splunk. Search for "type=malformed" in your Splunk platform deployment to return any malformed Kafka records.
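As an illustration of the two conditions described above, and not the connector's actual implementation, the following Python sketch classifies a record value as malformed if it is bytes that cannot be decoded as UTF-8, or a JSON-style object that cannot be serialized. The function name and the use of Python's json module as the marshaling step are assumptions made for the example.

```python
import json


def is_malformed(value):
    """Illustrative check mirroring the two conditions described above.

    - bytes that cannot be decoded as UTF-8 are malformed;
    - a dict (JSON-style object) that json.dumps cannot serialize is malformed.
    Any other value is treated as well formed in this sketch.
    """
    if isinstance(value, (bytes, bytearray)):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return True
        return False
    if isinstance(value, dict):
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            return True
        return False
    return False
```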

Performance decline over time

If events are processed at a normal rate at first, but after approximately 10 minutes or more the rate suddenly drops, check the logs to see whether the tasks were rebalanced and whether the following error appears:

ERROR WorkerSinkTask{id=testtest-1} Commit of offsets threw an unexpected exception for sequence number 50: {username-test-2-0=OffsetAndMetadata{offset=436090, metadata=''}, username-test-2-8=OffsetAndMetadata{offset=436280, metadata=''}, username-test-2-7=OffsetAndMetadata{offset=435398, metadata=''}, username-test-2-6=OffsetAndMetadata{offset=436119, metadata=''}, username-test-2-5=OffsetAndMetadata{offset=435440, metadata=''}, username-test-2-4=OffsetAndMetadata{offset=436940, metadata=''}, username-test-2-3=OffsetAndMetadata{offset=435703, metadata=''}, username-test-2-2=OffsetAndMetadata{offset=436149, metadata=''}, username-test-2-1=OffsetAndMetadata{offset=435978, metadata=''}} (org.apache.kafka.connect.runtime.WorkerSinkTask:215)
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.

This happens when event batches time out, Kafka identifies the task as down, and Kafka triggers a rebalance across the remaining tasks. Meanwhile, Kafka Connect is committing offsets for the acknowledgments it received from your Splunk platform deployment. By the time the commit reaches Kafka, the group has already rebalanced and the partitions have been assigned to another member, so the commit fails.

You can address this issue by:

Increasing the session timeout for HEC events.
Reducing the maximum size of batches returned in poll() with max.poll.records.
Checking your Splunk platform license. When your daily license volume is exceeded, your deployment's indexing speed slows.
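A back-of-the-envelope way to choose max.poll.records is to keep the time spent processing one poll() batch well below the consumer's max.poll.interval.ms; if the gap between two poll() calls exceeds that interval, the consumer is removed from the group and a rebalance follows. The Python sketch below assumes you can estimate an average per-record processing time (including HEC round trips), and its 50% headroom factor is an arbitrary choice, not a Kafka default. In a Kafka Connect worker, consumer settings such as this are typically passed with the `consumer.` prefix, for example consumer.max.poll.records.

```python
def batch_processing_ms(max_poll_records, per_record_ms):
    """Estimated time to process one full poll() batch."""
    return max_poll_records * per_record_ms


def largest_safe_max_poll_records(max_poll_interval_ms, per_record_ms, headroom=0.5):
    """Largest max.poll.records whose estimated batch time stays within
    `headroom` (a fraction) of max.poll.interval.ms. Never returns less than 1."""
    if per_record_ms <= 0:
        raise ValueError("per_record_ms must be positive")
    budget_ms = max_poll_interval_ms * headroom
    return max(1, int(budget_ms // per_record_ms))
```

For example, with a max.poll.interval.ms of 300000 ms and an estimated 2 ms per record, the sketch suggests at most 75,000 records per poll.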

Duplicate data at the start of data collection

You might see duplicate data at the start of data collection in either of these situations: a new connector deployment where no offsets are saved, or a topic that already contains data when you start a connector with more tasks than the topic has partitions. To avoid this, make sure that the number of tasks configured (tasks.max) does not exceed the number of partitions.
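The following Python sketch caps tasks.max at the partition count. It assumes that "the number of partitions" means the total across all topics the connector consumes, and that you have looked up the partition counts yourself, for example with your Kafka administration tooling; the function name is illustrative.

```python
def capped_tasks_max(requested_tasks, partitions_per_topic):
    """Return a tasks.max value that does not exceed the total number of
    partitions across the connector's topics.

    partitions_per_topic: mapping of topic name -> partition count.
    """
    total_partitions = sum(partitions_per_topic.values())
    if total_partitions < 1:
        raise ValueError("topics must have at least one partition")
    return min(requested_tasks, total_partitions)
```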

Acknowledgments are unsuccessful

If your logs show Splunk Connect for Kafka polling for acknowledgments of the last few batches without ever receiving a successful acknowledgment, you might see log entries like the following:

[2018-01-26 20:24:25,799] DEBUG ackPollResponse={"acks":{"0":false}} (com.splunk.hecclient.HecAckPoller:249)
[2018-01-26 20:24:25,799] DEBUG ackPollResponse={"acks":{"0":false}} (com.splunk.hecclient.HecAckPoller:249)
[2018-01-26 20:24:25,799] INFO no ackIds are ready for channel=a86c1588-677c-44e6-b275-0ea36120e275 on indexer=https://ec2-54-183-92-156.us-west-1.compute.amazonaws.com:8088 (com.splunk.hecclient.HecAckPoller:263)
[2018-01-26 20:24:25,799] INFO no ackIds are ready for channel=5799af36-f287-4564-9f55-6ef5bc37618c on indexer=https://ec2-54-183-92-156.us-west-1.compute.amazonaws.com:8088 (com.splunk.hecclient.HecAckPoller:263)
[2018-01-26 20:24:25,808] INFO start polling 2 outstanding acks for 2 channels (com.splunk.hecclient.HecAckPoller:188)
[2018-01-26 20:24:25,808] WARN timed out event batch after 60 seconds not acked (com.splunk.hecclient.EventBatch:66)
[2018-01-26 20:24:25,808] WARN timed out event batch after 60 seconds not acked (com.splunk.hecclient.EventBatch:66)
[2018-01-26 20:24:25,808] WARN detected 2 event batches timedout (com.splunk.hecclient.HecAckPoller:208)

To fix this acknowledgment issue, increase the value of splunk.hec.event.timeout.

This typically affects the last few event batches in Kafka. The Splunk platform buffers events into batches before indexing them. The last few batches might not fill the buffer, so those events stay in the buffer until the buffer times out. If Splunk Connect for Kafka is configured with a HEC event timeout shorter than two minutes, the connector times out these events before they are indexed.
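To check whether this is happening, you can scan the connector log for the timeout warnings shown above. The Python sketch below counts lines that match `timed out event batch after N seconds not acked` and suggests a new splunk.hec.event.timeout value of at least 120 seconds (the two-minute threshold described above), doubling the observed timeout when that is larger. The doubling rule is an arbitrary heuristic for illustration, not a Splunk recommendation.

```python
import re

# Matches the EventBatch warning shown in the sample log above.
TIMEOUT_RE = re.compile(r"timed out event batch after (\d+) seconds not acked")
MIN_TIMEOUT_SECONDS = 120  # the two-minute threshold described above


def suggest_event_timeout(log_lines):
    """Return (timed_out_batches, observed_timeout, suggested_timeout).

    observed_timeout and suggested_timeout are None when no warnings are found.
    """
    timeouts = [int(m.group(1)) for line in log_lines
                if (m := TIMEOUT_RE.search(line))]
    if not timeouts:
        return 0, None, None
    observed = max(timeouts)
    suggested = max(MIN_TIMEOUT_SECONDS, observed * 2)  # heuristic, see lead-in
    return len(timeouts), observed, suggested
```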

Splunk Connect for Kafka tasks fail due to serialization error

If you encounter a serialization error, update your worker properties file (connect-distributed.properties) to make sure the following settings are correctly configured:

key.converter=<org.apache.kafka.connect.storage.StringConverter|org.apache.kafka.connect.json.JsonConverter|io.confluent.connect.avro.AvroConverter>
value.converter=<org.apache.kafka.connect.storage.StringConverter|org.apache.kafka.connect.json.JsonConverter|io.confluent.connect.avro.AvroConverter>

For StringConverter and JsonConverter only:
key.converter.schemas.enable=false
value.converter.schemas.enable=false

For AvroConverter only:
key.converter.schema.registry.url=<Location of Avro schema registry>
value.converter.schema.registry.url=<Location of Avro schema registry>

The error may look like this:

org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
  at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:304)
  at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:425)
  at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:264)
  at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
  at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
  at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
  at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

See the Installation section of this manual to learn more.
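The converter rules above can be captured in a short Python sketch that generates the matching worker properties for each converter choice. The short names `string`, `json`, and `avro` and the function names are illustrative assumptions; the property names and class names are the ones listed above.

```python
# Converter classes listed above, keyed by illustrative short names.
CONVERTERS = {
    "string": "org.apache.kafka.connect.storage.StringConverter",
    "json": "org.apache.kafka.connect.json.JsonConverter",
    "avro": "io.confluent.connect.avro.AvroConverter",
}


def converter_properties(kind, schema_registry_url=None):
    """Return key/value converter properties following the rules above:
    schemas.enable=false for String/JSON, a schema registry URL for Avro."""
    if kind not in CONVERTERS:
        raise ValueError(f"unknown converter kind: {kind!r}")
    if kind == "avro" and not schema_registry_url:
        raise ValueError("AvroConverter requires a schema registry URL")
    props = {}
    for side in ("key", "value"):
        props[f"{side}.converter"] = CONVERTERS[kind]
        if kind == "avro":
            props[f"{side}.converter.schema.registry.url"] = schema_registry_url
        else:
            props[f"{side}.converter.schemas.enable"] = "false"
    return props


def to_properties_text(props):
    """Render properties as key=value lines for connect-distributed.properties."""
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"
```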

Error: I/O exception

If you encounter an I/O exception, check it against the following possible causes and solutions:

If the error occurs intermittently, the HEC is too busy to process the requests.
If you see this error repeatedly, lower the rate at which data is posted, for example by increasing the event batch size to reduce the number of requests.
If no events can be delivered to Splunk, check the connection between Splunk Connect for Kafka and your Splunk HEC endpoint.

The error may look like this:

ERROR encountered io exception (com.splunk.hecclient.Indexer:141)
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:204)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
        at sun.security.ssl.InputRecord.read(InputRecord.java:503)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at com.splunk.hecclient.Indexer.executeHttpRequest(Indexer.java:138)
        at com.splunk.hecclient.HecChannel.executeHttpRequest(HecChannel.java:60)
        at com.splunk.hecclient.HecAckPoller$RunAckQuery.run(HecAckPoller.java:228)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
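The effect of the event batch size on request volume is simple division: each HEC POST carries one batch, so the request rate is the event rate divided by the batch size, rounded up. The Python sketch below assumes every request carries a full batch, which overstates the savings when batches are flushed before they fill.

```python
import math


def requests_per_second(events_per_second, batch_size):
    """HEC POST requests needed per second if every request carries one full batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(events_per_second / batch_size)
```

For example, 50,000 events per second in batches of 500 needs 100 requests per second; doubling the batch size to 1,000 halves that to 50.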

Error: Conflicting operation

If you encounter a conflicting operation error when creating a Splunk Connect for Kafka task using the REST API, another Splunk Connect for Kafka instance might be running on your deployment and out of sync with the first. You might need to stop one of the running instances.

The error may look like this:

{"error_code":409,"message":"Cannot complete request because of a conflicting operation (e.g. worker rebalance)"}
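If the conflict is transient, such as the worker rebalance mentioned in the message, retrying after a short delay can succeed. The following Python sketch retries while the response is HTTP 409, backing off exponentially. The `post` callable is a hypothetical placeholder for whatever HTTP client call you use to send the request to the Kafka Connect REST API; if the 409 persists across retries, follow the advice above and check for a second running instance.

```python
import time


def post_with_retry(post, payload, attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call `post(payload)` and retry while it returns HTTP 409.

    `post` is a caller-supplied (hypothetical) function returning
    (status_code, body). Delays grow exponentially: 1s, 2s, 4s, ...
    Returns the last (status_code, body) received.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = post(payload)
        if status != 409:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return status, body
```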

Error: Out of memory

If you encounter an "out of memory" error, review the JVM memory currently allocated to Kafka Connect by checking the value of the KAFKA_HEAP_OPTS environment variable.

Depending on your physical resources, you can increase the memory by updating the environment variable (for example, -Xmx16G -Xms2G) and restarting Kafka Connect.
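To see what heap size is currently configured, the following Python sketch reads KAFKA_HEAP_OPTS and converts the -Xmx (maximum) and -Xms (initial) flags to bytes. It assumes the usual JVM size suffixes (k, m, g, t, case-insensitive, in powers of 1024) and ignores any other JVM flags in the variable; the function name is illustrative.

```python
import os
import re

# JVM size suffixes, in powers of 1024; no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
_FLAG_RE = re.compile(r"-Xm([xs])(\d+)([kKmMgGtT]?)\b")


def parse_heap_opts(opts=None):
    """Return {"max": bytes, "initial": bytes} from -Xmx / -Xms flags.

    Reads KAFKA_HEAP_OPTS when `opts` is None; missing flags are omitted.
    """
    if opts is None:
        opts = os.environ.get("KAFKA_HEAP_OPTS", "")
    result = {}
    for kind, number, unit in _FLAG_RE.findall(opts):
        key = "max" if kind == "x" else "initial"
        result[key] = int(number) * _UNITS[unit.lower()]
    return result
```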

Error: Workers require a list of topics

If you encounter an error that says "Sink tasks require a list of topics", the Kafka topic names were not provided. Provide the names of the topics as part of the connector configuration.

The error may look like this:

ERROR Task kafka-connect-splunk-20m-ack2-1-1 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:148)
org.apache.kafka.connect.errors.ConnectException: Sink tasks require a list of topics.
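Before submitting a connector, you can check the configuration locally. The following Python sketch returns the topic names from a sink connector configuration and raises an error when neither `topics` nor `topics.regex` (the Kafka Connect alternative to an explicit list) is set. The function name is illustrative.

```python
def check_topics(config):
    """Return the list of topic names from a sink connector config dict.

    Raise ValueError if neither `topics` nor `topics.regex` is set,
    mirroring the "Sink tasks require a list of topics" failure.
    """
    topics = [t.strip() for t in config.get("topics", "").split(",") if t.strip()]
    if not topics and not config.get("topics.regex"):
        raise ValueError("Sink tasks require a list of topics")
    return topics
```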

Error: Invalid token

If you encounter an error that says that your deployment has an invalid Splunk HEC token, provide the correct HEC token.

The error may look like this:

ERROR failed to post events resp={"text":"Invalid token","code":4}, status=403 (com.splunk.hecclient.Indexer:172)
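To test a token independently of Kafka Connect, you can send a single event to the HEC event endpoint (/services/collector/event) with an `Authorization: Splunk <token>` header. The following Python sketch only builds the request so that it can be inspected; the example event text is arbitrary. If the token has indexer acknowledgment enabled, HEC also expects a channel identifier, which the sketch sends in the `X-Splunk-Request-Channel` header when you supply one.

```python
import json
import urllib.request


def build_hec_test_request(hec_uri, token, event="connectivity test", channel=None):
    """Build (but do not send) a POST to the HEC event endpoint."""
    url = hec_uri.rstrip("/") + "/services/collector/event"
    headers = {"Authorization": f"Splunk {token}",
               "Content-Type": "application/json"}
    if channel:
        # Needed when the token has indexer acknowledgment enabled.
        headers["X-Splunk-Request-Channel"] = channel
    data = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

To send it, pass the request to urllib.request.urlopen(req, timeout=10). An invalid token typically surfaces as urllib.error.HTTPError with status 403.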

Error: Connection timed out

If you encounter a connection timeout error, the Splunk platform is not reachable. Verify that the HEC URI you provided is correct and that the endpoint is up, running, and reachable.

The error may look like this:

ERROR encountered io exception (com.splunk.hecclient.Indexer:141)
org.apache.http.conn.HttpHostConnectException: Connect to x.x.x.x:8088 [/x.x.x.x] failed: Connection timed out (Connection timed out)
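A plain TCP connection test separates network reachability from HEC configuration problems. The following Python sketch extracts the host and port from the HEC URI and attempts a connection with a timeout. It assumes the default HEC port 8088 when the URI omits a port; a successful connection shows only that the port is open, not that HEC is healthy.

```python
import socket
from urllib.parse import urlsplit


def hec_host_port(hec_uri):
    """Extract (host, port) from a HEC URI; assume port 8088 when none is given."""
    parts = urlsplit(hec_uri)
    if not parts.hostname:
        raise ValueError(f"no host in {hec_uri!r}")
    return parts.hostname, parts.port or 8088


def is_reachable(hec_uri, timeout_s=5.0):
    """Return True if a TCP connection to the HEC host and port succeeds."""
    host, port = hec_host_port(hec_uri)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # includes timeouts and refused connections
        return False
```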

Error: Invalid enrichment

If you encounter an invalid enrichment error, your deployment has a data enrichment parameter that is not in key-value pair format. Provide values as comma-separated key-value pairs only.

The error may look like this:

ERROR Task kafka-connect-splunk-jan26-ack-2-1 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:148)
org.apache.kafka.common.config.ConfigException: Invalid enrichment: lucky. Expect key value pairs and separated by comma.
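The following Python sketch parses an enrichment value in the expected `key=value,key=value` format and rejects items such as `lucky` that lack an `=`. It assumes the value is the one you set in the connector's enrichment parameter (splunk.hec.json.event.enrichment in recent connector versions; confirm the name for your version); the function name is illustrative.

```python
def parse_enrichment(value):
    """Parse "key1=value1,key2=value2" into a dict.

    Raise ValueError for any item without "=" or with an empty key.
    """
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Invalid enrichment: {item}. "
                             "Expected key=value pairs separated by commas.")
        result[key.strip()] = val.strip()
    return result
```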

Error: Unrecognized SSL message

If you encounter an error indicating an unrecognized SSL message, your deployment's HEC URI uses HTTPS, but SSL is not enabled on your Splunk HEC. Either enable SSL on your Splunk HEC, or use HTTP in the HEC URI. If possible, enable SSL rather than switching to HTTP.

The error may look like this:

ERROR encountered io exception (com.splunk.hecclient.Indexer:141)
javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

Error: Unable to find valid certification path

If you encounter an error indicating that a valid certification path cannot be found, the SSL certificate has not been provided. When SSL certificate validation is enabled, provide a valid path to the SSL certificate.

The error may look like this:

ERROR encountered io exception (com.splunk.hecclient.Indexer:141)
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
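The following Python sketch checks the trust store settings before you deploy a connector. It assumes the splunk.hec.ssl.validate.certs and splunk.hec.ssl.trust.store.path property names used by recent connector versions (confirm them for your version) and treats validation as enabled when the property is absent. It follows the advice above strictly by requiring a trust store path; a certificate signed by a CA that the JVM already trusts may not need one.

```python
import os


def check_ssl_config(config):
    """Return a list of problems with the SSL trust store settings.

    Property names are assumptions (see lead-in). Validation is treated as
    enabled unless splunk.hec.ssl.validate.certs is explicitly "false".
    """
    problems = []
    validate = str(config.get("splunk.hec.ssl.validate.certs", "true")).lower() == "true"
    if not validate:
        return problems
    path = config.get("splunk.hec.ssl.trust.store.path")
    if not path:
        problems.append("splunk.hec.ssl.trust.store.path is not set")
    elif not os.path.isfile(path):
        problems.append(f"trust store not found: {path}")
    return problems
```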

Error: ACK is disabled

If you encounter an error indicating that acknowledgment is disabled, indexer acknowledgment is disabled on your Splunk HEC token. Either enable indexer acknowledgment on the Splunk platform side of your deployment, or set splunk.hec.ack.enabled=false in your Kafka Connect configuration.

The error may look like this:

ERROR failed to poll ack for channel=6f31075c-8b71-46e7-9324-a0a2016249e9 on indexer=https://x.x.x.x:8088 (com.splunk.hecclient.HecAckPoller:232)
com.splunk.hecclient.HecException: failed to post events resp={"text":"ACK is disabled","code":14}, status=400
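The invalid-token and ACK-disabled errors both carry a HEC JSON response with a numeric code. The following Python sketch pulls that response out of a connector log line and maps the two codes shown on this page (4 and 14) to the fixes described above. The regular expression assumes a flat JSON object after `resp=`, as in the sample log lines; the function name is illustrative.

```python
import json
import re

# HEC response codes that appear in the errors above, mapped to the fixes described there.
HEC_FIXES = {
    4: "Invalid token: provide the correct HEC token.",
    14: "ACK is disabled: enable indexer acknowledgment on the HEC token, "
        "or set splunk.hec.ack.enabled=false.",
}

# Non-greedy match stops at the first '}', which suits flat JSON objects only.
_RESP_RE = re.compile(r"resp=(\{.*?\})")


def diagnose_hec_log_line(line):
    """Return a suggested fix for a connector log line, or None if it has no HEC response."""
    match = _RESP_RE.search(line)
    if not match:
        return None
    code = json.loads(match.group(1)).get("code")
    return HEC_FIXES.get(code, f"Unrecognized HEC response code: {code}")
```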
