Troubleshoot issues with Splunk Connect for Kafka
Use the following examples to diagnose and resolve issues with Splunk Connect for Kafka.
No events are arriving in Splunk
If no events arrive in your Splunk platform deployment, perform the following steps:
- Navigate to your HTTP Event Collector (HEC) token configurations.
- Verify that the indexer acknowledgment configuration used in the REST call (splunk.hec.ack.enabled) matches the configuration defined for the target HEC token.
- Make sure that a valid value for the index is provided in the REST call (splunk.indexes) if there is no default index connected to the target HEC token.
- Save any changes.
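For reference, the relevant settings in a connector configuration might look like the following sketch. The URI, token, and index values are placeholders; substitute your own:

```properties
# Placeholder HEC endpoint and token; replace with your own values.
splunk.hec.uri=https://hec.example.com:8088
splunk.hec.token=00000000-0000-0000-0000-000000000000
# Must match the indexer acknowledgment setting of the target HEC token.
splunk.hec.ack.enabled=true
# Required if the target HEC token has no default index connected to it.
splunk.indexes=main
```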
Enable verbose logging
If you need to enable more verbose logging for Splunk Connect for Kafka, perform the following steps:
- On your Kafka deployment, navigate to the config/connect-log4j.properties file.
- Append the line log4j.logger.com.splunk=DEBUG.
- Save your changes.
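With that change applied, config/connect-log4j.properties might look like the following sketch. The root logger line shown is a typical Kafka default and may differ in your deployment:

```properties
# Typical Kafka Connect default root logger (may differ in your deployment)
log4j.rootLogger=INFO, stdout
# Added line: verbose logging for Splunk Connect for Kafka
log4j.logger.com.splunk=DEBUG
```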
Can't see any connector information on a third-party UI
If Splunk Connect for Kafka is not showing on Confluent Control Center, perform the following steps:
- Enable cross-origin access for Kafka Connect.
- Depending on your deployment, navigate to connect-distributed.properties or connect-distributed-quickstart.properties.
- Append the following two lines to the Connect configuration:
access.control.allow.origin=*
access.control.allow.methods=GET,OPTIONS,HEAD,POST,PUT,DELETE
- Restart Kafka Connect.
Malformed data
If the raw data of the Kafka records is a JSON object that cannot be marshaled, or is in bytes that are not UTF-8 encodable, Splunk Connect for Kafka considers these records malformed. It logs the exception with Kafka-specific information for these records in the console, and the malformed records are indexed in Splunk. You can search for "type=malformed" within your Splunk platform deployment to return any malformed Kafka records.
Performance decline over time
If events are processed at a normal rate but the rate suddenly drops after approximately 10 minutes or more, causing a decline in performance, check the logs to see whether the tasks were rebalanced and the following error appears:
ERROR WorkerSinkTask{id=testtest-1} Commit of offsets threw an unexpected exception for sequence number 50: {sharon-test-2-0=OffsetAndMetadata{offset=436090, metadata=''}, username-test-2-8=OffsetAndMetadata{offset=436280, metadata=''}, username-test-2-7=OffsetAndMetadata{offset=435398, metadata=''}, username-test-2-6=OffsetAndMetadata{offset=436119, metadata=''}, username-test-2-5=OffsetAndMetadata{offset=435440, metadata=''}, username-test-2-4=OffsetAndMetadata{offset=436940, metadata=''}, username-test-2-3=OffsetAndMetadata{offset=435703, metadata=''}, username-test-2-2=OffsetAndMetadata{offset=436149, metadata=''}, username-test-2-1=OffsetAndMetadata{offset=435978, metadata=''}} (org.apache.kafka.connect.runtime.WorkerSinkTask:215) org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This happens when event batches time out, so Kafka identifies that task as down and triggers a rebalance of the remaining tasks. Meanwhile, Kafka Connect is committing the acknowledgments that it receives from your Splunk platform deployment. By the time the acknowledgments are committed and returned from Kafka, Splunk Connect for Kafka has already rebalanced.
You can address this issue by:
- Increasing the session timeout for HEC events.
- Reducing the maximum size of batches returned in poll() with max.poll.records.
- Checking your Splunk platform license. When your daily license volume is exceeded, your deployment's indexing speed slows.
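One way to reduce the batch size is a worker-level consumer property in connect-distributed.properties, where Kafka Connect forwards settings with the consumer. prefix to the underlying consumers. The value shown is an example only; tune it for your workload:

```properties
# Reduce the maximum number of records returned by a single poll()
# so that event batches finish before the session timeout.
# The value 300 is illustrative, not a recommendation.
consumer.max.poll.records=300
```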
Duplicate data at the start of data collection
If you see duplicate data at the start of data collection, either in a new connector deployment with no saved offsets, or after adding data to a topic and starting a connector with more tasks than partitions, ensure that the number of configured tasks does not exceed the number of partitions.
Acknowledgments are unsuccessful
If Splunk Connect for Kafka polls the acknowledgments for the last few batches in your logs but never polls them successfully, you might see the following error:
[2018-01-26 20:24:25,799] DEBUG ackPollResponse={"acks":{"0":false}} (com.splunk.hecclient.HecAckPoller:249)
[2018-01-26 20:24:25,799] DEBUG ackPollResponse={"acks":{"0":false}} (com.splunk.hecclient.HecAckPoller:249)
[2018-01-26 20:24:25,799] INFO no ackIds are ready for channel=a86c1588-677c-44e6-b275-0ea36120e275 on indexer=https://ec2-54-183-92-156.us-west-1.compute.amazonaws.com:8088 (com.splunk.hecclient.HecAckPoller:263)
[2018-01-26 20:24:25,799] INFO no ackIds are ready for channel=5799af36-f287-4564-9f55-6ef5bc37618c on indexer=https://ec2-54-183-92-156.us-west-1.compute.amazonaws.com:8088 (com.splunk.hecclient.HecAckPoller:263)
[2018-01-26 20:24:25,808] INFO start polling 2 outstanding acks for 2 channels (com.splunk.hecclient.HecAckPoller:188)
[2018-01-26 20:24:25,808] WARN timed out event batch after 60 seconds not acked (com.splunk.hecclient.EventBatch:66)
[2018-01-26 20:24:25,808] WARN timed out event batch after 60 seconds not acked (com.splunk.hecclient.EventBatch:66)
[2018-01-26 20:24:25,808] WARN detected 2 event batches timedout (com.splunk.hecclient.HecAckPoller:208)
To fix this acknowledgment issue, increase the splunk.hec.event.timeout setting.
This might happen for the last few event batches in Kafka. The Splunk platform buffers events into batches to index. The last few batches of events may not fill the buffer on the Splunk platform, so the events stay in the buffer until they time out. If Splunk Connect for Kafka is configured to have a HEC event timeout smaller than two minutes, Splunk Connect for Kafka will time out the events before the events are indexed.
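For example, to give event batches more time to be acknowledged, raise the timeout in the connector configuration. The value shown is illustrative only, assuming the setting is expressed in seconds:

```properties
# Example only: allow event batches 300 seconds before timing out,
# which is above the two-minute threshold described above.
splunk.hec.event.timeout=300
```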
Splunk Connect for Kafka tasks fail due to serialization error
If you encounter a serialization error, update your worker properties file (connect-distributed.properties) to make sure the following settings are correctly configured:
key.converter=<org.apache.kafka.connect.storage.StringConverter|org.apache.kafka.connect.json.JsonConverter|io.confluent.connect.avro.AvroConverter>
value.converter=<org.apache.kafka.connect.storage.StringConverter|org.apache.kafka.connect.json.JsonConverter|io.confluent.connect.avro.AvroConverter>

For StringConverter and JsonConverter only:
key.converter.schemas.enable=false
value.converter.schemas.enable=false

For AvroConverter only:
key.converter.schema.registry.url=<Location of Avro schema registry>
value.converter.schema.registry.url=<Location of Avro schema registry>
The error may look like this:
org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:304)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:425)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:264)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
See the Installation section of this manual to learn more.
Error: I/O exception
If you encounter an I/O exception error, try one of the following solutions:
- If this happens intermittently, the HEC is too busy to process the requests.
- If you see this error repeatedly, lower the rate at which data is posted. Increase the event batch size to reduce the number of requests.
- If no events can be delivered to Splunk, check the connection between Splunk Connect for Kafka and your Splunk HEC endpoint.
The error may look like this:
ERROR encountered io exception (com.splunk.hecclient.Indexer:141)
java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at com.splunk.hecclient.Indexer.executeHttpRequest(Indexer.java:138)
at com.splunk.hecclient.HecChannel.executeHttpRequest(HecChannel.java:60)
at com.splunk.hecclient.HecAckPoller$RunAckQuery.run(HecAckPoller.java:228)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Error: Conflicting operation
If you encounter a conflicting operation error when creating a Splunk Connect for Kafka task using the REST API, there might be another Splunk Connect for Kafka instance running on your deployment and the two are not in sync. You may need to stop one of the running instances.
The error may look like this:
"error_code":409,"message":"Cannot complete request because of a conflicting operation (e.g. worker rebalance)"
Error: Out of memory
If you encounter an "out of memory" error, review the current JVM memory allocated to Kafka Connect by checking the value of the KAFKA_HEAP_OPTS environment variable.
Depending on your physical resources, you can increase the memory by updating the environment variable (for example, -Xmx16G -Xms2G) and restarting Kafka Connect.
Error: Workers require a list of topics
If you encounter an error that says sink tasks require a list of topics, the Kafka topic names were not provided. Provide the names of the topics as part of the connector configuration.
The error may look like this:
ERROR Task kafka-connect-splunk-20m-ack2-1-1 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:148) org.apache.kafka.connect.errors.ConnectException: Sink tasks require a list of topics.
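For example, the topic list is supplied with the standard topics setting of a sink connector configuration. The topic names below are placeholders:

```properties
# Hypothetical topic names; replace with your own comma-separated list.
topics=web-logs,app-metrics
```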
Error: Invalid token
If you encounter an error that says that your deployment has an invalid Splunk HEC token, provide the correct HEC token.
The error may look like this:
ERROR failed to post events resp={"text":"Invalid token","code":4}, status=403 (com.splunk.hecclient.Indexer:172)
Error: Connection timed out
If you encounter a connection timeout error, the Splunk platform is not reachable. Verify that the provided HEC URI is up, running, and reachable.
The error may look like this:
ERROR encountered io exception (com.splunk.hecclient.Indexer:141) org.apache.http.conn.HttpHostConnectException: Connect to x.x.x.x:8088 [/x.x.x.x] failed: Connection timed out (Connection timed out)
Error: Invalid enrichment
If you encounter an error indicating an invalid enrichment, the data enrichment parameter in your deployment is not a valid key-value pair. Provide values in key-value format only.
The error may look like this:
ERROR Task kafka-connect-splunk-jan26-ack-2-1 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:148) org.apache.kafka.common.config.ConfigException: Invalid enrichment: lucky. Expect key value pairs and separated by comma.
Error: Unrecognized SSL message
If you encounter an error indicating an unrecognized SSL message, your deployment's HEC URI uses HTTPS while SSL is disabled on the HEC endpoint. Either enable SSL on your Splunk HEC, or use HTTP in the HEC URI. If possible, do not disable SSL.
The error may look like this:
ERROR encountered io exception (com.splunk.hecclient.Indexer:141) javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
Error: Unable to find valid certification path
Error in an SSL-enabled environment
If the SSL certificate has not been provided, you might encounter the following error:
ERROR encountered io exception (com.splunk.hecclient.Indexer:141) javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
This could indicate an inability to find a valid certification path. When the SSL certificate validation setting is set to true, provide a valid SSL certificate path.
Error in a non-SSL-enabled environment
When splunk.hec.ssl.validate.certs is set to true and you encounter the following error:
"trace": "org.apache.kafka.common.config.ConfigException: Invalid Secure HTTP (HTTPS) configuration: splunk.hec.uri='https://splunk.acme.com:8088', splunk.hec.ssl.validate.certs='true', splunk.hec.ssl.trust.store.path=''
This could indicate an inability to find a valid certification path in an environment without SSL certificates. Set splunk.hec.ssl.validate.certs to false to resolve the issue.
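Alternatively, to keep certificate validation enabled, you can point the connector at a trust store. A configuration sketch follows; the trust store path and password are placeholders:

```properties
# Validate the HEC server certificate against a trust store.
splunk.hec.ssl.validate.certs=true
# Placeholder path and password; substitute your own trust store.
splunk.hec.ssl.trust.store.path=/path/to/truststore.jks
splunk.hec.ssl.trust.store.password=changeme
```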
Error: ACK is disabled
If you encounter an error that indicates that acknowledgment has been disabled, indexer acknowledgment has been disabled on your Splunk software HEC. Enable indexer acknowledgment on the Splunk platform side of your deployment, or set splunk.hec.ack.enabled=false in your Kafka Connect configurations.
The error may look like this:
ERROR failed to poll ack for channel=6f31075c-8b71-46e7-9324-a0a2016249e9 on indexer=https://x.x.x.x:8088 (com.splunk.hecclient.HecAckPoller:232) com.splunk.hecclient.HecException: failed to post events resp={"text":"ACK is disabled","code":14}, status=400
This documentation applies to the following versions of Splunk® Connect for Kafka: 2.0.1, 2.0.2