Splunk® Connect for Kafka

Install and Administer Splunk Connect for Kafka

This documentation does not apply to the most recent version of Splunk® Connect for Kafka. For documentation on the most recent version, go to the latest release.

Parameters

Use the following parameters to specify the types of data that you want to ingest into your Splunk platform deployment.

Required parameters

Parameter Name Description
name Connector name. A consumer group with this name will be created with tasks to be distributed evenly across the connector cluster nodes.
connector.class The Java class used to perform connector jobs. Keep the default value com.splunk.kafka.connect.SplunkSinkConnector unless you modify the connector.
tasks.max The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Splunk Connect for Kafka connector nodes.
splunk.hec.uri Splunk HTTP Event Collector (HEC) URIs. Either a list of Fully Qualified Domain Names (FQDNs) or IPs of all Splunk indexers, separated with a "," or a load balancer. The connector will load balance to indexers using round robin. Splunk Connector will round robin to this list of indexers. For example, <code>https://hec1.splunk.com:8088, https://hec2.splunk.com:8088, https://hec3.splunk.com:8088</code>.
splunk.hec.token Splunk HEC token.
topics Comma-separated list of Kafka topics for Splunk to consume. For example, prod-topic1,prod-topic2,prod-topic3.

Optional parameters

Parameter Name Description
splunk.indexes Target Splunk indexes to send data to. This can be a list of indexes can be a list of indexes, and can also be the same sequence and order as topics.

It is possible to inject data from different Kafka topics to different Splunk platform indexes. For example, prod-topic1, prod-topic2, and prod-topic3 can be sent to index prod-index1, prod-index2, and prod-index3.

If you want to index all data from multiple topics to the main index, then "main" can be specified. If you leave this setting unconfigured, data will route to the default index configured against the HEC token. Verify that the indexes configured here are in the index list of HEC tokens, otherwise Splunk HEC will drop the data. By default, this setting is empty.
splunk.sources Splunk event source metadata for the Kafka topic data. The same configuration rules as indexes can be applied. If left unconfigured, the default source binds to the HEC token. By default, this setting is empty.
splunk.sourcetypes Splunk event source metadata for the Kafka topic data. The same configuration rules as indexes can be applied here. If left unconfigured, the default source binds to the HEC token. By default, this setting is empty.
splunk.hec.ssl.validate.certs Valid settings are true or false, and they enable or disable HTTPS certification validation. By default, this is set to true.
splunk.hec.http.keepalive Valid settings are true or false, and they enable or disable HTTPS connection keep-alive. By default, this is set to true.
splunk.hec.max.http.connection.per.channel Controls how many HTTP connections will be created and cached in the HTTP pool for one HEC channel. By default, this is set to 2.
splunk.hec.total.channels Controls the total channels created to perform HEC event POSTs. By default, this is set to 2.
splunk.hec.max.batch.size Maximum batch size when posting events to Splunk. The size is the actual number of Kafka events, and not byte size. By default, this is set to 100.
splunk.hec.threads Controls how many threads are spawned to do data injection via HEC in a single connector task. By default, this is set to 1.
splunk.hec.socket.timeout Internal TCP socket timeout when connecting to Splunk. By default, this is set to 60 seconds.

Acknowledgment parameters (optional)

Enable HTTP Event Collector (HEC) token acknowledgments to avoid data loss. Without HEC token acknowledgment, data loss may occur, especially in the case of a system restart or crash.

Parameter Name Description
splunk.hec.ack.enabled Valid settings are true or false. When set to true the Splunk Connect for Kafka connector will poll event acknowledgments (ACKs) for POST events before check-pointing the Kafka offsets. This is used to prevent data loss, as this setting implements guaranteed delivery. By default, this setting is set to true.

If this setting is set to true, verify that the corresponding HEC token is also enabled with index acknowledgments, otherwise the data injection will fail, due to duplicate data. When set to false, the Splunk Connect for Kafka connector will only POST events to your Splunk platform instance. After it receives an HTTP 200 OK response, it assumes the events are indexed by Splunk. In cases where the Splunk platform crashes, there may be data loss.

splunk.hec.ack.poll.interval This setting is only applicable when splunk.hec.ack.enabled is set to true. Internally it controls the event ACKs polling interval. By default, this setting is set to 10 seconds.
splunk.hec.ack.poll.threads This setting is used for performance tuning and is only applicable when splunk.hec.ack.enabled is set to true. It controls how many threads should be spawned to poll event ACKs. By default, this is set to 1.

For large Splunk indexer clusters (for example, 100 indexers) increase this number. Speed up ACK polling by increasing to 4 threads.

splunk.hec.event.timeout This setting is applicable when splunk.hec.ack.enabled is set to true. This setting determines how long the connector will wait before timing out and resending when events are POSTed to Splunk and before they are ACKed. By default, this setting is set to 300 seconds.

Endpoint parameters (Optional)

Parameter Name Description
splunk.hec.raw Set to true for Splunk software to ingest data using the HEC /raw endpoint. Default is false, which will use the /event endpoint.
splunk.hec.raw.line.breaker Only applicable to HEC /raw endpoint. The setting is used to specify a custom line breaker to help Splunk separate the events correctly.

For example, you can specify "#####" as a special line breaker. Internally, Splunk Connect for Kafka will append this line breaker to every Kafka record to form a clear event boundary. The connector performs data injection in batch mode. On the Splunk platform, configure props.conf to set up line breaker for the source types. Then the Splunk software will break events for data flowing through the HEC /raw endpoint. By default, this setting is empty.

splunk.hec.json.event.enrichment Only applicable to the HEC /event endpoint. This setting is used to enrich raw data with extra metadata fields. It contains a list of key value pairs separated by ",". The configured enrichment metadata will be indexed along with raw event data by Splunk software. Data enrichment for the HEC /event endpoint is only available in Splunk Enterprise 6.5 and later. By default, this setting is empty.
splunk.hec.track.data Valid settings are true or false. When set to true, data loss and data injection latency metadata will be indexed along with raw data. This setting only works in conjunction with the HEC /event endpoint (splunk.hec.raw : false). By default, this setting is set to false.
Last modified on 26 June, 2018
Configuration examples   Security configurations

This documentation applies to the following versions of Splunk® Connect for Kafka: 1.0.0


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters