Splunk® Connect for Kafka

Install and Administer Splunk Connect for Kafka

This documentation does not apply to the most recent version of Splunk® Connect for Kafka. For documentation on the most recent version, go to the latest release.

Data ingestion parameters for Splunk Connect for Kafka

Use the following parameters to specify the types of data that you want to ingest into your Splunk platform deployment.

Required parameters

Parameter Name Description
name Connector name. A consumer group with this name will be created with tasks to be distributed evenly across the connector cluster nodes.
connector.class The Java class used to perform connector jobs. Keep the default value com.splunk.kafka.connect.SplunkSinkConnector unless you modify the connector.
tasks.max The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Splunk Connect for Kafka connector nodes.
splunk.hec.uri Splunk HTTP Event Collector (HEC) URIs. Either a comma-separated list of the fully qualified domain names (FQDNs) or IP addresses of all Splunk indexers, or the address of a load balancer. When given a list, the connector load balances across the indexers using round robin. For example, https://hec1.splunk.com:8088, https://hec2.splunk.com:8088, https://hec3.splunk.com:8088.
splunk.hec.token Splunk HEC token.
topics or topics.regex For topics: A comma-separated list of Kafka topics for Splunk to consume. For example, prod-topic1,prod-topic2,prod-topic3.

For topics.regex: Use for declaring topic subscriptions as a name pattern, instead of specifying each topic in a list. For example, ^prod-topic[0-9]$

  • If topics.regex is specified, the topics parameter must be omitted.
  • With topics.regex, the Splunk software metadata fields (splunk.indexes, splunk.sourcetypes, splunk.sources) are ignored and should be omitted.
  • With topics.regex, the Splunk software metadata must either be defined on a per-event basis by using Kafka header fields (splunk.header.index, splunk.header.sourcetype, and so on), or be taken from the default index and sourcetype values of the HEC token.
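
As a rough illustration of the required parameters, the following Python sketch assembles a connector configuration and checks that exactly one of topics and topics.regex is present. The helper name build_required_config, the connector name, and the token value are hypothetical placeholders. Only the parameter keys and the connector class come from the table above.

```python
import json


def build_required_config(name, hec_uris, hec_token, topics=None,
                          topics_regex=None, tasks_max=1):
    """Assemble the required settings as a flat dict of strings.

    Hypothetical helper for illustration; not part of the connector.
    """
    # topics and topics.regex are mutually exclusive: exactly one must be set.
    if (topics is None) == (topics_regex is None):
        raise ValueError("Specify exactly one of topics or topics.regex")

    config = {
        "name": name,
        "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
        "tasks.max": str(tasks_max),
        # Multiple HEC URIs are joined into one comma-separated value.
        "splunk.hec.uri": ",".join(hec_uris),
        "splunk.hec.token": hec_token,
    }
    if topics is not None:
        config["topics"] = ",".join(topics)
    else:
        config["topics.regex"] = topics_regex
    return config


example = build_required_config(
    name="splunk-sink-example",  # hypothetical connector name
    hec_uris=["https://hec1.splunk.com:8088", "https://hec2.splunk.com:8088"],
    hec_token="00000000-0000-0000-0000-000000000000",  # placeholder token
    topics=["prod-topic1", "prod-topic2"],
)
print(json.dumps(example, indent=2))
```

Each key in the resulting dictionary corresponds to one parameter in the table. Passing topics_regex="^prod-topic[0-9]$" instead of topics produces the pattern-based form.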

Header parameters

Parameter Name Description
splunk.header.support Valid settings are true or false. When set to true, the connector parses Kafka headers for use as metadata in generated Splunk software events. By default, this setting is set to false.

Requires Kafka Connect version 1.1 or later.

splunk.header.custom Header names. Applicable when splunk.header.support is set to true. To configure multiple custom headers, separate them with commas. For example, custom_header_1,custom_header_2,custom_header_3. The connector looks for Kafka record headers with these names and adds them to each event if present. By default, it is set to "".
splunk.header.index Header name. Applicable when splunk.header.support is set to true. This setting specifies the header to be used for the Splunk platform index. By default, it is set to splunk.header.index.
splunk.header.source Header name. Applicable when splunk.header.support is set to true. This setting specifies the header to be used for the Splunk platform source. By default, it is set to splunk.header.source.
splunk.header.sourcetype Header name. Applicable when splunk.header.support is set to true. This setting specifies the header to be used for the Splunk platform sourcetype. By default, it is set to splunk.header.sourcetype.
splunk.header.host Header name. Applicable when splunk.header.support is set to true. This setting specifies the header to be used for the Splunk platform host. By default, it is set to splunk.header.host.
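
The way the header parameters select metadata can be modeled with a short Python sketch. This is a simplified model of the behavior described in the table, not the connector's implementation. The function name metadata_from_headers and the sample header values are hypothetical.

```python
# Default header names, taken from the table above.
DEFAULT_HEADER_NAMES = {
    "index": "splunk.header.index",
    "source": "splunk.header.source",
    "sourcetype": "splunk.header.sourcetype",
    "host": "splunk.header.host",
}


def metadata_from_headers(record_headers, config):
    """Return the metadata that the header settings would pick from a record.

    record_headers: dict of Kafka record header name -> value.
    config: dict of connector settings (string values).
    """
    # Headers are only parsed when splunk.header.support is true.
    if config.get("splunk.header.support", "false") != "true":
        return {}

    metadata = {}
    for field, default_header in DEFAULT_HEADER_NAMES.items():
        # Each setting names the header that carries that metadata field.
        header_name = config.get("splunk.header." + field, default_header)
        if header_name in record_headers:
            metadata[field] = record_headers[header_name]

    # Custom headers are a comma-separated list; add each one that is present.
    custom = config.get("splunk.header.custom", "")
    for header_name in (h.strip() for h in custom.split(",")):
        if header_name and header_name in record_headers:
            metadata[header_name] = record_headers[header_name]
    return metadata


headers = {
    "splunk.header.index": "prod-index1",
    "custom_header_1": "abc",
    "unrelated": "x",
}
config = {
    "splunk.header.support": "true",
    "splunk.header.custom": "custom_header_1,custom_header_2",
}
result = metadata_from_headers(headers, config)
print(result)
```

In this example only the index header and custom_header_1 are present on the record, so only those two values are picked up. Headers that are not named in the configuration are ignored.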

Optional parameters

Parameter Name Description
kerberos.user.principle The Kerberos user principal that the connector uses to authenticate with Kerberos.
kerberos.keytab.path The path to the keytab file used for authentication with Kerberos.
splunk.indexes Target Splunk indexes to send data to. This can be a comma-separated list of indexes, in the same sequence and order as topics.

It is possible to inject data from different Kafka topics to different Splunk platform indexes. For example, prod-topic1, prod-topic2, and prod-topic3 can be sent to index prod-index1, prod-index2, and prod-index3, respectively.

If you want to index all data from multiple topics to the main index, then "main" can be specified. If you leave this setting unconfigured, data will route to the default index configured against the HEC token. Verify that the indexes configured here are in the index list of the HEC token, otherwise Splunk HEC will drop the data. By default, this setting is empty.
splunk.sources Splunk event source metadata for the Kafka topic data. The same configuration rules as indexes can be applied. If left unconfigured, the default source binds to the HEC token. By default, this setting is empty.
splunk.sourcetypes Splunk event sourcetype metadata for the Kafka topic data. The same configuration rules as indexes can be applied here. If left unconfigured, the default sourcetype binds to the HEC token. By default, this setting is empty.
splunk.hec.backoff.threshhold.seconds The amount of time Splunk Connect for Kafka waits before it attempts to resend after errors from a HEC endpoint.
splunk.flush.window The interval, in seconds, at which the events from Kafka Connect will be flushed to your Splunk platform instance. By default, this is set to 30.
splunk.hec.ssl.validate.certs Valid settings are true or false, and they enable or disable HTTPS certificate validation. By default, this is set to true.
splunk.hec.http.keepalive Valid settings are true or false, and they enable or disable HTTPS connection keep-alive. By default, this is set to true.
splunk.hec.max.http.connection.per.channel Controls how many HTTP connections will be created and cached in the HTTP pool for one HEC channel. By default, this is set to 2.
splunk.hec.max.outstanding.events Maximum number of unacknowledged events kept in memory by the connector. If this limit is reached, the connector triggers a back-pressure event to slow down collection.
splunk.hec.max.retries The number of times the connector attempts to resend a failed batch before dropping the events completely. Dropping events will result in data loss. Default is -1, which will retry indefinitely.
splunk.hec.lb.poll.interval The load balancer polling interval, in seconds. Increase the value to poll less often, or decrease it to poll more frequently. Default is 120.
splunk.hec.enable.compression Enables or disables gzip compression. Valid settings are true or false. Default is false.
splunk.hec.total.channels Controls the total channels created to perform HEC event POSTs. By default, this is set to 2.
splunk.hec.max.batch.size Maximum batch size when posting events to Splunk. The size is the actual number of Kafka events, and not byte size. By default, this is set to 500.
splunk.hec.threads Controls how many threads are spawned to do data injection via HEC in a single connector task. By default, this is set to 1.
splunk.hec.socket.timeout Internal TCP socket timeout when connecting to Splunk. By default, this is set to 60 seconds.
splunk.hec.json.event.formatted Set to true for events that are already in HEC format. Valid settings are true or false.
splunk.hec.ssl.trust.store.path Location of the Java KeyStore. Default setting is "".
splunk.hec.ssl.trust.store.password Password for the Java KeyStore. Default setting is "".
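
The positional pairing between topics and splunk.indexes can be sketched in Python as follows. The function map_topics_to_indexes is a hypothetical helper that models the rules described above. Raising an error on a length mismatch is an assumption of this sketch, not documented connector behavior.

```python
def map_topics_to_indexes(topics, indexes):
    """Pair each topic with its target index, by position.

    topics, indexes: comma-separated strings as they appear in the config.
    Returns a dict of topic -> index, where None means "use the default
    index configured against the HEC token".
    """
    topic_list = [t.strip() for t in topics.split(",") if t.strip()]
    index_list = [i.strip() for i in indexes.split(",") if i.strip()]

    if not index_list:
        # Setting left empty: data routes to the HEC token's default index.
        return {topic: None for topic in topic_list}
    if len(index_list) == 1:
        # A single index (for example, "main") receives data from all topics.
        return {topic: index_list[0] for topic in topic_list}
    if len(index_list) != len(topic_list):
        # Assumption of this sketch: treat a mismatch as a config error.
        raise ValueError("splunk.indexes must match topics in length and order")
    return dict(zip(topic_list, index_list))


mapping = map_topics_to_indexes(
    "prod-topic1,prod-topic2,prod-topic3",
    "prod-index1,prod-index2,prod-index3",
)
print(mapping)
```

According to the table, the same pairing rules apply to splunk.sources and splunk.sourcetypes.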

Acknowledgment parameters (optional)

Enable HTTP Event Collector (HEC) token acknowledgments to avoid data loss. Without HEC token acknowledgment, data loss may occur, especially in the case of a system restart or crash.

Parameter Name Description
splunk.hec.ack.enabled Valid settings are true or false. When set to true, the Splunk Connect for Kafka connector polls event acknowledgments (ACKs) for POSTed events before check-pointing the Kafka offsets. This setting implements guaranteed delivery and is used to prevent data loss. By default, this setting is set to true.

If this setting is set to true, verify that indexer acknowledgment is also enabled on the corresponding HEC token, otherwise the data injection will fail, due to duplicate data. When set to false, the Splunk Connect for Kafka connector will only POST events to your Splunk platform instance. After it receives an HTTP 200 OK response, it assumes the events are indexed by Splunk. In cases where the Splunk platform crashes, there may be data loss.

splunk.hec.ack.poll.interval This setting is only applicable when splunk.hec.ack.enabled is set to true. Internally it controls the event ACKs polling interval. By default, this setting is set to 10 seconds.
splunk.hec.ack.poll.threads This setting is used for performance tuning and is only applicable when splunk.hec.ack.enabled is set to true. It controls how many threads should be spawned to poll event ACKs. By default, this is set to 1.

For large Splunk indexer clusters (for example, 100 indexers), increase this number to speed up ACK polling, for example to 4 threads.

splunk.hec.event.timeout This setting is applicable when splunk.hec.ack.enabled is set to true. It determines how long the connector waits for events that were POSTed to Splunk to be ACKed before it times out and resends them. By default, this setting is set to 300 seconds.
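
The two ideas behind the acknowledgment settings, check-pointing only what has been ACKed and resending events whose ACK is overdue, can be modeled with a small Python sketch. Both functions are hypothetical illustrations of the behavior described above and do not reflect the connector's internal code.

```python
def highest_acked_offset(posted, acked):
    """Return the highest offset with no unacknowledged offset before it.

    posted: iterable of Kafka offsets that were POSTed to HEC.
    acked: set of offsets that HEC has acknowledged.
    Returns None when the earliest posted offset is still unacknowledged.
    """
    safe = None
    for offset in sorted(posted):
        if offset not in acked:
            break  # Stop at the first gap: later ACKs cannot be check-pointed yet.
        safe = offset
    return safe


def events_to_resend(pending, now, event_timeout=300):
    """Return ids of events that waited at least event_timeout seconds for an ACK.

    pending: dict of event id -> time of the POST, in seconds.
    event_timeout: mirrors the splunk.hec.event.timeout default of 300.
    """
    return sorted(
        event_id for event_id, posted_at in pending.items()
        if now - posted_at >= event_timeout
    )


print(highest_acked_offset([10, 11, 12, 13], {10, 11, 13}))
print(events_to_resend({"a": 0, "b": 200}, now=300))
```

In the first call, offset 12 has not been acknowledged, so only offsets up to 11 are considered safe even though 13 has been ACKed. In the second call, only event "a" has waited the full 300 seconds.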

Endpoint parameters (optional)

Parameter Name Description
splunk.hec.raw Set to true for Splunk software to ingest data using the HEC /raw endpoint. Default is false, which will use the /event endpoint.
splunk.hec.raw.line.breaker Only applicable to the HEC /raw endpoint. This setting specifies a custom line breaker to help the Splunk software separate the events correctly.

For example, you can specify "#####" as a special line breaker. Internally, Splunk Connect for Kafka appends this line breaker to every Kafka record to form a clear event boundary. The connector performs data injection in batch mode. On the Splunk platform, configure the props.conf file to set up the line breaker for the source types. The Splunk software then breaks events for data flowing through the HEC /raw endpoint. By default, this setting is empty.

For more on the HTTP Event Collector (HEC) see Set up and use HTTP Event Collector in Splunk Web in the Getting Data In manual.

splunk.hec.json.event.enrichment Only applicable to the HEC /event endpoint. This setting is used to enrich raw data with extra indexed metadata fields. It contains a list of key value pairs separated by ",". The configured enrichment metadata will be indexed along with raw event data by Splunk software. Data enrichment for the HEC /event endpoint is only available in Splunk Enterprise 6.5 and later. By default, this setting is empty.
splunk.hec.track.data Valid settings are true or false. When set to true, data loss and data injection latency metadata will be indexed along with raw data. This setting only works in conjunction with the HEC /event endpoint (splunk.hec.raw : false). By default, this setting is set to false.
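
The following Python sketch illustrates two of the endpoint settings: how a line breaker marks event boundaries in a batch sent to the /raw endpoint, and how an enrichment setting could be split into key-value pairs. Both function names are hypothetical. The key=value form of each enrichment pair is an assumption of this sketch, because the table only states that the pairs are separated by ",".

```python
def build_raw_payload(records, line_breaker=""):
    """Join a batch of Kafka record values for the HEC /raw endpoint.

    The line breaker is appended to every record, so each event in the
    batch ends with a clear boundary.
    """
    return "".join(record + line_breaker for record in records)


def parse_enrichment(setting):
    """Split an enrichment setting into a dict of metadata fields.

    Assumes each pair has the form key=value (not stated in the table).
    """
    fields = {}
    for pair in setting.split(","):
        pair = pair.strip()
        if not pair:
            continue  # Ignore empty entries, such as a trailing comma.
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError("Expected key=value, got: " + pair)
        fields[key.strip()] = value.strip()
    return fields


payload = build_raw_payload(["event one", "event two"], line_breaker="#####")
print(payload)
print(parse_enrichment("org=fin,bu=south-east-us"))
```

With "#####" as the line breaker, the batch becomes a single string in which every record is followed by the marker. The Splunk software can then split the string back into events once the matching line breaker is set up in props.conf.
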
Last modified on 12 July, 2022
This documentation applies to the following versions of Splunk® Connect for Kafka: 2.0.5, 2.0.6, 2.0.7

