Data ingestion parameters for Splunk Connect for Kafka

Use the following parameters to specify the types of data that you want to ingest into your Splunk platform deployment.

Required parameters

Parameter Name	Description
`name`	Connector name. A consumer group with this name will be created with tasks to be distributed evenly across the connector cluster nodes.
`connector.class`	The Java class used to perform connector jobs. Keep the default value `com.splunk.kafka.connect.SplunkSinkConnector` unless you modify the connector.
`tasks.max`	The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Splunk Connect for Kafka connector nodes.
`splunk.hec.uri`	Splunk HTTP Event Collector (HEC) URIs. Either a comma-separated list of the fully qualified domain names (FQDNs) or IP addresses of all Splunk indexers, or the address of a load balancer. When given a list, the connector load balances across the indexers using round robin. For example, `https://hec1.splunk.com:8088, https://hec2.splunk.com:8088, https://hec3.splunk.com:8088`.
`splunk.hec.token`	Splunk HEC token.
`topics` or `topics.regex`	For `topics`: A comma-separated list of Kafka topics for Splunk to consume. For example, `prod-topic1,prod-topic2,prod-topic3`. For `topics.regex`: Declares topic subscriptions as a name pattern instead of listing each topic. For example, `^prod-topic[0-9]$`. If `topics.regex` is specified, the `topics` parameter must be omitted. With `topics.regex`, the Splunk software metadata fields (`splunk.indexes`, `splunk.sourcetypes`, `splunk.sources`) are ignored and should be omitted. Instead, the Splunk software metadata must either be defined on a per-event basis by using Kafka header fields (`splunk.header.index`, `splunk.header.sourcetype`, and so on), or be taken from the default index and sourcetype values of the HEC token.
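
The required parameters can be assembled as in the following minimal sketch. It is an illustration only, not part of Splunk Connect for Kafka: the helper name `build_required_config`, the connector name, and the token placeholder are hypothetical, and the check simply mirrors the rule that `topics` and `topics.regex` are mutually exclusive.

```python
import json


def build_required_config(name, hec_uris, hec_token, topics=None,
                          topics_regex=None, tasks_max=1):
    """Return the required connector settings as a flat dict of strings."""
    # `topics` must be omitted when `topics.regex` is set, so exactly one
    # of the two is accepted here.
    if (topics is None) == (topics_regex is None):
        raise ValueError("set exactly one of topics or topics_regex")
    config = {
        "name": name,
        "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
        "tasks.max": str(tasks_max),
        "splunk.hec.uri": ",".join(hec_uris),
        "splunk.hec.token": hec_token,
    }
    if topics is not None:
        config["topics"] = ",".join(topics)
    else:
        config["topics.regex"] = topics_regex
    return config


example = build_required_config(
    name="splunk-sink-example",        # hypothetical connector name
    hec_uris=["https://hec1.splunk.com:8088", "https://hec2.splunk.com:8088"],
    hec_token="<your-hec-token>",      # placeholder, not a real token
    topics=["prod-topic1", "prod-topic2"],
    tasks_max=3,
)
print(json.dumps(example, indent=2))
```

Every value is kept as a string so that the result can be serialized directly into a connector configuration.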

Header parameters

Parameter Name	Description
`splunk.header.support`	Parses Kafka headers for use as metadata in generated Splunk software events. Valid settings are `true` or `false`. By default, this setting is set to `false`. Requires Kafka Connect version 1.1 or later.
`splunk.header.custom`	Header names. Applicable when `splunk.header.support` is set to `true`. Separate multiple custom headers with commas. For example, `custom_header_1,custom_header_2,custom_header_3`. The connector looks for Kafka record headers with these names and adds them to each event if present. By default, it is set to `""`.
`splunk.header.index`	Header name. Applicable when `splunk.header.support` is set to `true`. This setting specifies the header to be used for the Splunk platform index. By default, it is set to `splunk.header.index`.
`splunk.header.source`	Header name. Applicable when `splunk.header.support` is set to `true`. This setting specifies the header to be used for the Splunk platform source. By default, it is set to `splunk.header.source`.
`splunk.header.sourcetype`	Header name. Applicable when `splunk.header.support` is set to `true`. This setting specifies the header to be used for the Splunk software sourcetype. By default, it is set to `splunk.header.sourcetype`.
`splunk.header.host`	Header name. Applicable when `splunk.header.support` is set to `true`. This setting specifies the header to be used for the Splunk software host. By default, it is set to `splunk.header.host`.
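
The following sketch illustrates how the header parameters relate Kafka record headers to event metadata. It models the behavior described in the table and is not the connector's actual code: the function name `resolve_header_metadata` and the sample header names and values are hypothetical.

```python
def resolve_header_metadata(config, record_headers):
    """Pick event metadata out of Kafka record headers, per the header settings."""
    # Nothing is read from headers unless header support is switched on.
    if config.get("splunk.header.support", "false") != "true":
        return {}
    metadata = {}
    for field in ("index", "source", "sourcetype", "host"):
        setting = "splunk.header." + field
        # The documented default header name is the setting name itself.
        header_name = config.get(setting, setting)
        if header_name in record_headers:
            metadata[field] = record_headers[header_name]
    # Custom headers are a comma-separated list; each is copied if present.
    custom = config.get("splunk.header.custom", "")
    for header_name in (h.strip() for h in custom.split(",")):
        if header_name and header_name in record_headers:
            metadata[header_name] = record_headers[header_name]
    return metadata


config = {
    "splunk.header.support": "true",
    "splunk.header.index": "target_index",   # hypothetical header name
    "splunk.header.custom": "custom_header_1,custom_header_2",
}
headers = {
    "target_index": "prod-index1",
    "splunk.header.host": "web-01",
    "custom_header_1": "blue",
}
print(resolve_header_metadata(config, headers))
```

In this example the index comes from the overridden header name, the host comes from the default header name, and only the custom header that is actually present on the record is copied.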

Optional parameters

Parameter Name	Description
`splunk.indexes`	Target Splunk indexes to send data to. This can be a list of indexes in the same sequence and order as the topics, which makes it possible to send data from different Kafka topics to different Splunk platform indexes. For example, prod-topic1, prod-topic2, and prod-topic3 can be sent to indexes prod-index1, prod-index2, and prod-index3. To index all data from multiple topics into the main index, specify "main". If you leave this setting unconfigured, data routes to the default index configured for the HEC token. Verify that the indexes configured here are in the index list of the HEC token, otherwise Splunk HEC drops the data. By default, this setting is empty.
`splunk.sources`	Splunk event source metadata for the Kafka topic data. The same configuration rules as indexes can be applied. If left unconfigured, the default source bound to the HEC token is used. By default, this setting is empty.
`splunk.sourcetypes`	Splunk event sourcetype metadata for the Kafka topic data. The same configuration rules as indexes can be applied here. If left unconfigured, the default sourcetype bound to the HEC token is used. By default, this setting is empty.
`splunk.hec.backoff.threshhold.seconds`	The amount of time, in seconds, that Splunk Connect for Kafka waits before attempting to resend after errors from a HEC endpoint.
`splunk.flush.window`	The interval, in seconds, at which the events from Kafka Connect are flushed to your Splunk platform instance. By default, this is set to `30`.
`splunk.hec.ssl.validate.certs`	Valid settings are `true` or `false`, and they enable or disable HTTPS certificate validation. By default, this is set to `true`.
`splunk.hec.http.keepalive`	Valid settings are `true` or `false`, and they enable or disable HTTPS connection keep-alive. By default, this is set to `true`.
`splunk.hec.max.http.connection.per.channel`	Controls how many HTTP connections will be created and cached in the HTTP pool for one HEC channel. By default, this is set to 2.
`splunk.hec.max.outstanding.events`	Maximum number of unacknowledged events that the connector keeps in memory. When this limit is reached, the connector triggers a back-pressure event to slow down collection.
`splunk.hec.max.retries`	The number of times the connector attempts to resend a failed batch before dropping the events completely. Dropping events results in data loss. Default is `-1`, which retries indefinitely.
`splunk.hec.lb.poll.interval`	The polling interval, in seconds. Increase the value to poll less often, or decrease it to poll more frequently. Default is `120`.
`splunk.hec.enable.compression`	Enables or disables gzip compression. Valid settings are `true` or `false`. Default is `false`.
`splunk.hec.total.channels`	Controls the total channels created to perform HEC event POSTs. By default, this is set to 2.
`splunk.hec.max.batch.size`	Maximum batch size when posting events to Splunk. The size is the actual number of Kafka events, and not byte size. By default, this is set to 500.
`splunk.hec.threads`	Controls how many threads are spawned to do data injection via HEC in a single connector task. By default, this is set to 1.
`splunk.hec.socket.timeout`	Internal TCP socket timeout when connecting to Splunk. By default, this is set to 60 seconds.
`splunk.hec.json.event.formatted`	Set to `true` for events that are already in HEC format. Valid settings are `true` or `false`.
`splunk.hec.ssl.trust.store.path`	Location of the Java KeyStore. Default setting is `""`.
`splunk.hec.ssl.trust.store.password`	Password for the Java KeyStore. Default setting is `""`.
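
The positional pairing of `topics` with `splunk.indexes` can be pictured with the following sketch. It is an illustration of the rules described above, not the connector's implementation: the function name `map_topics_to_indexes` is hypothetical, comma-separated input is assumed, and raising an error on a length mismatch is an assumption of this sketch.

```python
def map_topics_to_indexes(topics, indexes=""):
    """Pair each topic with its target index; None means the HEC token default."""
    topic_list = [t.strip() for t in topics.split(",") if t.strip()]
    index_list = [i.strip() for i in indexes.split(",") if i.strip()]
    if not index_list:
        # Unconfigured: data routes to the default index of the HEC token.
        return {topic: None for topic in topic_list}
    if len(index_list) == 1:
        # A single index (for example "main") receives data from every topic.
        return {topic: index_list[0] for topic in topic_list}
    if len(index_list) != len(topic_list):
        # Assumption of this sketch: a partial list is treated as a mistake.
        raise ValueError("indexes must match topics in number and order")
    return dict(zip(topic_list, index_list))


print(map_topics_to_indexes(
    "prod-topic1,prod-topic2,prod-topic3",
    "prod-index1,prod-index2,prod-index3",
))
print(map_topics_to_indexes("prod-topic1,prod-topic2", "main"))
```

According to the table, the same pairing rules apply to `splunk.sources` and `splunk.sourcetypes`.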

Acknowledgment parameters (optional)

Enable HTTP Event Collector (HEC) token acknowledgments to avoid data loss. Without HEC token acknowledgment, data loss may occur, especially in the case of a system restart or crash.

Parameter Name	Description
`splunk.hec.ack.enabled`	Valid settings are `true` or `false`. When set to `true`, the Splunk Connect for Kafka connector polls event acknowledgments (ACKs) for POSTed events before checkpointing the Kafka offsets. This setting implements guaranteed delivery and is used to prevent data loss. By default, this setting is set to `true`. If this setting is set to `true`, verify that the corresponding HEC token also has indexer acknowledgment enabled, otherwise data injection will fail due to duplicate data. When set to `false`, the connector only POSTs events to your Splunk platform instance and, after it receives an HTTP 200 OK response, assumes the events are indexed by Splunk. In cases where the Splunk platform crashes, there may be data loss.
`splunk.hec.ack.poll.interval`	This setting is only applicable when `splunk.hec.ack.enabled` is set to `true`. Internally it controls the event ACKs polling interval. By default, this setting is set to 10 seconds.
`splunk.hec.ack.poll.threads`	This setting is used for performance tuning and is only applicable when `splunk.hec.ack.enabled` is set to `true`. It controls how many threads are spawned to poll event ACKs. By default, this is set to 1. For large Splunk indexer clusters (for example, 100 indexers), increase this number, for example to 4 threads, to speed up ACK polling.
`splunk.hec.event.timeout`	This setting is applicable when `splunk.hec.ack.enabled` is set to `true`. It determines how long the connector waits for events that have been POSTed to Splunk to be ACKed before timing out and resending them. By default, this setting is set to 300 seconds.
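
Because the three tuning settings only apply when acknowledgments are enabled, a configuration helper might emit them conditionally, as in the following sketch. The function name `ack_settings` is hypothetical; the default values are the ones listed in the table.

```python
def ack_settings(enabled=True, poll_interval=10, poll_threads=1,
                 event_timeout=300):
    """Return the acknowledgment-related settings as a dict of strings."""
    settings = {"splunk.hec.ack.enabled": "true" if enabled else "false"}
    if enabled:
        # These settings are only applicable when ACKs are enabled.
        settings.update({
            "splunk.hec.ack.poll.interval": str(poll_interval),
            "splunk.hec.ack.poll.threads": str(poll_threads),
            "splunk.hec.event.timeout": str(event_timeout),
        })
    return settings


# Tuning for a large indexer cluster, following the guidance in the table.
print(ack_settings(poll_threads=4))
print(ack_settings(enabled=False))
```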

Endpoint parameters (optional)

Parameter Name	Description
`splunk.hec.raw`	Set to `true` for Splunk software to ingest data using the HEC /raw endpoint. Default is `false`, which will use the /event endpoint.
`splunk.hec.raw.line.breaker`	Only applicable to the HEC /raw endpoint. The setting is used to specify a custom line breaker to help Splunk separate the events correctly. For example, you can specify `"#####"` as a special line breaker. Internally, Splunk Connect for Kafka appends this line breaker to every Kafka record to form a clear event boundary, because the connector performs data injection in batch mode. On the Splunk platform, configure the `props.conf` file to set up the same line breaker for the source types. The Splunk software then breaks events correctly for data flowing through the HEC /raw endpoint. By default, this setting is empty. For more on the HTTP Event Collector (HEC), see Set up and use HTTP Event Collector in Splunk Web in the Getting Data In manual.
`splunk.hec.json.event.enrichment`	Only applicable to the HEC /event endpoint. This setting is used to enrich raw data with extra indexed metadata fields. It contains a list of key value pairs separated by ",". The configured enrichment metadata will be indexed along with raw event data by Splunk software. Data enrichment for the HEC /event endpoint is only available in Splunk Enterprise 6.5 and later. By default, this setting is empty.
`splunk.hec.track.data`	Valid settings are `true` or `false`. When set to `true`, data loss and data injection latency metadata are indexed along with raw data. This setting only works in conjunction with the HEC /event endpoint (`splunk.hec.raw` set to `false`). By default, this setting is set to `false`.
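
The role of `splunk.hec.raw.line.breaker` in batch mode can be pictured with the following sketch. It is a simplified model, not code from the connector or from Splunk software: the function names are hypothetical, and `split_raw_batch` only stands in for the event breaking that a matching `props.conf` line breaker would perform on the Splunk platform.

```python
def build_raw_batch(records, line_breaker):
    """Append the line breaker to every record and join them into one payload."""
    return "".join(record + line_breaker for record in records)


def split_raw_batch(payload, line_breaker):
    """Recover the individual events from a batched payload."""
    return [chunk for chunk in payload.split(line_breaker) if chunk]


records = ["line one\nline two", "second event"]
payload = build_raw_batch(records, "#####")
print(repr(payload))
print(split_raw_batch(payload, "#####"))
```

A distinctive breaker such as `"#####"` keeps a multi-line record together as one event, which a plain newline boundary would not.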

Kerberos parameters

Name	Description	Default Value
`kerberos.user.principle`	The Kerberos user principal that the connector can use to authenticate with Kerberos.	`""`
`kerberos.keytab.path`	The path to the keytab file to use for authentication with Kerberos.	`""`

Protobuf parameters

Name	Description	Default Value
`value.converter`	Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka. To use protobuf format, set the value of this field to `io.confluent.connect.protobuf.ProtobufConverter`.	`org.apache.kafka.connect.storage.StringConverter`
`value.converter.schema.registry.url`	Schema Registry URL.	`""`
`value.converter.schemas.enable`	To use protobuf format, set the value of this field to `true`.	`false`
`key.converter`	Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka. To use protobuf format, set the value of this field to `io.confluent.connect.protobuf.ProtobufConverter`.	`org.apache.kafka.connect.storage.StringConverter`
`key.converter.schema.registry.url`	Schema Registry URL.	`""`
`key.converter.schemas.enable`	To use protobuf format, set the value of this field to `true`.	`false`
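
The protobuf-related settings from the table can be grouped as in the following sketch. The helper name `protobuf_converter_settings`, its `convert_keys` switch, and the Schema Registry address are hypothetical; the class name and the `true` values are the ones the table prescribes for protobuf format.

```python
PROTOBUF_CONVERTER = "io.confluent.connect.protobuf.ProtobufConverter"


def protobuf_converter_settings(schema_registry_url, convert_keys=False):
    """Return converter settings for protobuf values, and optionally keys."""
    settings = {
        "value.converter": PROTOBUF_CONVERTER,
        "value.converter.schema.registry.url": schema_registry_url,
        "value.converter.schemas.enable": "true",
    }
    if convert_keys:
        # Keys keep the default StringConverter unless this is requested.
        settings.update({
            "key.converter": PROTOBUF_CONVERTER,
            "key.converter.schema.registry.url": schema_registry_url,
            "key.converter.schemas.enable": "true",
        })
    return settings


# Hypothetical Schema Registry address; replace it with your own.
print(protobuf_converter_settings("http://localhost:8081"))
```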

Timestamp extraction parameters

Name	Description	Default Value
`enable.timestamp.extraction`	To enable timestamp extraction, set the value of this field to `true`. NOTE: Applicable only if `splunk.hec.raw` is `false`.	`false`
`timestamp.regex`	Regex for timestamp extraction. NOTE: The regex must have a named capture group `"time"`. For example, `\\\"time\\\":\\s\\\"(?<time>.*?)\"` is formatted correctly.	`""`
`timestamp.format`	Time format for timestamp extraction. For example, if the timestamp is `1555209605000`, set `timestamp.format` to `"epoch"`. If the timestamp is `Jun 13 2010 23:11:52.454 UTC`, set `timestamp.format` to `"MMM dd yyyy HH:mm:ss.SSS zzz"`.	`""`
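
The following sketch shows the idea behind a named capture group `time` combined with the `"epoch"` format. It is an illustration in Python rather than the connector's Java implementation: Python spells named groups as `(?P<time>...)` instead of `(?<time>...)`, the pattern and the sample event are assumptions of this sketch, and treating the epoch value as milliseconds follows the `1555209605000` example above.

```python
import re
from datetime import datetime, timezone

# Matches a quoted "time" value such as: "time": "1555209605000"
TIME_PATTERN = re.compile(r'"time":\s"(?P<time>.*?)"')


def extract_epoch_millis(event_text):
    """Return the event time as a UTC datetime, or None if nothing matches."""
    match = TIME_PATTERN.search(event_text)
    if match is None:
        return None
    millis = int(match.group("time"))
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


event = '{"time": "1555209605000", "msg": "hello"}'   # hypothetical event
print(extract_epoch_millis(event))
```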
