Splunk® Connect for Kafka

Install and Administer Splunk Connect for Kafka

This documentation does not apply to the most recent version of Splunk® Connect for Kafka. For documentation on the most recent version, go to the latest release.

Hardware and software requirements for Splunk Connect for Kafka

To install Splunk Connect for Kafka, you must meet the following requirements.

Plan your deployment

Use one of the following connector deployment options to deploy Splunk Connect for Kafka:

  • Splunk Connect for Kafka in a dedicated Kafka Connect Cluster (best practice).
  • Splunk Connect for Kafka in an existing Kafka Connect Cluster.

Splunk Connect for Kafka can run in containers, in virtual machines, or on physical machines. You can use any automation tool for deployment.

See the Plan a deployment section of the Splunk Enterprise manual for more information on planning your Splunk platform deployment.

System requirements

  • A Kafka Connect environment running Kafka version 1.0.0 or later.
  • Java 8 or later.
  • Splunk platform environment of version 7.1 or later.
  • Configured and valid HTTP Event Collector (HEC) tokens.

If you are using Splunk Cloud, use the Splunk Support Portal to request that Splunk Connect for Kafka be installed on your deployment. Splunk Support will set up and provide a URL for your HTTP Event Collector endpoint. If you are ingesting Kinesis Firehose events, you can reuse the HTTP Event Collector (HEC) endpoint setting you configured for the Splunk Add-on for Amazon Kinesis Firehose.

Supported technologies

Splunk Connect for Kafka lets you subscribe to a Kafka topic and stream the data to the Splunk HTTP Event Collector (HEC). It supports the following Kafka platforms:

  • Apache Kafka
  • Amazon Managed Streaming for Apache Kafka (Amazon MSK)
  • Confluent Platform

Architecture requirements

Splunk Connect for Kafka supports two types of architectures:

  • Directly inject data into a Splunk platform indexer cluster. For example:
A Kafka Connect Cluster (in containers, virtual machines, or physical machines) -> Splunk Indexer Cluster (HEC)
  • Set up a heavy forwarder layer in front of a Splunk platform indexer cluster to offload the data injection load from your Splunk platform indexer cluster. Setting up a heavy forwarder layer can help distribute computational resources across your Splunk platform deployment. For example:
A Kafka Connect Cluster (in containers, virtual machines, or physical machines) -> Heavy Forwarders (HEC) -> Splunk Indexer Cluster

Optionally, Splunk Connect for Kafka can use its internal load balancing to communicate directly with the HEC ports on the indexers. See the parameter splunk.hec.uri in the Parameters topic of this manual to learn more.

See the Configuration examples topic of this manual to see examples of load balancing with a list of HEC enabled endpoints, and load balancing with a preconfigured load balancer.
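As a minimal sketch of load balancing across a list of HEC-enabled endpoints, the following Python snippet builds a connector definition whose splunk.hec.uri value is a comma-separated list of indexer endpoints. The host names, topic name, connector name, and token placeholder are hypothetical, and the comma-separated form of splunk.hec.uri is assumed here; confirm it against the Parameters topic before relying on it.

```python
import json


def build_connector_config(name, topics, hec_endpoints, hec_token, max_tasks):
    """Return a Kafka Connect connector definition as a plain dict."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
            # Kafka Connect's standard task-count property.
            "tasks.max": str(max_tasks),
            "topics": ",".join(topics),
            # Assumption: several HEC endpoints are given as one
            # comma-separated string so the connector can balance across them.
            "splunk.hec.uri": ",".join(hec_endpoints),
            "splunk.hec.token": hec_token,
        },
    }


# Hypothetical values for illustration only.
config = build_connector_config(
    name="splunk-sink-example",
    topics=["example_topic"],
    hec_endpoints=[
        "https://idx1.example.com:8088",
        "https://idx2.example.com:8088",
        "https://idx3.example.com:8088",
    ],
    hec_token="<your-hec-token>",
    max_tasks=3,
)

print(json.dumps(config, indent=2))
```

With a preconfigured load balancer, the same sketch would pass a single endpoint (the load balancer address) in hec_endpoints.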

Sizing guidelines

Determine how many Kafka Connect instances to deploy by calculating the daily volume that Splunk Connect for Kafka needs to index in your Splunk platform deployment. For example, a machine with 8 CPUs and 16 GB of memory can potentially achieve a throughput of 50 to 60 MB per second from Kafka Connect into your Splunk platform deployment, provided that your Splunk platform deployment is sized correctly.
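A rough way to turn a daily volume into an instance count is sketched below. It assumes that throughput scales linearly with the number of instances, that load is spread evenly over the day, and that each instance sustains the lower bound of the figure above (50 MB per second); none of these assumptions come from this manual, so treat the result as a starting point and add headroom for peaks.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_connect_instances(daily_volume_gb, per_instance_mb_per_sec=50.0):
    """Estimate how many Kafka Connect instances an average daily volume needs.

    Assumes linear scaling and an even load over the day (no peak headroom).
    """
    if daily_volume_gb <= 0 or per_instance_mb_per_sec <= 0:
        raise ValueError("volume and throughput must be positive")
    # Convert GB/day to an average MB/second rate (1 GB = 1024 MB).
    required_mb_per_sec = daily_volume_gb * 1024 / SECONDS_PER_DAY
    return max(1, math.ceil(required_mb_per_sec / per_instance_mb_per_sec))


# Hypothetical example: 10 TB (10240 GB) per day.
print(estimate_connect_instances(10240))
```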

Do not create more tasks than the number of partitions you consume from. Creating 2 tasks per CPU on each Kafka Connect instance is a safe estimate.

For example, if you have the following deployment:

  • 5 Kafka Connect instances running Splunk Connect for Kafka.
  • Each host has 8 CPUs with 16 GB memory.
  • There are 200 partitions to collect data from. Set tasks.max as follows: tasks.max = 2 * CPUs/host * Kafka Connect instances = 2 * 8 * 5 = 80 tasks.
  • Alternatively, if there are only 60 partitions to consume from, set tasks.max to 60.
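The sizing rule in this example can be written as a short calculation: take 2 tasks per CPU across all Kafka Connect instances, then cap the result at the number of partitions. The function name below is illustrative only.

```python
def recommended_task_count(cpus_per_host, connect_instances, partitions):
    """Return the task count suggested by the sizing rule above."""
    # 2 tasks per CPU, summed over every Kafka Connect instance.
    cpu_based = 2 * cpus_per_host * connect_instances
    # Never create more tasks than there are partitions to consume from.
    return min(cpu_based, partitions)


print(recommended_task_count(8, 5, 200))  # 200 partitions: CPU-bound estimate
print(recommended_task_count(8, 5, 60))   # 60 partitions: capped by partitions
```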

Benchmark results

A single instance of Splunk Connect for Kafka can reach a maximum indexed throughput of 32 MB/second with the following testbed and the raw HEC endpoint in use:

Testbed specifications:

  • AWS: EC2 c4.2xlarge, 8 vCPU and 31 GB memory.
  • Splunk Cluster: 3-indexer cluster without a load balancer.
  • Kafka Connect: JVM heap size configuration is "-Xmx6G -Xms2G".
  • Kafka Connect resource usage: ~6 GB memory, ~3 vCPUs.
  • Kafka record size: 512 bytes.
  • Batch size: Maximum of 100 Kafka records per batch, which is around 50 KB per batch.
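The batch size figure follows directly from the record size and the record limit. The sketch below reproduces that arithmetic and, assuming that 1 MB means 1024 * 1024 bytes and that every batch is full (neither is stated in the benchmark), derives how many batches per second the reported throughput implies.

```python
RECORD_SIZE_BYTES = 512
MAX_RECORDS_PER_BATCH = 100

# 100 records * 512 bytes = 51200 bytes, which is 50 KB.
batch_bytes = RECORD_SIZE_BYTES * MAX_RECORDS_PER_BATCH
batch_kb = batch_bytes / 1024


def batches_per_second(throughput_mb_per_sec):
    """Full batches per second implied by a throughput in MB/second.

    Assumes 1 MB = 1024 * 1024 bytes and that every batch is full.
    """
    return throughput_mb_per_sec * 1024 * 1024 / batch_bytes


print(batch_kb)
print(batches_per_second(32))
```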

HTTP Event Collector (HEC) requirements

  • HEC token settings must be the same on all Splunk platform data injection nodes in your environment, including indexers and heavy forwarders.
  • (Optional) When creating a HEC token, enable indexer acknowledgment to prevent potential data loss. This is a best practice.

If indexer acknowledgment is enabled, set ackIdleCleanup to true in inputs.conf.
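The following sketch checks an inputs.conf fragment for this rule. It assumes that ackIdleCleanup lives in the global [http] stanza and that acknowledgment is enabled per token with useACK in an [http://<name>] stanza; that placement, the token name, and the token value are assumptions for illustration, so verify them against the inputs.conf specification for your Splunk platform version. Python's configparser also does not understand every construct that real Splunk configuration files can contain.

```python
import configparser

# Hypothetical inputs.conf fragment; stanza placement is an assumption.
SAMPLE_INPUTS_CONF = """
[http]
disabled = 0
ackIdleCleanup = true

[http://kafka_connect]
token = 00000000-0000-0000-0000-000000000000
useACK = true
"""


def ack_settings_consistent(conf_text):
    """Return True if ackIdleCleanup is true whenever any token uses ACK."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)

    # Collect token stanzas that have indexer acknowledgment enabled.
    ack_tokens = [
        section
        for section in parser.sections()
        if section.startswith("http://")
        and parser.getboolean(section, "useACK", fallback=False)
    ]
    if not ack_tokens:
        return True  # nothing to check: no token uses acknowledgment
    return parser.has_section("http") and parser.getboolean(
        "http", "ackIdleCleanup", fallback=False
    )


print(ack_settings_consistent(SAMPLE_INPUTS_CONF))
```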

See Set up and use HTTP Event Collector in Splunk Web in the Splunk Enterprise manual and About HTTP Event Collector Indexer Acknowledgment for more information.

Last modified on 12 July, 2022

This documentation applies to the following versions of Splunk® Connect for Kafka: 2.0.9

