Plan your deployment
Use one of the following connector deployment options to deploy Splunk Connect for Kafka:
- Splunk Connect for Kafka in a dedicated Kafka Connect Cluster (best practice).
- Splunk Connect for Kafka in an existing Kafka Connect Cluster.
Splunk Connect for Kafka can run in containers, in virtual machines, or on physical machines. You can use any automation tool for deployment.
See the Plan a deployment section of the Splunk Enterprise manual for more information on planning your Splunk platform deployment.
Determine how many Kafka Connect instances to deploy by calculating the daily data volume that Splunk Connect for Kafka needs to send to your Splunk platform deployment for indexing. For example, a machine with 8 CPUs and 16 GB of memory can potentially achieve a throughput of 50 to 60 MB per second from Kafka Connect into your Splunk platform deployment, provided that the Splunk platform deployment is sized correctly.
Do not create more tasks than the number of partitions you consume from, because the extra tasks stay idle. Creating 2 * CPU tasks per Kafka Connect instance is a safe estimate.
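The sizing step above can be sketched as a small calculation. This is a rough estimate under stated assumptions, not a sizing guarantee: the function name `estimate_connect_instances` and the `peak_factor` parameter are hypothetical, the default of 50 MB per second per instance is the low end of the example figure above, and 1 GB is taken as 1024 MB.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_connect_instances(daily_volume_gb, per_instance_mb_per_s=50.0, peak_factor=1.0):
    """Estimate the number of Kafka Connect instances for a given daily volume.

    daily_volume_gb: data to send per day, in GB (assumed 1 GB = 1024 MB).
    per_instance_mb_per_s: assumed sustained throughput of one instance.
    peak_factor: hypothetical multiplier for traffic peaks above the daily average.
    """
    if daily_volume_gb < 0 or per_instance_mb_per_s <= 0 or peak_factor <= 0:
        raise ValueError("volume must be >= 0; throughput and peak factor must be > 0")
    # Average rate needed to move the daily volume, in MB per second.
    average_mb_per_s = daily_volume_gb * 1024 / SECONDS_PER_DAY
    required_mb_per_s = average_mb_per_s * peak_factor
    # Always deploy at least one instance; round up partial instances.
    return max(1, math.ceil(required_mb_per_s / per_instance_mb_per_s))


if __name__ == "__main__":
    # 10 TB per day averages about 121 MB per second, so 3 instances at 50 MB/s each.
    print(estimate_connect_instances(10 * 1024))
```

The estimate only covers the Kafka Connect side. The Splunk platform deployment still has to be sized to index the same volume.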
For example, if you have the following deployment:
- 5 Kafka Connect instances running Splunk Connect for Kafka.
- Each host has 8 CPUs with 16 GB memory.
- There are 200 partitions to collect data from.
Then tasks.max will be: tasks.max = 2 * CPUs/host * Kafka Connect instances = 2 * 8 * 5 = 80 tasks.
Alternatively, if there are only 60 partitions to consume from, set tasks.max to 60 so that the number of tasks does not exceed the number of partitions.
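The rule in this example (2 tasks per CPU across all instances, capped at the partition count) can be written as a short helper. This is a minimal sketch: the function name `recommended_tasks_max` is hypothetical, and the result is the value you would set for the connector's maximum task count (the `tasks.max` property in Kafka Connect).

```python
def recommended_tasks_max(cpus_per_host, connect_instances, partitions):
    """Return a task count of 2 * CPUs/host * instances, capped at the partition count."""
    if cpus_per_host < 1 or connect_instances < 1 or partitions < 1:
        raise ValueError("all inputs must be positive integers")
    # Safe estimate from the text: two tasks per CPU on every Kafka Connect instance.
    cpu_based = 2 * cpus_per_host * connect_instances
    # Never exceed the number of partitions being consumed.
    return min(cpu_based, partitions)


if __name__ == "__main__":
    print(recommended_tasks_max(8, 5, 200))  # 80: the CPU-based estimate is the limit
    print(recommended_tasks_max(8, 5, 60))   # 60: the partition count is the limit
```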
A single instance of Splunk Connect for Kafka can reach a maximum indexed throughput of 32 MB per second with the following testbed, using the raw HEC endpoint:
- AWS: EC2 c4.2xlarge, 8 vCPUs and 31 GB of memory.
- Splunk cluster: 3-indexer cluster without a load balancer.
- Kafka Connect: JVM heap size configuration is "-Xmx6G -Xms2G".
- Kafka Connect resource usage: ~6GB memory, ~3 vCPUs.
- Kafka record size: 512 bytes.
- Batch size: Maximum of 100 Kafka records per batch, which is around 50 KB per batch.
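The testbed figures above can be cross-checked with a little arithmetic. This is an illustrative sketch only: the function names are hypothetical, 1 KB is taken as 1024 bytes and 1 MB as 1024 * 1024 bytes, and protocol overhead on the HEC requests is ignored.

```python
RECORD_SIZE_BYTES = 512        # Kafka record size from the testbed
MAX_RECORDS_PER_BATCH = 100    # maximum records per batch from the testbed
THROUGHPUT_MB_PER_S = 32       # maximum indexed throughput of a single instance


def batch_size_kb(record_size_bytes, records_per_batch):
    """Payload size of one full batch, in KB (assumed 1 KB = 1024 bytes)."""
    return record_size_bytes * records_per_batch / 1024


def records_per_second(throughput_mb_per_s, record_size_bytes):
    """Records per second implied by a throughput (assumed 1 MB = 1024 * 1024 bytes)."""
    return throughput_mb_per_s * 1024 * 1024 / record_size_bytes


if __name__ == "__main__":
    kb = batch_size_kb(RECORD_SIZE_BYTES, MAX_RECORDS_PER_BATCH)
    rps = records_per_second(THROUGHPUT_MB_PER_S, RECORD_SIZE_BYTES)
    print(kb)                           # 50.0 KB per full batch
    print(rps)                          # 65536.0 records per second
    print(rps / MAX_RECORDS_PER_BATCH)  # full batches per second at that rate
```

Under these assumptions, 32 MB per second corresponds to roughly 65,000 records per second, or about 655 full batches per second.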
This documentation applies to the following versions of Splunk® Connect for Kafka: 1.1.0