Splunk® Supported Add-ons

Splunk Add-on for Kafka

Configure modular inputs for the Splunk Add-on for Kafka

The Splunk Add-on for Kafka is deprecated and the functionality of pulling Kafka events using modular inputs is no longer supported. Use Splunk Connect for Kafka to pull Kafka events using modular inputs.

The Splunk Add-on for Kafka includes a modular input that consumes messages from the Kafka topics that you specify. The Splunk instances that collect the Kafka data must be in the same network as your Kafka machines. Depending on the message volume on your Kafka clusters, this data source can be very large. To determine how many heavy forwarders to dedicate to this task, review the sizing guidelines in Hardware and software requirements for the Splunk Add-on for Kafka.

Decide how you want to configure data collection for this input. You have four options. Follow the directions in the section that matches your preferences.

  • Manage inputs centrally from a single node. Use Splunk Web to configure the collection parameters and identify the heavy forwarders to perform the collection from one central location, usually a search head. The Splunk platform then handles the division of collection tasks between the forwarders automatically.
    Architecture: the Splunk Add-on for Kafka on one search head acts as the input manager. An admin configures that node, and the configurations are pushed to the heavy forwarders, which collect the data from the Kafka clusters.
  • Manage inputs manually from each forwarder. A configured Splunk Add-on for Kafka runs on each individual heavy forwarder. An admin configures input collection on each forwarder manually.
  • Manage inputs on a single instance or Splunk Cloud. The Splunk Add-on for Kafka runs on a single instance of the Splunk platform, configured to collect data from Kafka clusters.
  • Manage inputs using the configuration files. Use the configuration files to configure your modular inputs. This topic covers how to use the configuration files to manage inputs centrally from one node, but other configurations are also possible.


If you have the Snappy compression method enabled when injecting data into Kafka, you must install a Snappy binding to allow this add-on to support Snappy Kafka messages. See Kafka data injected with Snappy compression enabled for details.
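
A quick way to check whether a Snappy binding is importable before you enable collection is sketched below. It assumes the binding is the python-snappy package, which exposes a module named snappy. That module name and the helper binding_available are illustrative assumptions, not something this topic specifies.

```python
import importlib.util


def binding_available(module_name="snappy"):
    """Return True if the named module can be located by this interpreter.

    Assumption: the Snappy binding is python-snappy, imported as "snappy".
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if binding_available():
        print("Snappy binding found")
    else:
        print("Snappy binding not found; Snappy-compressed messages may fail to decode")
```

Run the check with the same Python interpreter that the add-on uses, because a binding installed for a different interpreter is not visible to it.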

Manage inputs centrally from a single node

If you want to manage your modular input configuration centrally on one node, you do not need to configure the add-on on each individual forwarder. Deploy the unconfigured add-on to all heavy forwarders that you want to use to collect Kafka topic data, then proceed with the steps below.

  1. Select a Splunk platform instance to use to centrally manage the input configuration. This node pushes the configuration to the forwarders that you specify to perform the data collection, so there is no single point of failure.
    Splunk platform instance | Description
    Unclustered search head | Choose one search head to manage the configurations.
    Clustered search head | Choose one search head in the cluster to manage the configurations. Before you begin, click Settings > Show All Settings so that you can see the Setup link on your search head cluster node.
    Heavy forwarder | Choose one heavy forwarder to manage the configurations. You can also include this forwarder in the set of forwarders that perform the data collection.
  2. In Splunk Web, click Apps > Manage Apps.
  3. In the row for Splunk Add-on for Kafka, click Set up. Wait a few seconds to allow the page to fully load.
  4. (Optional) Configure the Logging level for the add-on.
  5. Under Credential Settings, click Add Kafka Cluster and fill out the fields.
    Field | Description
    Kafka Cluster | A name to identify this Kafka cluster.
    Kafka Brokers | The IP addresses and ports for the Kafka instances that handle requests from consumers, producers, and indexing logs for this Kafka cluster. If you have more than one, use a comma-separated list.
    Topic Whitelist | (Optional) A whitelist statement that specifies the topics from which you want to collect data. Supports regex. For example, my_topic.+. Whitelist overrides blacklist.
    Topic Blacklist | (Optional) A blacklist statement that explicitly excludes topics from which you do not want to collect data. Supports regex. For example, _internal_topic.+.
    Partition IDs | (Optional) Specific partition IDs from which to collect data, in a comma-separated list. Leave blank to collect from all partitions. Specify partition IDs when you have a very large topic that you want to divide into separate data collection tasks to improve performance.
    Partition Offset | Select either Earliest or Latest for your offset. Select Earliest to ingest all historical data or Latest to start from now. Default is Earliest.
    Topic Group | (Optional) Enter a name to define this configuration as part of a topic group. Use the same topic group name in other configurations to combine the data collection tasks for all configurations associated with this group into the same underlying process. Use topic groups to combine multiple small topic collection tasks and improve performance.
    Index | (Optional) Configure an index for this configuration. The default is main.
  6. Click Save. Wait a few seconds after saving, then repeat for any additional Kafka clusters from which you want to collect topic data.
  7. Click Add Forwarder and fill out the fields.
    Field | Description
    Heavy Forwarder Name | A friendly name to identify this forwarder.
    Heavy Forwarder Hostname and Port | The IP or DNS name and port of this forwarder.
    Heavy Forwarder Username | The username to use to access this forwarder.
    Heavy Forwarder Password | The password to use to access this forwarder.
  8. Click Save.
  9. Wait a few seconds after saving, then repeat steps 7 and 8 for all heavy forwarders that you want to use to collect data from your Kafka clusters. If you are performing this configuration from a heavy forwarder, add the credentials for the heavy forwarder you are currently on as well.
  10. At the bottom of the screen, click Save. The Splunk platform pushes the input configuration to the heavy forwarders you have identified and automatically divides up the collection tasks between them.
  11. Validate that data is coming in by running the following search:

    sourcetype=kafka:topicEvent
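
To make the Topic Whitelist and Topic Blacklist fields concrete, the following is a minimal sketch of how the two regex filters could select topics. It is not the add-on's implementation. It assumes that "Whitelist overrides blacklist" means the blacklist is ignored whenever a whitelist is set, and that a pattern must match the whole topic name. The function select_topics and the sample topic names are hypothetical.

```python
import re


def select_topics(topics, whitelist=None, blacklist=None):
    """Return the topics that would be collected under the stated assumptions."""
    selected = []
    for topic in topics:
        if whitelist:
            # Assumption: when a whitelist is set, the blacklist is not consulted.
            if re.fullmatch(whitelist, topic):
                selected.append(topic)
        elif blacklist:
            # Assumption: the pattern must match the entire topic name.
            if not re.fullmatch(blacklist, topic):
                selected.append(topic)
        else:
            # No filters: collect every topic.
            selected.append(topic)
    return selected


topics = ["my_topic_orders", "my_topic_users", "_internal_topic_metrics", "billing"]
print(select_topics(topics, whitelist=r"my_topic.+"))
print(select_topics(topics, blacklist=r"_internal_topic.+"))
```

Testing your patterns this way against a list of real topic names can catch a regex that selects more or fewer topics than you intended before you save the configuration.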

Manage inputs manually from each forwarder

If you want to manage input job allocation manually, perform these steps on each heavy forwarder that you want to use to collect data from Kafka topics.

  1. In Splunk Web, click Apps > Manage Apps.
  2. In the row for Splunk Add-on for Kafka, click Set up.
  3. (Optional) Configure the Logging level for the add-on.
  4. Under Credential Settings, click Add Kafka Cluster and fill out only the first two fields. Leave all other fields blank.
    Field | Description
    Kafka Cluster | A name to identify the Kafka cluster.
    Kafka Brokers | The IP addresses and ports for the Kafka instances that handle requests from consumers, producers, and indexing logs for this Kafka cluster. If you have more than one, use the format (<host:port>[,<host:port>][,...]).
  5. Click Save. Repeat for any additional Kafka clusters from which you want to collect topic data using this forwarder.
  6. Click Settings > Data inputs.
  7. In the row for Splunk Add-on for Kafka, click Add new.
  8. Fill out the form.
    Field | Description
    Kafka data input name | A name to identify this input.
    Kafka cluster | The name of a Kafka cluster that you configured in step 4 or 5.
    Kafka topic | The Kafka topic on this cluster from which you want to collect data.
    Kafka partitions | (Optional) Specific partition IDs from which to collect data, in a comma-separated list. Leave blank to collect from all partitions. Specify partition IDs when you have a very large topic that you want to divide into separate data collection tasks to improve performance.
    Kafka partition offset | Select Earliest offset or Latest offset. Select Earliest to ingest all historical data or Latest to start from now.
    Topic group | (Optional) Enter a name to define this input as part of a topic group. Use the same topic group name in other inputs to combine the data collection tasks for all inputs associated with this group into the same underlying process. Use topic groups to combine multiple small topic collection tasks and improve performance.
    Index | (Optional) Configure an index for this input. The default is main.
  9. Click Next.
  10. Repeat steps 6 through 9 for any additional topics from which you want to collect data using this forwarder.
  11. Repeat steps 1 through 10 on all other heavy forwarders that you want to use to collect topic data. To avoid duplicate data, do not collect the same topic on two different forwarders.
  12. Validate that data is coming in by running the following search:

    sourcetype=kafka:topicEvent
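
Because manual allocation makes it easy to assign the same data twice, it can help to write the plan down and check it before configuring each forwarder. The sketch below is one way to do that. The plan format, the forwarder and topic names, and the function find_overlaps are all hypothetical. It also assumes a finer-grained reading than the step above states: two inputs conflict only when they share a topic and at least one partition, with a blank partition list meaning all partitions.

```python
from collections import defaultdict


def find_overlaps(plan):
    """Find inputs that would collect the same topic partitions twice.

    plan: list of (forwarder, topic, partitions), where partitions is a set
    of partition IDs, or None to mean "all partitions".
    """
    seen = defaultdict(list)  # topic -> [(forwarder, partitions), ...]
    conflicts = []
    for forwarder, topic, partitions in plan:
        for other_forwarder, other_partitions in seen[topic]:
            if partitions is None or other_partitions is None:
                # "All partitions" overlaps with any other input on the topic.
                shared = "all"
            else:
                shared = sorted(partitions & other_partitions)
            if shared:
                conflicts.append((topic, other_forwarder, forwarder, shared))
        seen[topic].append((forwarder, partitions))
    return conflicts


plan = [
    ("hf1", "orders", {0, 1}),
    ("hf2", "orders", {2, 3}),
    ("hf2", "clicks", None),
    ("hf3", "clicks", {0}),
]
for conflict in find_overlaps(plan):
    print(conflict)
```

In this sample plan, the two inputs for orders do not conflict because their partition sets are disjoint, while the two inputs for clicks do, because one of them collects every partition.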

Manage inputs on a single instance or Splunk Cloud

These instructions apply if you are setting up a proof of concept (POC) on a single-instance installation of Splunk Enterprise, or if you have a self-service Splunk Cloud instance. If you want to use this add-on on a managed Splunk Cloud deployment, contact Support for assistance. If you are not sure whether you have a managed or a self-service Splunk Cloud deployment, see Types of Splunk Cloud deployment to learn how they differ.

  1. In Splunk Web, click Apps > Manage Apps.
  2. In the row for Splunk Add-on for Kafka, click Set up.
  3. (Optional) Configure a Logging level for the add-on.
  4. Under Credential Settings, click Add Kafka Cluster and fill out the fields.
    Field | Description
    Kafka Cluster | A name to identify this Kafka cluster.
    Kafka Brokers | The IP addresses and ports for the Kafka instances that handle requests from consumers, producers, and indexing logs for this Kafka cluster. If you have more than one, use a comma-separated list.
    Topic Whitelist | (Optional) A whitelist statement that specifies the topics from which you want to collect data. Supports regex. For example, my_topic.+. Whitelist overrides blacklist.
    Topic Blacklist | (Optional) A blacklist statement that explicitly excludes topics from which you do not want to collect data. Supports regex. For example, _internal_topic.+.
    Partition IDs | (Optional) Specific partition IDs from which to collect data, in a comma-separated list. Leave blank to collect from all partitions. Specify partition IDs when you have a very large topic that you want to divide into separate data collection tasks to improve performance.
    Partition Offset | Select either Earliest or Latest for your offset. Select Earliest to ingest all historical data or Latest to start from now. Default is Earliest.
    Topic Group | (Optional) Enter a name to define this configuration as part of a topic group. Use the same topic group name in other configurations to combine the data collection tasks for all configurations associated with this group into the same underlying process. Use topic groups to combine multiple small topic collection tasks and improve performance.
    Index | (Optional) Configure an index for this configuration.
  5. Click Save. Repeat for any additional Kafka clusters from which you want to collect topic data.
  6. Click Add Forwarder and fill out the fields to describe the instance that you are currently on.
    Field | Description
    Heavy Forwarder Name | A friendly name to identify this instance.
    Heavy Forwarder Hostname and Port | The IP or DNS name and port of this instance.
    Heavy Forwarder Username | The username to use to access this instance.
    Heavy Forwarder Password | The password to use to access this instance.
  7. Click Save.
  8. At the bottom of the screen, click Save.
  9. Validate that data is coming in by running the following search:

    sourcetype=kafka:topicEvent
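
The Kafka Brokers field takes a comma-separated list of host:port pairs, and a malformed entry is an easy mistake to make. The following sketch checks a broker string before you paste it into the form. The function parse_brokers is hypothetical, and its rules (a non-empty host, a numeric port between 1 and 65535) are assumptions about what a sensible value looks like, not the add-on's own validation.

```python
def parse_brokers(value):
    """Split a comma-separated broker string into (host, port) tuples."""
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, separator, port = entry.rpartition(":")
        if not separator or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        brokers.append((host, port_number))
    return brokers


print(parse_brokers("10.66.128.237:9092, 10.66.128.191:9092"))
```

A ValueError from this check points at the specific entry that is missing its port or has a stray separator.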

Manage inputs using the configuration files

These directions describe how to use configuration files to manage inputs centrally from a single node of Splunk Enterprise, not manually from each individual forwarder. Using this procedure does not involve the inputs.conf file, only the custom configuration files for this add-on.

  1. Select a Splunk platform instance to use to centrally manage the input configuration. You can use a search head or a heavy forwarder. You can also perform this configuration on a single instance Splunk Enterprise.
  2. On the instance, create a file called kafka_credentials.conf in $SPLUNK_HOME/etc/apps/Splunk_TA_Kafka/local.
  3. In the file, create stanzas for each Kafka cluster from which you want to collect data, using the template below.
    [<KafkaClusterFriendlyName>]
    kafka_brokers = <comma-separated list of IP addresses and ports for the Kafka instances that handle requests from consumers, producers, and indexing logs for this Kafka cluster>
    kafka_partition = <optional comma-separated list of partition IDs>
    kafka_partition_offset = <earliest or latest>
    kafka_topic_blacklist = <optional regex of topics to blacklist from data collection>
    kafka_topic_whitelist = <optional regex of topics to whitelist for data collection>
    kafka_topic_group = <optional group name, used to combine multiple small tasks into the same process>
    index = <optional index for this data, overriding the default index for all Kafka data that you can set in kafka.conf>
    disabled = 0
    

    For example:

    [MyFavoriteKafkaCluster]
    kafka_brokers = 172.16.107.153:9092,172.16.107.154:9092,172.16.107.155:9092
    kafka_partition = 0,1
    kafka_partition_offset = earliest
    kafka_topic_blacklist = test.+
    kafka_topic_whitelist = _internal.+
    index = kafka
    disabled = 0
    
    [MyOtherFavoriteKafkaCluster]
    kafka_brokers = 10.66.128.237:9092,10.66.128.191:9092
    kafka_partition_offset = latest
    kafka_topic_group = newsmalltopics
    index = kafka-misc
    disabled = 0
    
  4. Save the file.
  5. Create a file called kafka_forwarder_credentials.conf in $SPLUNK_HOME/etc/apps/Splunk_TA_Kafka/local.
  6. In the file, create a stanza for each heavy forwarder you want to use to collect topic data from your Kafka clusters, using the template below. If you are performing this configuration on a heavy forwarder, include a stanza for the instance you are currently on. If you are doing a POC on a single instance and not using forwarders, enter the credentials for the single instance.
    [<HeavyForwarderFriendlyName>]
    hostname = <Heavy forwarder's IP address or DNS name and port.>
    username = <Username to access the heavy forwarder.>
    password = <Password to access the heavy forwarder.>
    disabled = False
    

    For example:

    [LocalHF]
    hostname = localhost:8089
    username = ********
    password = **************
    disabled = False
    
    [HF1]
    hostname = 10.20.144.233:8089
    username = ******
    password = ************
    disabled = False
    
    [HF2]
    hostname = 10.20.144.210:8089
    username = *********
    password = ********
    disabled = False
    
  7. Save the file.
  8. If you need to adjust any global settings for this add-on, create a file called kafka.conf in $SPLUNK_HOME/etc/apps/Splunk_TA_Kafka/local. See $SPLUNK_HOME/etc/apps/Splunk_TA_Kafka/README/kafka.conf.spec for details, or consult Troubleshoot the Splunk Add-on for Kafka for information about special cases.
  9. Restart the instance.
  10. Validate that data is coming in by running the following search:

    sourcetype=kafka:topicEvent
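
If you maintain many cluster stanzas, you might prefer to generate kafka_credentials.conf from a script rather than edit it by hand. The sketch below uses Python's standard configparser module to render stanzas in the same key = value shape as the template above. The function build_cluster_conf is hypothetical. Treating kafka_brokers as required and defaulting the offset to earliest are assumptions based on the field descriptions in this topic, so review the generated text before you copy it into $SPLUNK_HOME/etc/apps/Splunk_TA_Kafka/local.

```python
import configparser
import io

VALID_OFFSETS = {"earliest", "latest"}


def build_cluster_conf(clusters):
    """Render {stanza_name: {key: value}} as text for kafka_credentials.conf."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as given
    for name, settings in clusters.items():
        # Assumption: every stanza needs at least one broker.
        if not settings.get("kafka_brokers"):
            raise ValueError(f"{name}: kafka_brokers is required")
        # Assumption: the offset defaults to earliest when omitted.
        offset = settings.get("kafka_partition_offset", "earliest")
        if offset not in VALID_OFFSETS:
            raise ValueError(f"{name}: offset must be earliest or latest")
        parser[name] = {**settings, "kafka_partition_offset": offset, "disabled": "0"}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(build_cluster_conf({
    "MyOtherFavoriteKafkaCluster": {
        "kafka_brokers": "10.66.128.237:9092,10.66.128.191:9092",
        "kafka_partition_offset": "latest",
        "kafka_topic_group": "newsmalltopics",
        "index": "kafka-misc",
    },
}))
```

Interpolation is turned off so that regex values for the whitelist and blacklist keys are written unchanged.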


This documentation applies to the following versions of Splunk® Supported Add-ons: released


Comments

Hi, Anmar,

Thanks for your question, Anmar0293. Yes, you need forwarders to send data from your Kafka machines to your Splunk platform instance.

Jbalik splunk, Splunker
December 3, 2018

Is it necessary to add a forwarder? Can I avoid it?

Warm regards,

Anmar0293
December 3, 2018

Does this support ssl protected kafka topic? During the configuration I could not see any option to provide the cert/ssl info.

RameshKukunarapu
May 24, 2017

Hi Rdleetivo, you can override source types and do your own timestamp extraction using props.conf. See http://docs.splunk.com/Documentation/Splunk/latest/Data/Advancedsourcetypeoverrides for details.

Rpille splunk, Splunker
March 22, 2016

So, downloaded the new 1.1.0 version. Still unclear how to do timestamp field extraction. Can't believe you guys *still* haven't figured out how to deal with structured json data. :-(

Rdleetivo
March 21, 2016

Hi Ashoksamal6363636363636363. We do not support that right now, but I'll file an enhancement request with the product team.

Rpille splunk, Splunker
March 7, 2016

how to connect to a kerborised kafka?

Ashoksamal6363636363636363
January 28, 2016

Where is the sourcetype setting done for timestamp field extraction?

Rdleetivo
October 26, 2015

It's unclear what the difference is between the global settings on the kafka cluster for whitelist/blacklist/starting-offset and the per data input settings. Are the per data input settings filtered against the global settings? Are the global settings fallback defaults if not specified by the data input?

Also, it seems extremely weird that the *index* for the kafka data can only be set once globally for *all* kafka clusters, rather than individually for each data input.

Rdleetivo
October 26, 2015

Note that I needed to manually log in to the splunk machine, delete the local/kafka_credentials.conf file, and restart splunk to get rid of the configuration.

Rdleetivo
October 26, 2015

I'm unable to delete a kafka cluster after adding it. I get the following error in the web UI

Encountered the following error while trying to update: In handler 'localapps': Error while posting to url=/servicesNS/nobody/Splunk_TA_kafka/kafka_input_setup/kafka_settings/kafka_settings

Rdleetivo
October 26, 2015
