Configure Kafka data ingestion

Kafka ingestion works by issuing multiple micro-batch queries against live data from Splunk Enterprise. The queries cover consecutive time ranges, each one beginning where the previous one ends. Running real-time indexed searches on Splunk Enterprise is not required. For information about ingesting data sources without Kafka, see How data gets in to Splunk UBA in Get Data into Splunk User Behavior Analytics.

Some data sources are known to have a lag when ingested into the Splunk platform, such as batch files that are ingested periodically. In such cases, you can adjust the Kafka ingestion properties to make sure that the data is still ingested by Splunk UBA. Perform the following steps to configure the Kafka ingestion properties:

1. Modify or add the properties in the /etc/caspida/local/conf/uba-site.properties file. See the table for the property names, descriptions, and default values.
2. Synchronize the cluster and restart Splunk UBA to make the configuration changes take effect.

These properties apply globally to all data sources sent to Kafka for ingestion, including any data sources that you may have configured earlier with different properties.
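The "modify or add" step can be sketched as a small helper that works on the text of a properties file. This is only an illustration: the function name set_property and the sample content are hypothetical, the helper handles plain "key = value" lines only, and it does not touch the real uba-site.properties file or perform the cluster synchronization and restart.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`.

    An existing uncommented entry for `key` is replaced in place;
    otherwise a new "key = value" line is appended.
    """
    lines = text.splitlines()
    new_line = f"{key} = {value}"
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key/value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # No existing entry was found, so add one at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical file content, used only for illustration.
sample = "# local overrides\nsplunk.kafka.ingestion.search.interval.seconds = 60\n"
updated = set_property(sample, "splunk.kafka.ingestion.search.delay.seconds", "300")
print(updated)
```

Working on a copy of the file and reviewing the result before synchronizing the cluster keeps a mistyped property name from reaching every node.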

Property Description

splunk.kafka.ingestion.search.delay.seconds

The point in time where Splunk UBA begins Kafka ingestion, expressed as a number of seconds before the start of the current minute. The default is 180 seconds (3 minutes). For example, if Kafka ingestion is enabled at 10 seconds past 1:02 PM, then the beginning of the minute is 1:02 PM. Specifying a delay of 120 seconds means that the first batch query begins processing events at 1:00 PM. The query runs on the events within the interval of time defined by splunk.kafka.ingestion.search.interval.seconds.

Do not configure this property to exceed 10800 seconds (3 hours).

You can configure the data ingestion start time for any individual data source by adding the data source name to the end of the property. For example, to configure a delay of 120 seconds for a data source named exampledatasource, use the following property and value setting:

splunk.micro.batching.search.delay.seconds.exampledatasource = 120

Setting this property for an individual data source overrides the value of the splunk.kafka.ingestion.search.delay.seconds property.
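The start-time arithmetic described for the delay property can be sketched as follows. The function name first_query_start is hypothetical, and the code only mirrors the description above (truncate to the start of the minute, then subtract the delay); it is not taken from the Splunk UBA implementation.

```python
from datetime import datetime, timedelta

MAX_DELAY_SECONDS = 10800  # 3-hour upper bound stated in the text


def first_query_start(enabled_at: datetime, delay_seconds: int = 180) -> datetime:
    """Return the time at which the first batch query begins processing events."""
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_SECONDS} seconds")
    # Truncate to the beginning of the current minute.
    minute_start = enabled_at.replace(second=0, microsecond=0)
    return minute_start - timedelta(seconds=delay_seconds)


# Ingestion enabled at 10 seconds past 1:02 PM with a 120-second delay.
enabled = datetime(2024, 1, 1, 13, 2, 10)
print(first_query_start(enabled, 120))  # 2024-01-01 13:00:00
```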

splunk.kafka.ingestion.search.interval.seconds

The length of time in seconds for each batch query. The default is 60 seconds, meaning that each query searches for 60 seconds' worth of events, starting from the time defined by splunk.kafka.ingestion.search.delay.seconds.

Do not configure the interval to exceed 240 seconds (4 minutes).

You can configure the query interval for any individual data source by adding the data source name to the end of the property. For example, to configure an interval of 120 seconds for a data source named exampledatasource, use the following property and value setting:

splunk.micro.batching.search.interval.seconds.exampledatasource = 120

Setting this property for an individual data source overrides the value of the splunk.kafka.ingestion.search.interval.seconds property.
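To illustrate how the interval produces consecutive query ranges, the following sketch generates the first few windows from a given start time. The function name batch_windows is hypothetical, and treating each window as half-open (start inclusive, end exclusive) is an assumption made here so that adjacent windows do not overlap.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

MAX_INTERVAL_SECONDS = 240  # 4-minute upper bound stated in the text


def batch_windows(
    start: datetime, interval_seconds: int = 60, count: int = 3
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield `count` consecutive (begin, end) query windows starting at `start`."""
    if not 0 < interval_seconds <= MAX_INTERVAL_SECONDS:
        raise ValueError(f"interval must be between 1 and {MAX_INTERVAL_SECONDS} seconds")
    step = timedelta(seconds=interval_seconds)
    for i in range(count):
        begin = start + i * step
        # Each window ends exactly where the next one begins.
        yield begin, begin + step


for begin, end in batch_windows(datetime(2024, 1, 1, 13, 0, 0), 60, 3):
    print(begin.time(), "to", end.time())
```

With a start time of 1:00 PM and the default 60-second interval, the first window covers 1:00:00 up to, but not including, 1:01:00, which agrees with a first query whose last covered second is 1:00:59.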

splunk.kafka.ingestion.search.max.lag.seconds

The lag, or amount of time between the end time of the most recent batch query and the time Kafka ingestion starts. The default is 3600 seconds (1 hour). For example, if the first batch query ends at 1:00 PM and 59 seconds, and Kafka ingestion starts at 1:02 PM and 10 seconds, then the lag at that time is 1 minute and 11 seconds. If the lag exceeds the configured splunk.kafka.ingestion.search.max.lag.seconds, Splunk UBA shows an alert in the health monitor.
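The lag example can be checked with a few lines of date arithmetic. The function names below are hypothetical, and the comparison only mirrors the rule described above (alert when the lag exceeds the configured maximum); how Splunk UBA evaluates the lag internally is not shown in this topic.

```python
from datetime import datetime


def ingestion_lag_seconds(last_query_end: datetime, ingestion_time: datetime) -> float:
    """Return the seconds between the end of the latest batch query and ingestion time."""
    return (ingestion_time - last_query_end).total_seconds()


def lag_exceeded(lag_seconds: float, max_lag_seconds: int = 3600) -> bool:
    """Return True when the lag is larger than the configured maximum."""
    return lag_seconds > max_lag_seconds


# First batch query ends at 1:00:59 PM; Kafka ingestion starts at 1:02:10 PM.
lag = ingestion_lag_seconds(
    datetime(2024, 1, 1, 13, 0, 59), datetime(2024, 1, 1, 13, 2, 10)
)
print(lag)                # 71.0 seconds, or 1 minute and 11 seconds
print(lag_exceeded(lag))  # False with the default maximum of 3600 seconds
```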

The respective time ranges of these properties are shown in the following diagram.
