Splunk® User Behavior Analytics

Get Data In to Splunk User Behavior Analytics

Get data into Splunk UBA

Splunk UBA uses data from Splunk Enterprise to identify potential insider and external threats to your environment. Work with Splunk Professional Services to get started with importing important data sources and filtering events.

How data gets into Splunk UBA

Data can be ingested into Splunk UBA by performing queries against Splunk Enterprise to pull data in, or by pushing data from Splunk Enterprise to Splunk UBA using Kafka ingestion.

Time-based search

Splunk UBA performs micro-batched queries in 1-minute intervals against Splunk Enterprise to pull in events. This is the default method for getting data into Splunk UBA.
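As a sketch of the batching arithmetic, the start of the first micro-batch query can be derived from the time ingestion is enabled and the splunk.live.micro.batching.delay.seconds property described later in this topic. The helper below is an illustration, not actual Splunk UBA code:

```python
from datetime import datetime, timedelta

# Hypothetical sketch (not Splunk UBA code): the enable time is floored to
# the start of the current minute, then backed off by the configured delay
# (splunk.live.micro.batching.delay.seconds, default 180 seconds).
def first_batch_start(enabled_at: datetime, delay_seconds: int = 180) -> datetime:
    minute_start = enabled_at.replace(second=0, microsecond=0)
    return minute_start - timedelta(seconds=delay_seconds)

# Data ingestion enabled at 10 seconds past 1:02 PM with a 120-second delay:
print(first_batch_start(datetime(2021, 1, 1, 13, 2, 10), delay_seconds=120))
# -> 2021-01-01 13:00:00 (the first batch query begins at 1:00 PM)
```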

Using time-based search enables Splunk UBA to monitor the status of your data ingestion through the health monitor and the Splunk UBA Monitoring app.

To configure the properties of the queries:

  1. Edit the following properties in the /etc/caspida/local/conf/uba-site.properties file.
  2. Synchronize the cluster in distributed deployments:
    /opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf
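As an illustration, the three properties described below might appear in /etc/caspida/local/conf/uba-site.properties with their default values as follows (adjust the values to suit your environment):

```properties
# /etc/caspida/local/conf/uba-site.properties
splunk.live.micro.batching.delay.seconds=180
splunk.live.micro.batching.interval.seconds=60
connector.splunk.max.backtrace.time.in.hour=4
```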
splunk.live.micro.batching.delay.seconds
The point in time when Splunk UBA begins data ingestion. The default is 180 seconds (3 minutes) earlier than the start of the current minute. For example, if data ingestion is enabled at 10 seconds past 1:02 PM, the start of the current minute is 1:02 PM, and specifying a delay of 120 seconds means that the first batch query begins processing events at 1:00 PM. The query runs on the events within the interval of time defined by splunk.live.micro.batching.interval.seconds.

splunk.live.micro.batching.interval.seconds
The length of time in seconds for each batch query.
  • The default is 60 seconds, meaning that a query runs every 60 seconds over 60 seconds worth of events, starting from the time defined by splunk.live.micro.batching.delay.seconds.
  • If you specify 120 seconds as the interval, a query runs every 120 seconds over 120 seconds worth of events.

Do not configure the interval to exceed 4 minutes.

connector.splunk.max.backtrace.time.in.hour
The window of time that determines where data ingestion resumes after a data source is stopped and then restarted. The default backtrace time is 4 hours.
  • If a data source is stopped for longer than the configured connector.splunk.max.backtrace.time.in.hour interval, some events are lost. For example, if a data source is stopped at 12:00AM and not restarted until 6:00AM, and connector.splunk.max.backtrace.time.in.hour is 4 hours, Splunk UBA begins ingesting with events that occurred at 2:00AM. The events between 12:00AM and 2:00AM cannot be recovered.
  • If a data source is restarted inside the window of time configured by connector.splunk.max.backtrace.time.in.hour, Splunk UBA continues to ingest events where it left off before the data source was stopped, and attempts to catch up so that there is no more lag. This catch-up behavior is described immediately after this table.
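The backtrace behavior can be sketched as a small calculation. The helper below is an assumption for illustration, not actual Splunk UBA code; it shows how connector.splunk.max.backtrace.time.in.hour bounds the point where ingestion resumes after a restart:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: ingestion can reach back at most max_backtrace_hours
# before the restart time, so the resume point is whichever is later:
# the stop time, or (restart time - backtrace window).
def ingestion_resume_point(stopped_at: datetime, restarted_at: datetime,
                           max_backtrace_hours: int = 4) -> datetime:
    earliest_recoverable = restarted_at - timedelta(hours=max_backtrace_hours)
    return max(stopped_at, earliest_recoverable)

# Stopped at 12:00AM, restarted at 6:00AM, 4-hour backtrace window:
print(ingestion_resume_point(datetime(2021, 1, 1, 0, 0),
                             datetime(2021, 1, 1, 6, 0)))
# -> 2021-01-01 02:00:00 (events between 12:00AM and 2:00AM are lost)

# Restarted within the window, at 3:00AM: ingestion resumes where it left off.
print(ingestion_resume_point(datetime(2021, 1, 1, 0, 0),
                             datetime(2021, 1, 1, 3, 0)))
# -> 2021-01-01 00:00:00
```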

The search windows in Splunk UBA's micro-batch queries are expanded to ingest more events to compensate for lags during data ingestion. Searches are run every minute and for each search that takes less than 60 seconds, the search window is increased by 3 minutes to ingest a greater number of events. This enables Splunk UBA to gradually overcome a data ingestion lag, up to the point where data ingestion is back to the configured initial delay.

If any search takes more than 60 seconds to complete, the search window is reduced by 3 minutes, and the next search is issued immediately at the conclusion of the previous search. This is continued until the search can complete again in less than 60 seconds.
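The grow-and-shrink behavior described above can be sketched as a minimal simulation. This is an assumption for illustration only, using the 60-second threshold and 3-minute step from this topic, with the search durations taken from the example table that follows:

```python
# Hypothetical sketch of the adaptive search-window logic (not UBA code):
# grow the window by 3 minutes after a fast search, shrink it by 3 minutes
# after a slow one, never below the default 1-minute window.
def next_window(window_minutes: int, search_seconds: float) -> int:
    if search_seconds < 60:
        return window_minutes + 3
    return max(1, window_minutes - 3)

# Replay the search durations from the example table below (1:00AM onward):
window = 1
windows = [window]
for duration in [4, 6, 22, 67, 61, 26]:
    window = next_window(window, duration)
    windows.append(window)
print(windows)  # [1, 4, 7, 10, 7, 4, 7]
```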

Consider the timeline in the following example, where a data source is stopped at 12:00AM and then restarted at 1:00AM.

Search Start Time Search Duration Search Time Window Description of data ingestion
1:00:00AM 4 seconds 1 minute Ingest events occurring between 12:00AM - 12:01AM. Splunk UBA detects that there is a lag in the data ingestion. This search takes less than 60 seconds to complete, so the next search window is increased by 3 minutes.
1:01:00AM 6 seconds 4 minutes Ingest events occurring between 12:01AM - 12:05AM. This search takes less than 60 seconds to complete, so the next search window is increased by 3 minutes.
1:02:00AM 22 seconds 7 minutes Ingest events occurring between 12:05AM - 12:12AM. This search takes less than 60 seconds to complete, so the next search window is increased by 3 minutes.
1:03:00AM 67 seconds 10 minutes Ingest events occurring between 12:12AM - 12:22AM. This search takes longer than 60 seconds to complete:
  • The next search is issued immediately after this search is completed, at 1:04:07AM.
  • The next search window is decreased by 3 minutes.
1:04:07AM 61 seconds 7 minutes Ingest events occurring between 12:22AM - 12:29AM. This search takes longer than 60 seconds to complete:
  • The next search is issued immediately after this search is completed, at 1:05:08AM.
  • The next search window is decreased by 3 minutes.
1:05:08AM 26 seconds 4 minutes Ingest events occurring between 12:29AM - 12:33AM. This search takes less than 60 seconds to complete:
  • The next search is issued at the normal interval, 1 minute from the time the current search is issued.
  • The search window is increased by 3 minutes.
1:06:08AM 31 seconds 7 minutes Ingest events occurring between 12:33AM - 12:40AM.

This process continues until there is no more lag in the data ingestion, at which point the search window is returned to the default interval of 1 minute.


Real-time search

Splunk UBA can perform real-time indexed queries against Splunk Enterprise to pull in events.

Use this method if you are experiencing problems with the default method of time-based data ingestion.

  1. Set the following property and value in the /etc/caspida/local/conf/uba-site.properties file:
    splunk.live.micro.batching=false
  2. Synchronize the cluster in distributed deployments:
    /opt/caspida/bin/Caspida sync-cluster /etc/caspida/local/conf

This method does not provide any monitoring services for your data ingestion. Only the default time-based search provides data ingestion health monitoring via the health monitor and Splunk UBA Monitoring app.

Direct to Kafka

Use this method to push data from Splunk Enterprise to Splunk UBA when you have a single data source generating more than 10,000 events per second (EPS).

See Send data from Splunk Enterprise directly to Kafka.

View supported data source types and prepare to add data sources to Splunk UBA

Before you add new data sources, review the types of data that you want to add and determine which ones Splunk UBA supports. See Which data sources do I need?.

  1. In Splunk UBA, select Manage > Data Sources.
  2. Click New Data Source.
  3. Review the supported data source types listed on the Data Source Type page. Only the data source types listed on this page can be added to Splunk UBA.

After you determine which data sources you can add, make sure that existing event filters do not affect the new data sources. Review the existing event filters for settings that would negatively affect future data uploads. For example, an event filter that excludes source_IP data from one data source also affects a new data source. Modify the filters as needed as you add new data sources.

Get started with a small dataset

Get started with a smaller set of data before working in a full production environment. This is useful for verifying that the data coming into Splunk UBA is properly configured and mapped so that you see the desired anomalies and threats.

There are several ways to use a small dataset to get started in Splunk UBA.

Add data sources to Splunk UBA

Complete the following steps to properly get data into Splunk UBA.

  1. Verify you have the correct permissions. See Requirements for connecting to and getting data from the Splunk platform.
  2. Get HR data into Splunk UBA.
  3. Identify assets in your environment.
  4. Use blacklists and whitelists.
  5. Add data from Splunk Enterprise to Splunk UBA.

Verify your Splunk UBA data

See Verify that you successfully added the data source for information on how to verify a data source is successfully added to Splunk UBA.

Some data sources, such as DHCP, DNS, AD, or HTTP, do not always provide a destination device. If you ingest one of these data types and see validation error messages, examine the raw event. If the destination device is genuinely absent from the raw event, you can safely ignore the messages.


This documentation applies to the following versions of Splunk® User Behavior Analytics: 5.0.1

