Splunk® User Behavior Analytics

Develop Custom Content in Splunk User Behavior Analytics

Understanding Splunk UBA data cubes

A data cube in Splunk UBA is a database table containing aggregated events based on specific attributes. Batch models in Splunk UBA use the data aggregated in data cubes to generate content such as anomalies and threats.

This screen image shows how data is ingested from Splunk Enterprise to Splunk UBA, enriched with view data, and consumed by data cubes. A description of this process is provided in the surrounding text.

In this example, Splunk UBA ingests events from the Splunk platform using the Splunk Direct and Splunk Raw Events connectors. See Use connectors to add data from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics for more information about the connectors and when to use which connector. Events are recognized by Splunk UBA and tagged as belonging to a specific view. For example, an event from data source DS1 is tagged with _AD as Splunk UBA recognizes this as Active Directory (AD) data.

Data cubes subscribe to one or more views and can track any number of attributes from those views. For example, if a view has 20 attributes, a data cube may track only 5 of the attributes of that view. In our example, we see that Data Cube 1 is tracking three attributes from the AD view and one attribute from the HTTP view.

Either the userId or deviceId attribute must be tracked in order for Splunk UBA to generate anomalies. Whichever attribute you choose is called the default entity ID. If neither is tracked, no anomalies are raised, even when the other attributes show anomalous behavior.

Splunk UBA batch models use the data stored in a cube to generate content. Because each cube stores different types of data, a model must choose the appropriate cube in order to have the correct data for its content.

Example cube and descriptions

Consider the example data cube below. The cube tracks three attributes, which serve as the table columns: userId, processPath, and processName. These attributes are called Dimensions. A custom cube can have a maximum of six dimensions.

The cube can track unique combinations of these dimensions per hour, day, month, or year. A function can count the total number of occurrences for each combination per hour, day, month, or year. This function is called a Measure. A custom cube can have a maximum of three measures.
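The rules stated so far (an entity ID among the dimensions, at most six dimensions, at most three measures) can be summarized in a short validation sketch. The `CubeSpec` class, its field names, and the cube names are hypothetical and exist only for illustration; they are not part of any Splunk UBA API.

```python
from dataclasses import dataclass, field

# Limits stated in the text for custom cubes.
MAX_DIMENSIONS = 6
MAX_MEASURES = 3
ENTITY_IDS = {"userId", "deviceId"}


@dataclass
class CubeSpec:
    """Hypothetical description of a custom cube (not a Splunk UBA API)."""
    name: str
    dimensions: list
    measures: list = field(default_factory=lambda: ["COUNT"])

    def problems(self):
        """Return a list of human-readable rule violations."""
        found = []
        if len(self.dimensions) > MAX_DIMENSIONS:
            found.append("more than 6 dimensions")
        if len(self.measures) > MAX_MEASURES:
            found.append("more than 3 measures")
        if not ENTITY_IDS & set(self.dimensions):
            found.append("no userId or deviceId, so no anomalies can be raised")
        return found


process_cube = CubeSpec("processCube", ["userId", "processPath", "processName"])
no_entity_cube = CubeSpec("pathCube", ["processPath", "processName"])
```

Here `process_cube.problems()` returns an empty list, while `no_entity_cube.problems()` reports the missing entity ID.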

As events are ingested into Splunk UBA, the desired attributes from each event populate the rows of the cube based on the refresh interval. For example, the cube may be updated every day, as shown in the following example:

day | userId | processPath | processName | COUNT
----+--------+-------------+-------------+------
1   | user1  | /path1      | exe1        | 753
1   | user2  | /path2      | exe2        | 753
2   | user1  | /path1      | exe1        | 303
2   | user2  | /path2      | exe2        | 300
2   | user1  | /path2      | exe1        | 3
3   | user1  | /path1      | exe1        | 441
3   | user2  | /path2      | exe2        | 450

In this example, we can see the following behavior over three days of data:

  • It is normal for user1 to run exe1 on path1
  • It is normal for user2 to run exe2 on path2

On day 2, the event where user1 runs exe1 on path2 is not normal behavior and is considered rare, so this event may cause an anomaly to be raised, depending on the specific threshold configured in the model that is consuming data from this cube. This particular pattern occurred 3 times out of roughly 3,000 events over the three days. If the model sets the rarity threshold to be 1 out of 1000, then an anomaly is raised for user1.
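The rarity arithmetic can be reproduced in a few lines. This is a minimal sketch, not the logic of any actual Splunk UBA model: the `RARITY_THRESHOLD` name and the rule "a combination is rare when its share of all events is at or below the threshold" are assumptions made for illustration. The counts in the table sum to 3,003 events, which the text rounds to 3000.

```python
from collections import Counter

# (day, userId, processPath, processName) -> COUNT, copied from the example cube.
rows = {
    (1, "user1", "/path1", "exe1"): 753,
    (1, "user2", "/path2", "exe2"): 753,
    (2, "user1", "/path1", "exe1"): 303,
    (2, "user2", "/path2", "exe2"): 300,
    (2, "user1", "/path2", "exe1"): 3,
    (3, "user1", "/path1", "exe1"): 441,
    (3, "user2", "/path2", "exe2"): 450,
}

RARITY_THRESHOLD = 1 / 1000  # assumed rule: a share at or below this is rare

# Sum the counts of each (userId, processPath, processName) across all days.
totals = Counter()
for (_day, user, path, exe), count in rows.items():
    totals[(user, path, exe)] += count

all_events = sum(totals.values())

# Keep only the combinations whose share of all events is rare.
rare = {
    combo: n / all_events
    for combo, n in totals.items()
    if n / all_events <= RARITY_THRESHOLD
}
```

Under these assumptions, `rare` contains only the combination of user1, /path2, and exe1, with a share of 3/3003.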

A larger number of attributes can enhance the content generation in Splunk UBA as Splunk UBA can analyze events for more specific deviations or malicious behavior. Let's add an additional attribute to the cube so that deviceId is tracked along with the other attributes.

day | userId | deviceId | processPath | processName | COUNT
----+--------+----------+-------------+-------------+------
1   | user1  | device1  | /path1      | exe1        | 753
1   | user2  | device2  | /path2      | exe2        | 753
2   | user1  | device1  | /path1      | exe1        | 303
2   | user2  | device2  | /path2      | exe2        | 300
2   | user1  | device1  | /path2      | exe1        | 1
2   | user1  | device2  | /path2      | exe1        | 1
2   | user1  | device3  | /path2      | exe1        | 1
3   | user1  | device1  | /path1      | exe1        | 441
3   | user2  | device2  | /path2      | exe2        | 450

Using the same data with the additional dimension of the deviceId attribute, we see the following behavior patterns:

  • It is normal for user1 to run exe1 on path1 on device1
  • It is normal for user2 to run exe2 on path2 on device2

On day 2, there are now three separate entries for user1 running exe1 on path2 on three different devices. In this case, three separate anomalies are raised.
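The effect of the extra dimension can be sketched by aggregating the same three day-2 events with and without deviceId. The `aggregate` helper and the event tuples are hypothetical and only illustrate the grouping; they do not represent how Splunk UBA builds cubes internally.

```python
from collections import Counter

# Hypothetical day-2 events for the rare pattern:
# (userId, deviceId, processPath, processName)
events = [
    ("user1", "device1", "/path2", "exe1"),
    ("user1", "device2", "/path2", "exe1"),
    ("user1", "device3", "/path2", "exe1"),
]


def aggregate(events, keep):
    """Count events per unique combination of the tuple positions in keep."""
    return Counter(tuple(event[i] for i in keep) for event in events)


# Dimensions: userId, processPath, processName
without_device = aggregate(events, keep=(0, 2, 3))

# Dimensions: userId, deviceId, processPath, processName
with_device = aggregate(events, keep=(0, 1, 2, 3))
```

Without deviceId the three events collapse into one row with a count of 3. With deviceId they become three rows with a count of 1 each, which is why each row can be evaluated separately.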

By default, Splunk UBA data cubes retain 30 days' worth of data. You can configure a longer retention period if desired, provided you allocate enough disk space for the data storage.

Examine existing cubes to get more information about Splunk UBA data views

Each attribute in a view has a unique attribute key associated with it. Splunk UBA uses attribute keys to extract the corresponding value of an attribute from the data event's view or from the data event directly.

You must provide specific attribute keys when you create a custom cube, so examining the content of existing cubes is a necessary prerequisite to building custom cubes. Use the following syntax formats to specify attribute keys:

view.<viewname>.<object/method>
    Use this format to extract the value of the <object/method> from only the specified view. For example:
      • view.network.source extracts the value of source from the network view.
      • view.network.source.deviceType extracts the value of deviceType from the source from the network view.

view.*.<object/method>
    Use this format to extract the value of the <object/method> from all views. For example:
      • view.*.source extracts the value of source from all views.
      • view.*.source.deviceType extracts the value of deviceType from the source from all views.

event.<objectname>
    Use this format to extract the value of the <objectname> directly from a generic source type event. For example, event.datasourceId extracts the value of the data source ID directly from the event.

event.attribute#<attributename>
    Use this format to extract the value of the <attributename> directly from a generic source type event. For example, event.attribute#vendor tracks the value of the vendor for a specific event.
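To make the four formats concrete, the following sketch splits an attribute key into its parts. The `describe_attribute_key` function and the dictionary it returns are hypothetical and written only for illustration; Splunk UBA's own key handling is not shown in this topic.

```python
def describe_attribute_key(key):
    """Classify an attribute key according to the four documented formats."""
    prefix = "event.attribute#"
    if key.startswith(prefix):
        # event.attribute#<attributename>
        return {"kind": "event attribute", "name": key[len(prefix):]}

    parts = key.split(".")
    if parts[0] == "event" and len(parts) == 2:
        # event.<objectname>
        return {"kind": "event object", "name": parts[1]}
    if parts[0] == "view" and len(parts) >= 3:
        # view.<viewname>.<object/method> or view.*.<object/method>
        view = "all views" if parts[1] == "*" else parts[1]
        return {"kind": "view", "view": view, "path": parts[2:]}

    raise ValueError(f"unrecognized attribute key: {key}")


network_key = describe_attribute_key("view.network.source.deviceType")
vendor_key = describe_attribute_key("event.attribute#vendor")
```

In this sketch, `network_key` identifies the network view with the path source, deviceType, and `vendor_key` identifies an event attribute named vendor.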

Use the view formats to aggregate the values from the following types of events:

Use the event formats to aggregate the values from events whose data sources are configured as generic data types (uba_source_type="generic"). See Add custom data to Splunk UBA using the generic data source in Get Data into Splunk User Behavior Analytics.

For example, examine a raw event from a Windows event log in multiline format:

09/12/2019 04:42:00 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
Type=Information
ComputerName=acmel-lpt0399177
TaskCategory=Special Logon
OpCode=Info
RecordNumber=4660700202
Keywords=Audit Success
Message=Special privileges assigned to new logon.
Subject:
	Security ID:		acme\carbanak
	Account Name:		carbanak
	Account Domain:		acme
	Logon ID:		0x95455c63
Privileges:		SeTcbPrivilege
			SeDebugPrivilege
			SeLoadDriverPrivilege
			SeSecurityPrivilege
			SeBackupPrivilege

To extract the value of the EventCode, store it in the cube, and include the value in Splunk UBA content, begin by examining the windows events cube in Splunk UBA:

  1. Make sure you are logged in to Splunk UBA as a user with content developer privileges.
  2. Select System > Cubes.
  3. In the URL, add ?system immediately following the host name or IP address. For example:
    https://uba-001.example.com/?system#Y2FzcGlkYS5jdWJlcy5jdWJlRGV0YWlsc1ZpZXc=
  4. Select the windowsevents cube to view its details.

At the top of the page, there is an attribute with the name eventId, and its attribute key is view.ad.eventId. This is the attribute key that you must provide when you create a new cube and want to store the value of this variable in the cube. Note that the names do not always coincide: the field name is EventCode in the raw event, but it is transformed to eventId in Splunk UBA.

This screen image shows the top of the details page for the windows events cube. The highlighted section is described in the text immediately preceding this image.
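The difference between the raw field name and the Splunk UBA attribute name can be illustrated with a small parsing sketch. This is not how Splunk UBA parses Windows events. The regular expression and the `RENAMES` table are assumptions for illustration, and only the EventCode to eventId rename comes from the text.

```python
import re

# A shortened copy of the raw Windows event shown earlier.
raw_event = """09/12/2019 04:42:00 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
ComputerName=acmel-lpt0399177
"""

# Collect every line of the form Key=Value into a dictionary.
fields = dict(re.findall(r"^(\w+)=(.*)$", raw_event, flags=re.MULTILINE))

# Hypothetical rename table: raw field name -> Splunk UBA attribute name.
RENAMES = {"EventCode": "eventId"}
uba_fields = {RENAMES.get(name, name): value for name, value in fields.items()}
```

After this runs, `fields["EventCode"]` and `uba_fields["eventId"]` hold the same value, which mirrors why the cube is configured with view.ad.eventId rather than a key based on the raw field name.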

Store generic events in Splunk UBA data cubes

Generic events are events from data sources for which Splunk UBA does not have any existing parsing logic. For example, you can ingest credit card transaction logs that are not CIM compliant and that Splunk UBA does not parse by default. Events from this type of data source can be ingested into Splunk UBA as generic events and aggregated in data cubes. Splunk UBA can generate anomalies based on generic events stored in the data cubes as long as either userId or deviceId is also present in the cube.

Suppose you have a data source called exampleDS and the field you want to store in the cube is called exampleDSfield. Use the following syntax to store the value of this field in the cube:

event.attribute#exampleDSfield

Use the view. formats described in Examine existing cubes to get more information about Splunk UBA data views only when Splunk UBA can map the attribute to one of the entities defined in the following table. These attributes contain identity resolution logic used for generating entities in Splunk UBA.

Entity to track    | Event field                                  | Mapped entity | What Splunk UBA can extract
-------------------+----------------------------------------------+---------------+------------------------------------------------
User               | srcUser                                      | srcUser       | User name: view.*.srcUser
                   |                                              |               | User ID: view.*.srcUser.uuid
User               | destUser                                     | destUser      | User name: view.*.destUser
                   |                                              |               | User ID: view.*.destUser.uuid
Source Device      | sourceIp, httpClientIp, clientIp, endpointIp | source        | Device name: view.*.source
                   |                                              |               | Device ID: view.*.source.uuid
                   |                                              |               | Device type: view.*.source.deviceType
                   |                                              |               | Device scope: view.*.source.scope
                   |                                              |               | Port: view.*.source.port
Destination Device | destinationIp, peeringHost                   | destination   | Device name: view.*.destination
                   |                                              |               | Device ID: view.*.destination.uuid
                   |                                              |               | Device type: view.*.destination.deviceType
                   |                                              |               | Device scope: view.*.destination.scope
                   |                                              |               | Port: view.*.destination.port
Server Device      | serverIp                                     | server        | Server name: view.*.server
Origin Device      | origin                                       | origin        | Origin device ID: view.*.origin.uuid
                   |                                              |               | Origin device FQDN: view.*.origin.fqdn
Application        | primaryApplication                           | application   | Application name: view.*.application
                   |                                              |               | Application ID: view.*.application.uuid

For example, the parsed token view.*.source populates the source field in the cube with the device name as one of the dimensions. You can also extract additional device information such as the device ID, type, scope, and port.

Because the exampleDSfield attribute is not one of the fields listed in the table, Splunk UBA can't map it to an entity. You must use the event.attribute#<attributename> format to store the value of exampleDSfield in the cube.
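The choice between the two formats can be sketched as a lookup against the table. The `FIELD_TO_ENTITY` dictionary copies the field-to-entity mappings from the table, while the `attribute_key_for` function is hypothetical and only illustrates the decision. For mapped fields it returns the base key, to which a suffix such as .uuid can be appended where the table lists one. For origin, the table lists only the .uuid and .fqdn forms.

```python
# Event field -> mapped entity, taken from the table above.
FIELD_TO_ENTITY = {
    "srcUser": "srcUser",
    "destUser": "destUser",
    "sourceIp": "source",
    "httpClientIp": "source",
    "clientIp": "source",
    "endpointIp": "source",
    "destinationIp": "destination",
    "peeringHost": "destination",
    "serverIp": "server",
    "origin": "origin",
    "primaryApplication": "application",
}


def attribute_key_for(field_name):
    """Return a view.* base key for mapped fields, else an event.attribute# key."""
    entity = FIELD_TO_ENTITY.get(field_name)
    if entity is not None:
        return f"view.*.{entity}"
    return f"event.attribute#{field_name}"


mapped_key = attribute_key_for("sourceIp")
generic_key = attribute_key_for("exampleDSfield")
```

Here `mapped_key` is view.*.source and `generic_key` is event.attribute#exampleDSfield, matching the guidance in the text.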

Last modified on 15 July, 2020

This documentation applies to the following versions of Splunk® User Behavior Analytics: 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.4.1, 5.0.5, 5.0.5.1

