Understanding Splunk UBA data cubes

A data cube in Splunk UBA is a database table containing aggregated events based on specific attributes. Batch models in Splunk UBA use the data aggregated in data cubes to generate content such as anomalies and threats.

Consider an example in which Splunk UBA ingests events from the Splunk platform using the Splunk Direct and Splunk Raw Events connectors. Splunk UBA recognizes each event and tags it as belonging to a specific view. For example, an event from data source DS1 is tagged with _AD because Splunk UBA recognizes it as Active Directory (AD) data.

For more information about the connectors and when to use which connector, see Use connectors to add data from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics.

Data cubes subscribe to one or more views and can track any number of attributes from those views. For example, if a view has 20 attributes, a data cube might track only 5 of the attributes of that view. In this example, Data Cube 1 is tracking three attributes from the AD view, and one attribute from the HTTP view.

Either the userId or deviceId attribute must be tracked in order for Splunk UBA to generate anomalies. Whichever attribute you choose is called the default entity ID. If neither is tracked, Splunk UBA raises no anomalies, even when the other attributes show anomalous behavior.

Splunk UBA batch models use the data stored in a cube to generate content. Because each cube stores different types of data, a model must choose the appropriate cube in order to have the correct data for its content.

Example cube and descriptions

Consider the following example data cube. This cube tracks three attributes, which serve as the table columns: userId, processPath, and processName. These attributes are called Dimensions.

A custom cube can have a maximum of six dimensions.

The cube can track unique combinations of these dimensions per hour, day, month, or year. A function can count the total number of occurrences for each combination per hour, day, month, or year. This function is called a Measure.
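
The relationship between dimensions and a counting measure can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk UBA implements cubes: the `aggregate` function, the `DIMENSIONS` tuple, and the sample events are all hypothetical, and the time bucket is assumed to be a precomputed `day` field on each event.

```python
from collections import Counter

# Dimensions tracked by this hypothetical cube (column names from the example).
DIMENSIONS = ("userId", "processPath", "processName")


def aggregate(events, dimensions=DIMENSIONS, bucket="day"):
    """Count events per unique combination of time bucket and dimensions."""
    counts = Counter()
    for event in events:
        key = (event[bucket],) + tuple(event[d] for d in dimensions)
        counts[key] += 1  # the COUNT measure
    return counts


# Illustrative events, not real Splunk UBA data.
events = [
    {"day": 1, "userId": "user1", "processPath": "/path1", "processName": "exe1"},
    {"day": 1, "userId": "user1", "processPath": "/path1", "processName": "exe1"},
    {"day": 1, "userId": "user2", "processPath": "/path2", "processName": "exe2"},
    {"day": 2, "userId": "user1", "processPath": "/path2", "processName": "exe1"},
]

cube = aggregate(events)
for row, count in sorted(cube.items()):
    print(row, count)
```

Each distinct tuple becomes one row of the cube, and the counter value is the measure for that row.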

A custom cube can have a maximum of three measures.
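
The limits on dimensions and measures, together with the default entity ID requirement described earlier, can be checked before a custom cube is defined. The following sketch assumes a cube definition is simply a list of dimension names and a list of measure names. The `validate_cube` helper and its messages are hypothetical and are not part of Splunk UBA.

```python
MAX_DIMENSIONS = 6   # stated limit for custom cubes
MAX_MEASURES = 3     # stated limit for custom cubes
ENTITY_IDS = {"userId", "deviceId"}  # candidates for the default entity ID


def validate_cube(dimensions, measures):
    """Return a list of problems with a proposed custom cube definition."""
    problems = []
    if len(dimensions) > MAX_DIMENSIONS:
        problems.append(
            f"{len(dimensions)} dimensions exceeds the maximum of {MAX_DIMENSIONS}"
        )
    if len(measures) > MAX_MEASURES:
        problems.append(
            f"{len(measures)} measures exceeds the maximum of {MAX_MEASURES}"
        )
    if not ENTITY_IDS.intersection(dimensions):
        problems.append("neither userId nor deviceId is tracked")
    return problems


# The example cube passes; a cube without an entity ID does not.
print(validate_cube(["userId", "processPath", "processName"], ["COUNT"]))
print(validate_cube(["processPath", "processName"], ["COUNT"]))
```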

As events are ingested into Splunk UBA, the desired attributes from each event populate the rows of the cube based on the refresh interval. For example, the cube may be updated every day, as shown in the following example:

day | userId | processPath | processName | COUNT
----+--------+-------------+-------------+------
1   | user1  | /path1      | exe1        | 753
1   | user2  | /path2      | exe2        | 753
2   | user1  | /path1      | exe1        | 303
2   | user2  | /path2      | exe2        | 300
2   | user1  | /path2      | exe1        | 3
3   | user1  | /path1      | exe1        | 441
3   | user2  | /path2      | exe2        | 450

In this example, you can see the following behavior over three days of data:

It is normal for user1 to run exe1 on path1
It is normal for user2 to run exe2 on path2

On day 2, user1 runs exe1 on path2. This is not normal behavior and is considered rare, so it might cause an anomaly to be raised, depending on the threshold configured in the model that consumes data from this cube. This pattern occurred 3 times out of roughly 3,000 events across the three days. If the model sets the rarity threshold to 1 out of 1,000, an anomaly is raised for user1.
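
The arithmetic behind this threshold check can be sketched as follows. This is a minimal illustration and not the logic of any actual Splunk UBA model. It assumes that rarity is a pattern's share of all counted events across the three days, and that a share at or below the threshold counts as rare. The rows are copied from the example table, whose counts sum to 3003, so the figure of 3000 in the text is a rounded total. The names `ROWS`, `THRESHOLD`, and `rare_patterns` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# (day, userId, processPath, processName, COUNT) rows from the example cube.
ROWS = [
    (1, "user1", "/path1", "exe1", 753),
    (1, "user2", "/path2", "exe2", 753),
    (2, "user1", "/path1", "exe1", 303),
    (2, "user2", "/path2", "exe2", 300),
    (2, "user1", "/path2", "exe1", 3),
    (3, "user1", "/path1", "exe1", 441),
    (3, "user2", "/path2", "exe2", 450),
]

THRESHOLD = Fraction(1, 1000)  # assumed rarity threshold: 1 out of 1000


def rare_patterns(rows, threshold=THRESHOLD):
    """Return patterns whose share of all events is at or below the threshold."""
    totals = defaultdict(int)
    for _day, user, path, name, count in rows:
        totals[(user, path, name)] += count
    grand_total = sum(totals.values())
    return {
        pattern: Fraction(count, grand_total)
        for pattern, count in totals.items()
        if Fraction(count, grand_total) <= threshold
    }


rare = rare_patterns(ROWS)
for pattern, share in rare.items():
    print(pattern, share)
```

Exact fractions are used instead of floating-point division so that a share sitting very close to the threshold is compared without rounding error.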

Tracking more attributes can improve content generation, because Splunk UBA can then analyze events for more specific deviations or malicious behavior. For example, add deviceId to the cube as an additional dimension so that it is tracked along with the other attributes:

day | userId | deviceId | processPath | processName | COUNT
----+--------+----------+-------------+-------------+------
1   | user1  | device1  | /path1      | exe1        | 753
1   | user2  | device2  | /path2      | exe2        | 753
2   | user1  | device1  | /path1      | exe1        | 303
2   | user2  | device2  | /path2      | exe2        | 300
2   | user1  | device1  | /path2      | exe1        | 1
2   | user1  | device2  | /path2      | exe1        | 1
2   | user1  | device3  | /path2      | exe1        | 1
3   | user1  | device1  | /path1      | exe1        | 441
3   | user2  | device2  | /path2      | exe2        | 450

Using the same data with the additional dimension of the deviceId attribute, you can see the following behavior patterns:

It is normal for user1 to run exe1 on path1 on device1
It is normal for user2 to run exe2 on path2 on device2

On day 2, there are now three separate entries for user1 running exe1 on path2 on three different devices. In this case, three separate anomalies are raised.
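
The two example tables describe the same events at different levels of detail. As a rough check, summing the finer cube over deviceId should reproduce the rows of the coarser cube. The sketch below does that with the rows from the second table. The `roll_up` helper is hypothetical, and the text does not say whether Splunk UBA performs such roll-ups itself.

```python
from collections import Counter

# (day, userId, deviceId, processPath, processName, COUNT) rows from the
# example cube that also tracks deviceId.
DEVICE_ROWS = [
    (1, "user1", "device1", "/path1", "exe1", 753),
    (1, "user2", "device2", "/path2", "exe2", 753),
    (2, "user1", "device1", "/path1", "exe1", 303),
    (2, "user2", "device2", "/path2", "exe2", 300),
    (2, "user1", "device1", "/path2", "exe1", 1),
    (2, "user1", "device2", "/path2", "exe1", 1),
    (2, "user1", "device3", "/path2", "exe1", 1),
    (3, "user1", "device1", "/path1", "exe1", 441),
    (3, "user2", "device2", "/path2", "exe2", 450),
]


def roll_up(rows):
    """Sum counts over deviceId, keeping day, userId, processPath, processName."""
    coarse = Counter()
    for day, user, _device, path, name, count in rows:
        coarse[(day, user, path, name)] += count
    return coarse


coarse = roll_up(DEVICE_ROWS)
# The three single-count device rows collapse into the one row with COUNT 3.
print(coarse[(2, "user1", "/path2", "exe1")])
print(len(DEVICE_ROWS), "rows roll up to", len(coarse))
```

The extra dimension splits one rare row into three, which is why the finer cube can yield three anomalies where the coarser cube yields one.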

By default, Splunk UBA data cubes retain 30 days' worth of data. You can configure a longer retention period if desired, provided you allocate enough disk space to store the data.

Examine existing cubes to get more information about Splunk UBA data views

Each attribute in a view has a unique attribute key associated with it. Splunk UBA uses attribute keys to extract the corresponding value of an attribute from the data event's view or from the data event directly.

You must provide specific attribute keys when you create a custom cube, so examining the content of existing cubes is a necessary prerequisite to building custom cubes. Use the following syntax formats to specify attribute keys:

Syntax	Description and example
view.<viewname>.<object/method>	Use this format to extract the value of the `<object/method>` from only the specified view. For example: `view.network.source` extracts the value of `source` from the `network` view. `view.network.source.deviceType` extracts the value of `deviceType` from the `source` in the `network` view.
view.*.<object/method>	Use this format to extract the value of the `<object/method>` from all views. For example: `view.*.source` extracts the value of `source` from all views. `view.*.source.deviceType` extracts the value of `deviceType` from the `source` in all views.
event.<objectname>	Use this format to extract the value of the `<objectname>` directly from a generic source type event. For example, `event.datasourceId` extracts the value of the data source ID directly from the event.
event.attribute#<attributename>	Use this format to extract the value of the `<attributename>` directly from a generic source type event. For example, `event.attribute#vendor` extracts the value of the vendor for a specific event.
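
The four formats can be told apart mechanically, which may help when reviewing a list of attribute keys before creating a cube. The following is a minimal sketch of such a check. The `classify_attribute_key` function and the kind labels it returns are hypothetical, and the all-views form is assumed to be written with the `view.*.` wildcard.

```python
def classify_attribute_key(key):
    """Split an attribute key into (kind, view, path) using the four formats."""
    if key.startswith("event.attribute#"):
        return ("event_attribute", None, key[len("event.attribute#"):])
    if key.startswith("event."):
        return ("event_object", None, key[len("event."):])
    if key.startswith("view."):
        # The first segment after "view." is the view name or the * wildcard.
        view, _, path = key[len("view."):].partition(".")
        if not view or not path:
            raise ValueError(f"incomplete view key: {key!r}")
        if view == "*":
            return ("all_views", None, path)
        return ("single_view", view, path)
    raise ValueError(f"unrecognized attribute key: {key!r}")


for example in (
    "view.network.source.deviceType",
    "view.*.source",
    "event.datasourceId",
    "event.attribute#vendor",
):
    print(example, "->", classify_attribute_key(example))
```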

Use the view formats to aggregate the values from the following types of events:

Events from CIM-compliant data sources ingested using Splunk Direct. See Add CIM-compliant data from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics.
Events from data sources ingested using Splunk Raw Events and Splunk UBA's native parsers. See Add raw events from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics.

Use the event formats to aggregate the values from events whose data sources are configured as generic data types (uba_source_type="generic"). See Add custom data to Splunk UBA using the generic data source in Get Data into Splunk User Behavior Analytics.

For example, examine a raw event from a Windows event log in multiline format:

09/12/2019 04:42:00 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
Type=Information
ComputerName=acmel-lpt0399177
TaskCategory=Special Logon
OpCode=Info
RecordNumber=4660700202
Keywords=Audit Success
Message=Special privileges assigned to new logon.
Subject:
	Security ID:		acme\carbanak
	Account Name:		carbanak
	Account Domain:		acme
	Logon ID:		0x95455c63
Privileges:		SeTcbPrivilege
			SeDebugPrivilege
			SeLoadDriverPrivilege
			SeSecurityPrivilege
			SeBackupPrivilege

To extract the value of EventCode, store it in the cube, and include the value in Splunk UBA content, begin by examining the windowsevents cube in Splunk UBA:

Make sure you are logged in to Splunk UBA as a user with content developer privileges.
Select System > Cubes.
In the URL, add ?system immediately following the host name or IP address. For example:
```
https://uba-001.example.com/?system#Y2FzcGlkYS5jdWJlcy5jdWJlRGV0YWlsc1ZpZXc=
```
Select the windowsevents cube to view its details.

At the top of the page, there is an attribute with the name eventId, and its attribute key is view.ad.eventId. Provide this attribute key when you create a new cube and want to store the value of this attribute in the cube.

Names do not always coincide. The field is named EventCode in the raw event, but it is transformed to eventId in Splunk UBA.
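
The renaming can be pictured with a small sketch that reads the key=value header lines of the raw event and relabels EventCode with its attribute key. This is purely illustrative: Splunk UBA's parsers are not described here, the `parse_header_fields` helper is hypothetical, and the mapping contains only the one pair named in the text.

```python
# A shortened copy of the raw Windows event shown earlier.
RAW_EVENT = """09/12/2019 04:42:00 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
ComputerName=acmel-lpt0399177
Message=Special privileges assigned to new logon."""

# Hypothetical lookup from raw field names to attribute keys. Only the
# EventCode entry comes from the text; real mappings must be read from
# the existing cubes in Splunk UBA.
FIELD_TO_ATTRIBUTE_KEY = {"EventCode": "view.ad.eventId"}


def parse_header_fields(raw):
    """Collect the unindented key=value lines of a multiline event."""
    fields = {}
    for line in raw.splitlines():
        key, sep, value = line.partition("=")
        if sep and key and not key[0].isspace():
            fields[key.strip()] = value.strip()
    return fields


fields = parse_header_fields(RAW_EVENT)
cube_values = {
    FIELD_TO_ATTRIBUTE_KEY[name]: value
    for name, value in fields.items()
    if name in FIELD_TO_ATTRIBUTE_KEY
}
print(cube_values)
```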

Store generic events in Splunk UBA data cubes

Generic events are events from data sources for which Splunk UBA does not have any existing parsing logic. For example, you can ingest credit card transaction logs that are not CIM-compliant and that Splunk UBA does not parse by default. Events from this type of data source can be ingested into Splunk UBA as generic events and aggregated in data cubes. Splunk UBA can generate anomalies based on generic events stored in the data cubes as long as either userId or deviceId is also present in the cube.

Suppose you have a data source called exampleDS and the field you want to store in the cube is called exampleDSfield. Use the following syntax to store the value of this field in the cube:

event.attribute#exampleDSfield

Use the view formats described in Examine existing cubes to get more information about Splunk UBA data views only when Splunk UBA can map the attribute to one of the entities in the following table. These attributes contain identity resolution logic used for generating entities in Splunk UBA.

Entity you want to track in Splunk UBA	Your event contains this field	Mapped entity in Splunk UBA	What can be extracted by Splunk UBA
User	srcUser	srcUser	User name: `view.*.srcUser` User ID: `view.*.srcUser.uuid`
User	destUser	destUser	User name: `view.*.destUser` User ID: `view.*.destUser.uuid`
Source Device	sourceIp	source	Device name: `view.*.source` Device ID: `view.*.source.uuid` Device type: `view.*.source.deviceType` Device scope: `view.*.source.scope` Port: `view.*.source.port`
Source Device	httpClientIp	source
Source Device	clientIp	source
Source Device	endpointIp	source
Destination Device	destinationIp	destination	Device name: `view.*.destination` Device ID: `view.*.destination.uuid` Device type: `view.*.destination.deviceType` Device scope: `view.*.destination.scope` Port: `view.*.destination.port`
Destination Device	peeringHost	destination
Server Device	serverIp	server	Server name: `view.*.server`
Origin Device	origin	origin	Origin device ID: `view.*.origin.uuid` Origin device FQDN: `view.*.origin.fqdn`
Application	primaryApplication	application	Application name: `view.*.application` Application ID: `view.*.application.uuid`

For example, the parsed token view.*.source populates the source field in the cube with the device name as one of the dimensions. You can also extract additional device information such as the device ID, type, scope, and port.

Because the exampleDSfield attribute is not listed in the table, Splunk UBA cannot map it to an entity. You must use the event.attribute#<attributename> format to store the value of exampleDSfield in the cube.
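
The decision between the two formats can be sketched as a simple lookup. The field names and entity keys below are copied from the table, using the first key listed for each entity and the `view.*.` wildcard form. The `attribute_key_for` helper is hypothetical, and the fallback assumes the event comes from a generic data source.

```python
# First attribute key listed in the table for each recognized event field.
FIELD_TO_VIEW_KEY = {
    "srcUser": "view.*.srcUser",
    "destUser": "view.*.destUser",
    "sourceIp": "view.*.source",
    "httpClientIp": "view.*.source",
    "clientIp": "view.*.source",
    "endpointIp": "view.*.source",
    "destinationIp": "view.*.destination",
    "peeringHost": "view.*.destination",
    "serverIp": "view.*.server",
    "origin": "view.*.origin.uuid",
    "primaryApplication": "view.*.application",
}


def attribute_key_for(field):
    """Pick a view key for mapped fields, else the event.attribute# format."""
    if field in FIELD_TO_VIEW_KEY:
        return FIELD_TO_VIEW_KEY[field]
    return f"event.attribute#{field}"


print(attribute_key_for("sourceIp"))
print(attribute_key_for("exampleDSfield"))
```

Other properties of a mapped entity, such as the ID or device type, follow by appending the suffix shown in the table to the entity's base key.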
