Understanding Splunk UBA data cubes
A data cube in Splunk UBA is a database table containing aggregated events based on specific attributes. Batch models in Splunk UBA use the data aggregated in data cubes to generate content such as anomalies and threats.
In this example, Splunk UBA ingests events from the Splunk platform using the Splunk Direct and Splunk Raw Events connectors. Splunk UBA recognizes each event and tags it as belonging to a specific view. For example, an event from data source DS1 is tagged with _AD because Splunk UBA recognizes it as Active Directory (AD) data.
For more information about the connectors and when to use which connector, see Use connectors to add data from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics.
Data cubes subscribe to one or more views and can track any number of attributes from those views. For example, if a view has 20 attributes, a data cube might track only 5 of the attributes of that view. In this example, Data Cube 1 is tracking three attributes from the AD view, and one attribute from the HTTP view.
Either the userId or deviceId attribute must be tracked for Splunk UBA to generate anomalies. Whichever attribute you choose is called the default entity ID. If the cube tracks neither attribute, no anomalies are raised, even if the other attributes indicate anomalous behavior.
Splunk UBA batch models use the data stored in a cube to generate content. Because each cube stores different types of data, a model must choose the appropriate cube in order to have the correct data for its content.
Example cube and descriptions
Consider the following example data cube. This cube tracks three attributes, which serve as the table columns: userId, processPath, and processName. These attributes are called Dimensions.
A custom cube can have a maximum of six dimensions.
The cube can track unique combinations of these dimensions per hour, day, month, or year. A function can count the total number of occurrences for each combination per hour, day, month, or year. This function is called a Measure.
A custom cube can have a maximum of three measures.
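The limits above, together with the default entity ID requirement, can be checked before a cube is created. The following is a minimal sketch only: `CubeSpec` and `validate` are hypothetical names used for illustration and are not part of any Splunk UBA API.

```python
from dataclasses import dataclass, field

MAX_DIMENSIONS = 6   # limit stated above for custom cubes
MAX_MEASURES = 3     # limit stated above for custom cubes
ENTITY_IDS = {"userId", "deviceId"}  # one of these must be tracked


@dataclass
class CubeSpec:
    """Hypothetical description of a custom cube (not a Splunk UBA type)."""
    name: str
    dimensions: list
    measures: list = field(default_factory=lambda: ["COUNT"])


def validate(spec):
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    if len(spec.dimensions) > MAX_DIMENSIONS:
        problems.append(
            f"too many dimensions: {len(spec.dimensions)} > {MAX_DIMENSIONS}")
    if len(spec.measures) > MAX_MEASURES:
        problems.append(
            f"too many measures: {len(spec.measures)} > {MAX_MEASURES}")
    if not ENTITY_IDS & set(spec.dimensions):
        problems.append("no default entity ID: track userId or deviceId")
    return problems


spec = CubeSpec("example", ["userId", "processPath", "processName"])
print(validate(spec))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a proposed cube at once.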
As events are ingested into Splunk UBA, the desired attributes from each event populate the rows of the cube based on the refresh interval. The following cube, for example, is updated every day:
day | userId | processPath | processName | COUNT
----+--------+-------------+-------------+------
1   | user1  | /path1      | exe1        | 753
1   | user2  | /path2      | exe2        | 753
2   | user1  | /path1      | exe1        | 303
2   | user2  | /path2      | exe2        | 300
2   | user1  | /path2      | exe1        | 3
3   | user1  | /path1      | exe1        | 441
3   | user2  | /path2      | exe2        | 450
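As a rough illustration of how rows like these come about, the sketch below counts events per unique combination of day and dimensions. It is not how Splunk UBA is implemented: `build_cube` and the event dictionaries are hypothetical, and real events carry many more attributes.

```python
from collections import Counter

DIMENSIONS = ("userId", "processPath", "processName")


def build_cube(events, dimensions=DIMENSIONS):
    """Count events per (day, *dimensions) combination (the COUNT measure)."""
    cube = Counter()
    for event in events:
        key = (event["day"],) + tuple(event[d] for d in dimensions)
        cube[key] += 1
    return cube


# Hypothetical parsed events that reproduce the day 2 rows of the example.
events = (
    [{"day": 2, "userId": "user1", "processPath": "/path1", "processName": "exe1"}] * 303
    + [{"day": 2, "userId": "user2", "processPath": "/path2", "processName": "exe2"}] * 300
    + [{"day": 2, "userId": "user1", "processPath": "/path2", "processName": "exe1"}] * 3
)

cube = build_cube(events)
for key, count in sorted(cube.items()):
    print(key, count)
```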
In this example, you can see the following behavior over three days of data:
- It is normal for user1 to run exe1 on path1
- It is normal for user2 to run exe2 on path2
On day 2, the event where user1 runs exe1 on path2 is not normal behavior and is considered rare, so it might cause an anomaly to be raised, depending on the threshold configured in the model that consumes data from this cube. This particular pattern occurred 3 times out of roughly 3000 events. If the model sets the rarity threshold to 1 out of 1000, then an anomaly is raised for user1.
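The rarity arithmetic can be worked through directly. The counts in the example cube sum to 3003, so the unusual pattern accounts for 3 of 3003 events, which is just under 1 in 1000. The sketch below assumes rarity is the share of a combination in the total count across the cube; how a real Splunk UBA model scores rarity is not described here, and `rare_combinations` is a hypothetical helper.

```python
from collections import Counter

# Rows from the example cube: (day, userId, processPath, processName, COUNT).
ROWS = [
    (1, "user1", "/path1", "exe1", 753),
    (1, "user2", "/path2", "exe2", 753),
    (2, "user1", "/path1", "exe1", 303),
    (2, "user2", "/path2", "exe2", 300),
    (2, "user1", "/path2", "exe1", 3),
    (3, "user1", "/path1", "exe1", 441),
    (3, "user2", "/path2", "exe2", 450),
]


def rare_combinations(rows, threshold=1 / 1000):
    """Return combinations whose share of all events is at or below threshold."""
    totals = Counter()
    for _day, user, path, exe, count in rows:
        totals[(user, path, exe)] += count
    grand_total = sum(totals.values())
    return {
        combo: n / grand_total
        for combo, n in totals.items()
        if n / grand_total <= threshold
    }


for combo, share in rare_combinations(ROWS).items():
    print(combo, f"{share:.6f}")
```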
Tracking more attributes can improve content generation because Splunk UBA can analyze events for more specific deviations or malicious behavior. For example, add an attribute to the cube so that deviceId is tracked along with the other attributes.
day | userId | deviceId | processPath | processName | COUNT
----+--------+----------+-------------+-------------+------
1   | user1  | device1  | /path1      | exe1        | 753
1   | user2  | device2  | /path2      | exe2        | 753
2   | user1  | device1  | /path1      | exe1        | 303
2   | user2  | device2  | /path2      | exe2        | 300
2   | user1  | device1  | /path2      | exe1        | 1
2   | user1  | device2  | /path2      | exe1        | 1
2   | user1  | device3  | /path2      | exe1        | 1
3   | user1  | device1  | /path1      | exe1        | 441
3   | user2  | device2  | /path2      | exe2        | 450
Using the same data with the additional dimension of the deviceId attribute, you can see the following behavior patterns:
- It is normal for user1 to run exe1 on path1 on device1
- It is normal for user2 to run exe2 on path2 on device2
On day 2, there are now three separate entries for user1 running exe1 on path2 on three different devices. In this case, three separate anomalies are raised.
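To see how the extra dimension splits one rare row into three, the sketch below rolls the day 2 rows of the four-dimension cube back up to three dimensions. The `roll_up` helper is hypothetical and only illustrates the relationship between the two example tables.

```python
from collections import Counter

COLUMNS = ("userId", "deviceId", "processPath", "processName")

# Day 2 rows from the four-dimension cube; the last value is COUNT.
ROWS = [
    ("user1", "device1", "/path1", "exe1", 303),
    ("user2", "device2", "/path2", "exe2", 300),
    ("user1", "device1", "/path2", "exe1", 1),
    ("user1", "device2", "/path2", "exe1", 1),
    ("user1", "device3", "/path2", "exe1", 1),
]


def roll_up(rows, keep):
    """Sum COUNT over every dimension that is not listed in keep."""
    indexes = [COLUMNS.index(name) for name in keep]
    totals = Counter()
    for row in rows:
        totals[tuple(row[i] for i in indexes)] += row[-1]
    return totals


with_device = roll_up(ROWS, COLUMNS)
without_device = roll_up(ROWS, ("userId", "processPath", "processName"))

print(len(with_device), "rows with deviceId")
print(len(without_device), "rows without deviceId")
```

With deviceId tracked, the rare pattern appears as three rows of count 1. Without it, the same events collapse into a single row of count 3.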
By default, Splunk UBA data cubes retain 30 days' worth of data. You can configure a longer retention period, provided you allocate enough disk space to store the data.
Examine existing cubes to get more information about Splunk UBA data views
Each attribute in a view has a unique attribute key associated with it. Splunk UBA uses attribute keys to extract the corresponding value of an attribute from the data event's view or from the data event directly.
You must provide specific attribute keys when you create a custom cube, so examining the content of existing cubes is a necessary prerequisite to building custom cubes. Use the following syntax formats to specify attribute keys:
Syntax | Description and example |
---|---|
view.<viewname>.<object/method> | Use this format to extract the value of the <object/method> from only the specified view. |
view.*.<object/method> | Use this format to extract the value of the <object/method> from all views. |
event.<objectname> | Use this format to extract the value of the <objectname> directly from a generic source type event. For example, event.datasourceId extracts the value of the data source ID directly from the event. |
event.attribute#<attributename> | Use this format to extract the value of the <attributename> directly from a generic source type event. For example, event.attribute#vendor tracks the value of the vendor for a specific event. |
Use the view formats to aggregate the values from the following types of events:
- Events from CIM-compliant data sources ingested using Splunk Direct. See Add CIM-compliant data from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics.
- Events from data sources ingested using Splunk Raw Events and Splunk UBA's native parsers. See Add raw events from the Splunk platform to Splunk UBA in Get Data into Splunk User Behavior Analytics.
Use the event formats to aggregate the values from events whose data sources are configured as generic data types (uba_source_type="generic"). See Add custom data to Splunk UBA using the generic data source in Get Data into Splunk User Behavior Analytics.
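The four syntax formats can be told apart mechanically. The sketch below classifies an attribute key string into its parts. It is an illustration under stated assumptions: `parse_attribute_key` and `AttributeKey` are hypothetical names, and the assumption that names consist only of letters, digits, and underscores is not confirmed by this topic.

```python
import re
from typing import NamedTuple, Optional


class AttributeKey(NamedTuple):
    source: str                # "view" or "event"
    view: Optional[str]        # view name, "*" for all views, None for events
    name: str                  # object, method, or attribute name
    is_event_attribute: bool   # True only for the event.attribute#<name> form


# Assumption: names are made of word characters only.
_VIEW = re.compile(r"^view\.(\*|\w+)\.(\w+)$")
_EVENT_ATTRIBUTE = re.compile(r"^event\.attribute#(\w+)$")
_EVENT = re.compile(r"^event\.(\w+)$")


def parse_attribute_key(key):
    """Split an attribute key into its parts, or raise ValueError."""
    match = _VIEW.match(key)
    if match:
        return AttributeKey("view", match.group(1), match.group(2), False)
    match = _EVENT_ATTRIBUTE.match(key)
    if match:
        return AttributeKey("event", None, match.group(1), True)
    match = _EVENT.match(key)
    if match:
        return AttributeKey("event", None, match.group(1), False)
    raise ValueError(f"unrecognized attribute key: {key!r}")


for key in ("view.ad.eventId", "view.*.source",
            "event.datasourceId", "event.attribute#vendor"):
    print(key, "->", parse_attribute_key(key))
```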
For example, examine a raw event from a Windows event log in multiline format:
09/12/2019 04:42:00 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
Type=Information
ComputerName=acmel-lpt0399177
TaskCategory=Special Logon
OpCode=Info
RecordNumber=4660700202
Keywords=Audit Success
Message=Special privileges assigned to new logon.
Subject:
    Security ID: acme\carbanak
    Account Name: carbanak
    Account Domain: acme
    Logon ID: 0x95455c63
Privileges: SeTcbPrivilege
    SeDebugPrivilege
    SeLoadDriverPrivilege
    SeSecurityPrivilege
    SeBackupPrivilege
To extract the value of the EventCode, store it in the cube, and include the value in Splunk UBA content, begin by examining the windows events cube in Splunk UBA:
- Make sure you are logged in to Splunk UBA as a user with content developer privileges.
- Select System > Cubes.
- In the URL, add ?system immediately following the host name or IP address. For example: https://uba-001.example.com/?system#Y2FzcGlkYS5jdWJlcy5jdWJlRGV0YWlsc1ZpZXc=
- Select the windowsevents cube to view its details.
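The URL change in the third step can be sketched with Python's standard library. The starting URL below, without ?system, is an assumption derived from the example above, and `with_system_flag` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def with_system_flag(url):
    """Insert ?system after the host, keeping the #fragment unchanged."""
    parts = urlsplit(url)
    # Use "/" when the URL has no path so the result reads host/?system#...
    return urlunsplit(parts._replace(path=parts.path or "/", query="system"))


# Assumed form of the Cubes page URL before the change.
before = "https://uba-001.example.com/#Y2FzcGlkYS5jdWJlcy5jdWJlRGV0YWlsc1ZpZXc="
print(with_system_flag(before))
```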
At the top of the page, there is an attribute with the name eventId, and its attribute key is view.ad.eventId. This is the attribute key that you must provide when you create a new cube and want to store the value of this attribute in the cube.
Names do not always coincide. The field name is EventCode in the raw event, but it is transformed to eventId in Splunk UBA.
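The renaming can be pictured as a lookup applied after the raw event is split into fields. The sketch below is not Splunk UBA's parser: `extract_attributes` is hypothetical, the event is shortened, and EventCode to eventId is the only mapping taken from this topic.

```python
# Shortened copy of the raw Windows event shown earlier.
RAW_EVENT = """09/12/2019 04:42:00 PM
LogName=Security
EventCode=4672
EventType=0
ComputerName=acmel-lpt0399177
"""

# Illustrative mapping: only the EventCode -> eventId pair is named above.
FIELD_TO_ATTRIBUTE = {"EventCode": "eventId"}


def extract_attributes(raw, mapping=FIELD_TO_ATTRIBUTE):
    """Read key=value lines and rename the mapped fields."""
    fields = {}
    for line in raw.splitlines():
        name, separator, value = line.partition("=")
        if separator:  # skip lines without "=", such as the timestamp
            fields[name.strip()] = value.strip()
    return {
        attribute: fields[raw_name]
        for raw_name, attribute in mapping.items()
        if raw_name in fields
    }


print(extract_attributes(RAW_EVENT))
```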
Store generic events in Splunk UBA data cubes
Generic events are events from data sources for which Splunk UBA does not have any existing parsing logic. For example, you can ingest credit card transaction logs that are not CIM compliant and that Splunk UBA does not parse by default. Events from this type of data source can be ingested into Splunk UBA as generic events and aggregated in data cubes. Splunk UBA can generate anomalies based on generic events stored in the data cubes as long as either userId or deviceId is also present in the cube.
Suppose you have a data source called exampleDS and the field you want to store in the cube is called exampleDSfield. Use the following syntax to store the value of this field in the cube:
event.attribute#exampleDSfield
Use the view formats described in Examine existing cubes to get more information about Splunk UBA data views only when Splunk UBA can map the attribute to one of the entities defined in the following table. These attributes contain identity resolution logic used for generating entities in Splunk UBA.
Entity you want to track in Splunk UBA | Your event contains this field | Mapped entity in Splunk UBA | What can be extracted by Splunk UBA |
---|---|---|---|
User | srcUser | srcUser | |
 | destUser | destUser | |
Source Device | sourceIp | source | |
 | httpClientIp | source | |
 | clientIp | source | |
 | endpointIp | source | |
Destination Device | destinationIp | destination | |
 | peeringHost | destination | |
Server Device | serverIp | server | |
Origin Device | origin | origin | |
Application | primaryApplication | application | |
For example, the parsed token view.*.source populates the source field in the cube with the device name as one of the dimensions. You can also extract additional device information such as the device ID, type, scope, and port.
Because the exampleDSfield attribute is not listed in the table, Splunk UBA can't map it to an entity. You must use the event.attribute format to store the value of exampleDSfield in the cube.
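The choice between the two formats can be summarized as a lookup against the table. In the sketch below, the field-to-entity pairs are copied from the table and view.*.source appears in the text above. Using the view.* form for the other mapped entities is an assumption, and `attribute_key_for` is a hypothetical helper.

```python
# Field-to-entity pairs copied from the table above.
MAPPED_ENTITIES = {
    "srcUser": "srcUser",
    "destUser": "destUser",
    "sourceIp": "source",
    "httpClientIp": "source",
    "clientIp": "source",
    "endpointIp": "source",
    "destinationIp": "destination",
    "peeringHost": "destination",
    "serverIp": "server",
    "origin": "origin",
    "primaryApplication": "application",
}


def attribute_key_for(field_name):
    """Pick a view key for mapped fields, an event.attribute key otherwise."""
    entity = MAPPED_ENTITIES.get(field_name)
    if entity is not None:
        # Assumption: every mapped entity is addressed as view.*.<entity>.
        return f"view.*.{entity}"
    return f"event.attribute#{field_name}"


print(attribute_key_for("clientIp"))
print(attribute_key_for("exampleDSfield"))
```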
This documentation applies to the following versions of Splunk® User Behavior Analytics: 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.4.1, 5.0.5, 5.0.5.1, 5.1.0, 5.1.0.1, 5.2.0, 5.2.1, 5.3.0, 5.4.0, 5.4.1