Batch models

Batch models and their associated anomaly rules operate on accumulated data stored in the UBA analytical store. Batch models analyze ingested data over a larger time window, such as the last 24 hours. Batch models typically run overnight due to the need to process large amounts of data.

The following sections describe a selection of the batch models available in Splunk UBA.

To learn about the Lateral Movement, VPN, and Time-series batch models, see Lateral Movement model, VPN login related anomaly detection models, and Time-series models.

To learn more about data mapping, see Data mapping for model-based anomalies in Splunk UBA and Data mapping for rule-based anomalies in Splunk UBA in the Get Data into Splunk User Behavior Analytics manual.

Excessive File Size Change Model

This model discovers data collection practices in cloud file stores that precede data exfiltration. Specifically, the model identifies files that grow unexpectedly as a byproduct of data collection. Examples of this include significant copy-pasting of data to an exfiltration file, or collecting files for exfiltration to an archive or zip file.

The model tracks and qualifies jumps in file sizes over time. The model separately qualifies size jumps for three classes of files: small, regular, and large. Each class has very different size-increase properties.
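
The per-class treatment of size jumps can be sketched as follows. This is a minimal illustration, not the Splunk UBA implementation: the class boundaries, the growth ratios, and the function names are all hypothetical values chosen for the example.

```python
from collections import defaultdict

# Hypothetical class boundaries in bytes; the real cutoffs are not documented here.
SMALL_MAX = 100 * 1024            # files up to 100 KiB count as "small"
REGULAR_MAX = 50 * 1024 * 1024    # files up to 50 MiB count as "regular"

# Hypothetical growth ratios at or above which a jump is flagged, per class.
JUMP_RATIO = {"small": 20.0, "regular": 5.0, "large": 1.5}


def size_class(size_bytes):
    """Assign a file size to one of the three classes."""
    if size_bytes <= SMALL_MAX:
        return "small"
    if size_bytes <= REGULAR_MAX:
        return "regular"
    return "large"


def find_size_jumps(observations):
    """Find size jumps in (file_id, timestamp, size_bytes) observations.

    Returns a list of (file_id, timestamp, previous_size, new_size, class).
    The class is taken from the size before the jump.
    """
    history = defaultdict(list)
    for file_id, timestamp, size in observations:
        history[file_id].append((timestamp, size))

    jumps = []
    for file_id, points in history.items():
        points.sort()
        # Compare each observation with the one immediately before it.
        for (_, previous), (timestamp, current) in zip(points, points[1:]):
            if previous <= 0:
                continue  # no meaningful ratio for an empty file
            cls = size_class(previous)
            if current / previous >= JUMP_RATIO[cls]:
                jumps.append((file_id, timestamp, previous, current, cls))
    return jumps
```

Using a separate ratio for each class reflects the idea that a small file growing 20 times over can be routine, while a large file growing by half is already a substantial amount of data.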

The model establishes a number of additional features related to individual instances of file size jumps, as well as averages and extremes of these jumps. A number of model features are developed based on folders in which a file can be found, and the users and sharing practices related to all copies of the observed file and its folders. This large assembly of features helps the model distinguish between benign and suspicious file size jumps.

The model is configurable and can be set to analyze data from all file-related data sources together, to focus on each individual data source, or to exclude some data sources. All pre-trained features can be specifically adapted to each use case.

New Access Model for Box

This model identifies suspicious first-time accesses to files in cloud file stores. An access includes any user activity or event type on a file, such as previewing, moving, syncing, and deleting. The model determines whether an access is suspicious based on the user's history of file accesses, the file accesses of other similarly behaving users, and the properties of the files themselves.

The model acts as a recommender system that can also accept partial external guidance from human expert rules. Building the recommender system includes, among other steps, constructing an access graph that associates users with files, enriching the graph's edges with features that describe access properties, adding features that represent human expert rules, and running matrix factorization on that graph. The resulting recommender system then checks whether the first-time accesses that actually occurred align with the accesses that the model predicts.
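
The matrix factorization step can be illustrated with a small sketch. It assumes a plain user-by-file matrix of 0/1 access indicators and uses a truncated SVD computed by power iteration as a stand-in factorization. The actual algorithm, edge features, and expert-rule features used by Splunk UBA are not specified here, and the function names are hypothetical.

```python
import math


def _matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def factorize(access, rank=2, iterations=200):
    """Factorize a user-by-file matrix into rank-limited user and file factors.

    Uses power iteration with deflation, which yields a truncated SVD.
    Returns (user_factors, file_factors), one factor list per user and file.
    """
    n_users, n_files = len(access), len(access[0])
    residual = [row[:] for row in access]
    user_factors = [[] for _ in range(n_users)]
    file_factors = [[] for _ in range(n_files)]
    for _component in range(rank):
        # Deterministic, slightly uneven start vector.
        v = [1.0 + 0.1 * j for j in range(n_files)]
        for _step in range(iterations):
            u = _matvec(residual, v)
            v = _matvec(_transpose(residual), u)
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break  # nothing left to factorize
            v = [x / norm for x in v]
        u = _matvec(residual, v)  # left singular vector scaled by its singular value
        for i in range(n_users):
            user_factors[i].append(u[i])
        for j in range(n_files):
            file_factors[j].append(v[j])
        # Deflate: remove the component just found before looking for the next.
        residual = [
            [residual[i][j] - u[i] * v[j] for j in range(n_files)]
            for i in range(n_users)
        ]
    return user_factors, file_factors


def predicted_affinity(user_factors, file_factors, user, file):
    """Score how well a user-file access fits the learned access patterns."""
    return sum(a * b for a, b in zip(user_factors[user], file_factors[file]))
```

For example, if user 0 has only accessed file 0, but the two other users who accessed file 0 also accessed file 1, the predicted affinity of user 0 for file 1 comes out clearly higher than for a file that only an unrelated group of users has touched. A first-time access with low predicted affinity is the kind of access a model like this would treat as a candidate anomaly.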

The model does not come pre-trained. Instead, it trains the recommender system on live user data. This training can be prohibitively expensive at scale. To reduce training costs, parts of the training run optionally or rarely: hyperparameter tuning of the underlying recommender system runs once a month by default, and k-fold validation of the training process is optional. The only component of the training that runs continually is the training of the recommender system on user data.

The model offers many parameters that you can configure in the Model Registry including which file extensions are considered risky, which user actions are considered risky, and whether the model includes information about file access activities of the user's peers.

Rare File Access Model

This model looks for rare accesses to files by individual users. Rarity of a file access is established by tracking a number of features related to users, types of access activities, services involved in the file accesses, and the potential involvement of other users in these accesses, such as in collusion or impersonation. The model establishes that a file access is rare, and also scores the risk of a rare access. The more observed features exhibit rare behavior, the more certain it is that an access is risky.

This model tracks several features, including the rarity of accesses to each file, the rarity of specific access actions such as reading, editing, or deleting, and the rarity of the applications that facilitate the access, such as a Chrome browser, Word, Excel, or Acrobat Reader. The model also tracks many conditional probability features. For example: given an application, how rare is a specific access action; given an action, how rare is the access application; given a user, how rare is an access action or application for that user; and given a user who initiated the access, how rare is the user to whom the file was sent or shared.
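
A conditional rarity feature of this kind can be sketched as one minus an empirical conditional probability. The event field names (`app`, `action`) and the function name are assumptions made for this example, not fields or functions from Splunk UBA.

```python
from collections import Counter


def conditional_rarity(events, given, target):
    """Compute rarity of a target value conditioned on a given value.

    events: list of dicts, each with at least the `given` and `target` keys.
    Returns {(given_value, target_value): 1 - P(target_value | given_value)}.
    """
    pair_counts = Counter((event[given], event[target]) for event in events)
    given_counts = Counter(event[given] for event in events)
    return {
        (g, t): 1.0 - count / given_counts[g]
        for (g, t), count in pair_counts.items()
    }
```

With nine `read` events and one `delete` event through the same application, `conditional_rarity(events, "app", "action")` assigns the `delete` pair a rarity close to 0.9 and the `read` pair a rarity close to 0.1. Swapping the `given` and `target` arguments gives the reverse feature, such as how rare an application is for a given action.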

Other tracking features can be added in the Model Registry to further separate benign from suspicious instances of rare accesses to files. Model Registry also offers configurable parameters by which you can focus the model to look into specific kinds of raw events, specific devices, and users.

Rare Microsoft Windows Device Access Model Using Authentication Data

This model looks for user authentications to various devices using Active Directory or Windows events. The authentication can be an explicit user login or a resource-access authentication such as access to web pages on a web server. Rarity of a user visiting a device is established by tracking a number of features related to both users and devices individually, as well as by user-device associations. The more observed features exhibit rare behavior, the more certain it is that the association is risky.

The model can optionally include behaviors of a user peer group, a device peer group, or both. The model establishes that a user-device association is rare, and scores the risk of a suspicious association. The model also performs alert suppression: if more than a preconfigured number of users or devices share the same rarity features, the alert is not raised.
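
The alert suppression rule can be sketched as follows. The tuple layout, the `max_shared` parameter, and the function name are hypothetical. The sketch only shows the idea that a rarity pattern shared by many users or devices is more likely to be noise than a targeted anomaly.

```python
from collections import defaultdict


def suppress_common_alerts(candidates, max_shared=3):
    """Drop candidate alerts whose rarity features are shared too widely.

    candidates: list of (user, device, features), where features is a
    frozenset of the rarity feature names that fired for that association.
    An alert is kept only if no more than max_shared distinct users and no
    more than max_shared distinct devices show the same set of features.
    """
    users_by_features = defaultdict(set)
    devices_by_features = defaultdict(set)
    for user, device, features in candidates:
        users_by_features[features].add(user)
        devices_by_features[features].add(device)

    return [
        (user, device, features)
        for user, device, features in candidates
        if len(users_by_features[features]) <= max_shared
        and len(devices_by_features[features]) <= max_shared
    ]
```

If four users all trigger the same rare login type on one server, for example after a configuration change, all four candidates are suppressed when `max_shared` is 3, while a single user with a distinct rarity pattern still raises an alert.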

The model is configurable such that other tracking features can be added in the Model Registry. Model Registry also offers configurable parameters by which you can focus the model to look into specific kinds of raw events, specific devices, and users.

Rare Microsoft Windows Device Access Model Using Login Data

This model looks for explicit user logins to devices that happen on a Windows login screen. Rarity of a user login to a device is established by tracking a number of features related to both users and devices individually, as well as by their associations. The model optionally includes the behaviors of a user peer group, a device peer group, or both. The more observed features exhibit rare behavior, the more certain it is that the association is risky.

The model establishes that a user-device association is rare, and also scores the risk of the rare association. The model also performs alert suppression: if more than a preconfigured number of users or devices share the same rarity features, the alert is not raised.

This model tracks individual features, such as the total number of logins for a given user. The model also tracks many conditional probability features. For example: given a device, how rare is the presence of a given user out of all users; given a user, how rare is their presence on a given device out of all devices on which the user appears; given a device, how rare is the login type the user used; given a user, how rare is the employed login type in general; given a login process, how rare is the code that invoked it; and given a user, how rare is the domain of the device to which the user is logging in.

The model is configurable, such that other tracking features can be added in the Model Registry. Model Registry also contains configurable parameters by which you can focus the model to look into specific kinds of raw events, specific devices, and users.

Rare VPN Login Location Model

This model looks for users who access the company from rare VPN locations. Locations are reported as countries. Suspicious rarity is determined by tracking a number of features related to users, devices, and the source countries from which users initiated VPN connections. The model establishes that access is from a rare country, and scores the related risk. The more observed features exhibit rare behavior, the more certain it is that the access is risky.

This model tracks the rarity of a source country in general, as well as several conditional features. Conditional features include rare source countries for a given user, rare source countries for a given device, and rare source countries for a given department or peer group in the company. Other features of interest for rarity and noise suppression can be added in the Model Registry. Model Registry also contains configurable parameters by which you can focus the model to look into specific kinds of raw events, specific devices, and users.
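
The combination of a general rarity feature with conditional ones can be sketched as a count of the features that fire. The `RARE_BELOW` cutoff, the tuple layout, and the function name are hypothetical. The choice to treat an entity with no history as rare is also an assumption of this example.

```python
from collections import Counter

RARE_BELOW = 0.05  # hypothetical cutoff: under 5% of past logins counts as rare


def vpn_country_risk(history, user, device, country):
    """Count how many rarity features fire for one VPN login.

    history: list of (user, device, country) tuples from past VPN logins.
    Returns an integer from 0 to 3: rarity of the country overall, for the
    given user, and for the given device.
    """
    overall = Counter(c for _, _, c in history)
    by_user = Counter(c for u, _, c in history if u == user)
    by_device = Counter(c for _, d, c in history if d == device)

    def is_rare(counts):
        total = sum(counts.values())
        # Assumption: no history at all is treated as rare.
        return total == 0 or counts[country] / total < RARE_BELOW

    return sum(is_rare(counts) for counts in (overall, by_user, by_device))
```

A country that is common across the company but never seen for a particular user and device fires two of the three features. A country never seen anywhere fires all three. This mirrors the statement that the more features exhibit rare behavior, the more certain the risk.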

Unusual Volume of Box Downloads per User Model

This model tracks various outliers in time series data. For example, the model can identify anomalies in the volume of downloaded data, or the sum of downloaded file sizes, which can indicate data exfiltration.

Outliers are notable offsets from baseline estimates such as moving averages of estimated distributions, offsets from percentiles of estimated distributions, or offsets from hard thresholds established by specific domain expertise. In this instance, the baseline is a Gaussian moving average with extreme values removed.
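
One possible reading of such a baseline is sketched below: over a trailing window, the most extreme values are trimmed, the remaining values are summarized by their mean and standard deviation, and each new observation is expressed as an offset in standard deviations. The window size, trim fraction, and function names are assumptions, and the product may define its Gaussian moving average differently.

```python
import statistics


def trimmed_baseline(window, trim_fraction=0.1):
    """Return (mean, std) of a window after dropping its extreme values.

    trim_fraction of the values is removed from each end of the sorted window.
    """
    ordered = sorted(window)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept), statistics.pstdev(kept)


def offsets_from_baseline(series, window_size=14, trim_fraction=0.1):
    """Compute the offset of each value from its trailing trimmed baseline.

    Returns a list of (index, value, z), where z is the offset in standard
    deviations. The first window_size values only seed the baseline.
    """
    results = []
    for i in range(window_size, len(series)):
        mean, std = trimmed_baseline(series[i - window_size:i], trim_fraction)
        z = (series[i] - mean) / std if std > 0 else 0.0
        results.append((i, series[i], z))
    return results
```

Trimming keeps a single earlier spike, such as one unusually large download day, from inflating the baseline and hiding the next one.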

The model derives conclusions from several sources of reference: the history of the observed entity (such as a user or device), the history of peer groups of the observed entity, and the global behavior of all entities in the company. In terms of time, the model looks at daily behaviors, such as changes in daily statistics, and weekly behaviors, such as offsets of daily statistics from weekly baselines.

You can use the model to spot outliers across many dimensions including instantaneous values, distributions, different time scales, personal behaviors, peer group behaviors, and global company behaviors. Each offset is scored. The higher the accumulated score, the more suspicious the outlier.
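
The accumulation of scores across reference sources can be sketched as follows. The scoring rule, the threshold, and the names are hypothetical and only illustrate how offsets from a personal, a peer group, and a global baseline might add up.

```python
def outlier_score(value, baselines, threshold=3.0):
    """Accumulate a score from the offsets of one value against several baselines.

    baselines: dict mapping a reference name (for example "user", "peer",
    "global") to a (mean, std) pair. Each baseline contributes the amount by
    which the offset, in standard deviations, exceeds the threshold.
    """
    score = 0.0
    for mean, std in baselines.values():
        if std <= 0:
            continue  # a flat baseline gives no usable scale
        z = abs(value - mean) / std
        if z > threshold:
            score += z - threshold
    return score
```

A value that is extreme relative to the user's own history and to the peer group, but ordinary for the company as a whole, collects score from two of the three references. A value that is ordinary everywhere scores zero.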

Unusual Volume of Box Login Failure Events per User Model

This model tracks various outliers in time series data. The model identifies outliers in counts of triggered events. These anomalies indicate a foundational change in user or device activity, which is often security related. Outliers in this model are notable offsets from baseline estimates such as moving averages, offsets from percentiles of the estimated distributions, or offsets from hard thresholds established by specific domain expertise.

The model derives conclusions from several sources of reference, including the history of the observed entity (such as a user or device), the history of peer groups of the observed entity, and the global behavior of all entities in the company. In terms of time, the model looks at daily behaviors, such as changes in daily statistics, and weekly behaviors, such as offsets of daily statistics from weekly baselines.

You can use this model to spot outliers across many dimensions including instantaneous values, distributions, different time scales, personal behaviors, peer group behaviors, and global company behaviors. Each offset is scored. The higher the accumulated score, the more suspicious the outlier.

Unusual Volume of VPN Login Events per User Model

This model tracks various outliers in time series data that counts the VPN login events of individual users. Outliers in this model are notable offsets from baseline estimates such as moving averages of estimated distributions, offsets from percentiles of estimated distributions, or offsets from hard thresholds established by specific domain expertise.

The model derives conclusions from several sources of reference, including the history of the observed entity (such as a user or device), the history of peer groups of the observed entity, and the global behavior of all entities in the company. In terms of time, the model looks at daily behaviors, such as changes in daily statistics, and weekly behaviors, such as offsets of daily statistics from weekly baselines.

You can use this model to spot outliers across many dimensions including instantaneous values, distributions, different time scales, personal behaviors, peer group behaviors, and global company behaviors. Each offset is scored. The higher the accumulated score, the more suspicious the outlier.

Unusual Volume of VPN Traffic per User Model

This model tracks outliers in various kinds of time series data for the volume of transmitted data through VPN sessions for individual users. Outliers are notable offsets from baseline estimates such as moving averages of estimated distributions, offsets from percentiles of estimated distributions, or offsets from hard thresholds established by some specific domain expertise.

The model derives conclusions from several sources of reference, including the history of the observed entity (such as a user or device), the history of peer groups of the observed entity, and the global behavior of all entities in the company. In terms of time, the model looks at daily behaviors, such as changes in daily statistics, and weekly behaviors, such as offsets of daily statistics from weekly baselines.

You can use this model to spot outliers across many dimensions including instantaneous values, distributions, different time scales, personal behaviors, peer group behaviors, and global company behaviors. Each offset is scored. The higher the accumulated score, the more suspicious the outlier.
