Convert Extreme Searches to Machine Learning Toolkit in Splunk Enterprise Security
If you need to convert any locally modified XS searches to MLTK, use the following information to help guide your decisions.
Converting XS commands
The most common XS commands that have MLTK equivalents in ES follow.
xsWhere
The xsWhere command is approximately equivalent to the `mltk_apply` macro. These apply data to a model, compare against thresholds, and find outliers for a field. For each value, given the provided threshold, the macros tell you if the value is an outlier. See Abnormally High Number of HTTP Method Events By Src - Rule in DA-ESS-NetworkProtection.
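To make the outlier test concrete, the following Python sketch mimics the idea outside of Splunk. It is an illustration only and not how the `mltk_apply` macro is implemented: it assumes a normal distribution, and the sample counts, the 0.05 threshold, and the function name is_upper_outlier are all hypothetical.

```python
from statistics import NormalDist


def is_upper_outlier(value, model, threshold):
    """Return True when the upper-tail probability of value is below threshold."""
    tail_probability = 1.0 - model.cdf(value)
    return tail_probability < threshold


# Hypothetical hourly event counts used to build the model.
training_counts = [10, 12, 11, 9, 13, 10, 12, 11]
model = NormalDist.from_samples(training_counts)

print(is_upper_outlier(11, model, 0.05))  # False: 11 is the mean of the sample
print(is_upper_outlier(40, model, 0.05))  # True: 40 is far into the upper tail
```

The smaller the threshold, the further into the tail a value must fall before it is reported as an outlier.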
xsFindBestConcept
The xsFindBestConcept command is approximately equivalent to the `mltk_findbest` macro. They are almost the opposite of the xsWhere and apply commands. For each value, these tell you in which threshold range the value falls on the distribution curve. For example, the high range is between 0.05 and 0.01, and the extreme range is between 0.01 and 0.000000001. See Access - Total Access Attempts in DA-ESS-AccessProtection.
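As a rough illustration of the range lookup, the Python sketch below maps a value to the range that contains its upper-tail probability. Only the high and extreme ranges come from the example in this topic. The normal model with a mean of 100 and a standard deviation of 10, and the function name find_best_range, are assumptions made for the sketch and do not describe how the `mltk_findbest` macro works internally.

```python
from statistics import NormalDist

# (label, upper bound, lower bound) of the tail probability, from the example ranges.
RANGES = [
    ("high", 0.05, 0.01),
    ("extreme", 0.01, 0.000000001),
]


def find_best_range(value, model):
    """Return the label of the range that contains the upper-tail probability."""
    tail_probability = 1.0 - model.cdf(value)
    for label, upper, lower in RANGES:
        if lower <= tail_probability < upper:
            return label
    return None  # The value is not in any of the listed ranges.


# Hypothetical model: mean of 100 and standard deviation of 10.
model = NormalDist(mu=100, sigma=10)

print(find_best_range(100, model))  # None
print(find_best_range(120, model))  # high
print(find_best_range(130, model))  # extreme
```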
xsCreateDDContext
The xsCreateDDContext command is approximately equivalent to the fit command. These both generate a new model each time the search is run. See Access - Authentication Failures By Source in SA-AccessProtection.
xsUpdateDDContext
Each time xsUpdateDDContext runs, it combines the new training data with the existing model. There is no xsUpdateDDContext equivalent in MLTK at this time. There are no models/contexts that are updated additively. All model-generation searches wipe out the old model and produce a new model based on the data retrieved in the dispatch window.
To accommodate this change, the dispatch times of the Model Gen searches that were converted from xsUpdateDDContext XS searches have been increased so that each model is generated from more data and is more reliable.
Converting a Context Gen Search
As an example of converting a context gen search, consider Access - Authentication Failures By Source - Context Gen as three lines.
Line | SPL |
---|---|
1. | | tstats `summariesonly` count as failures from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h |
2. | | stats median(failures) as median, min(failures) as min, count as count | eval max = median*2 |
3. | | xsUpdateDDContext app="SA-AccessProtection" name=failures_by_src_count_1h container=authentication scope=app | stats count |
Line one
Line one starts by counting the authentication failures per hour:
| tstats `summariesonly` count as failures from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h
Line two
Line two contains stats median(failures) as median, min(failures) as min, count as count | eval max = median*2, which puts the results of the search into the input format that the XS xsUpdateDDContext command requires. In some searches you see the macro `context_stats` used instead, such as `context_stats(web_event_count, http_method)`.
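The statistics that line two hands to xsUpdateDDContext can be reproduced outside of Splunk for a quick sanity check. The Python sketch below computes the same four values for a list of hourly failure counts. The sample counts and the function name context_inputs are hypothetical, and edge cases such as an even number of values might not match SPL exactly.

```python
from statistics import median


def context_inputs(failures):
    """Mirror: stats median, min, count, followed by eval max = median*2."""
    med = median(failures)
    return {
        "median": med,
        "min": min(failures),
        "count": len(failures),
        "max": med * 2,
    }


# Hypothetical failure counts, one value per hour.
hourly_failures = [3, 5, 4, 8, 5]

print(context_inputs(hourly_failures))
# {'median': 5, 'min': 3, 'count': 5, 'max': 10}
```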
Line three
Line three uses the XS xsUpdateDDContext command to build a data-defined historical view context, puts it in an app context, gives it a name, and assigns a container and a scope.
Consider the MLTK version of the search, Access - Authentication Failures By Source - Model Gen, as two lines.
Line | SPL |
---|---|
1. | | tstats `summariesonly` count as failure from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h |
2. | | fit DensityFunction failure partial_fit=true dist=norm into app:failures_by_src_count_1h |
The steps for converting this search from a context gen search to a model gen search follow:
- Line one starts the same way for both searches, by counting the authentication failures per hour. Keep this when converting to MLTK.
- The fit command takes tables as inputs, so it is not necessary to include | stats median(failures) as median, min(failures) as min, count as count | eval max = median*2
- In line two for the MLTK version of the search, do the following:
  - Replace the XS command xsUpdateDDContext with the approximate equivalent of fit DensityFunction.
  - Include the failure field that you're counting in the first part of the search.
  - Include the partial_fit=true parameter to update the existing models with new data rather than building completely new models.
  - Add the dist=norm parameter to represent the normal distribution bell curve of the density function.
  - Use into for passing the data into the model.
  - Keep the name from the original search because it is also the model name for MLTK.
    - All MLTK model names should include the app: prefix, which properly saves the model into the shared application namespace.
    - In this example, prepend it to the name "failures_by_src_count_1h" so that it resembles app:failures_by_src_count_1h.
Converting a Correlation Search
As an example of converting a correlation search, consider Access - Brute Force Access Behavior Detected - Rule as four lines.
Line | SPL |
---|---|
1. | | from datamodel:"Authentication"."Authentication" |
2. | | stats values(tag) as tag,values(app) as app,count(eval('action'=="failure")) as failure,count(eval('action'=="success")) as success by src |
3. | | search success>0 |
4. | | xswhere failure from failures_by_src_count_1h in authentication is above medium |
Line one
Line one starts by searching the authentication data model:
| from datamodel:"Authentication"."Authentication"
Line two
Line two contains | stats values(tag) as tag,values(app) as app,count(eval('action'=="failure")) as failure,count(eval('action'=="success")) as success by src, which counts the authentication failures and successes for each src.
Line three
Line three searches for successes greater than 0.
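The logic of lines two and three can be sketched in Python as follows: count failures and successes per source, then keep only the sources with at least one success. The sample events and the function name count_by_src are hypothetical and exist only to show the grouping and filtering.

```python
from collections import defaultdict


def count_by_src(events):
    """Count failures and successes per src, then keep sources with success > 0."""
    totals = defaultdict(lambda: {"failure": 0, "success": 0})
    for event in events:
        action = event["action"]
        if action in ("failure", "success"):
            totals[event["src"]][action] += 1
    # Equivalent of: | search success>0
    return {src: counts for src, counts in totals.items() if counts["success"] > 0}


# Hypothetical authentication events.
events = [
    {"src": "10.0.0.1", "action": "failure"},
    {"src": "10.0.0.1", "action": "failure"},
    {"src": "10.0.0.1", "action": "success"},
    {"src": "10.0.0.2", "action": "failure"},
]

print(count_by_src(events))
# {'10.0.0.1': {'failure': 2, 'success': 1}}
```

The source 10.0.0.2 is dropped because it has no successful authentication, so only its failure count would never reach the threshold comparison in line four.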
Line four
Line four uses the XS xswhere command to match a concept within a specified context and determine compatibility, in this case authentication is above medium.
Consider the MLTK version of the search Access - Brute Force Access Behavior Detected - Rule as four lines.
Line | SPL |
---|---|
1. | | from datamodel:"Authentication"."Authentication" |
2. | | stats values(tag) as tag,values(app) as app,count(eval('action'=="failure")) as failure,count(eval('action'=="success")) as success by src |
3. | | search success>0 |
4. | | `mltk_apply_upper("app:failures_by_src_count_1h", "medium", "failure")` |
The steps for converting this search to MLTK:
- Keep line 1 as-is.
- Keep line 2 as-is.
- Keep line 3 as-is.
- In line four, do the following:
  - Replace the XS command xswhere with the approximate equivalent of the `mltk_apply_upper` macro.
    - The macro wraps the MLTK apply function and filters the results based on whether the values are above or below a certain threshold.
  - Include the argument for the model name app:failures_by_src_count_1h from the model gen search that builds the model.
  - Include the argument for the qualitative_id of medium.
  - Include the argument for the failure field that you're counting in the first part of the search.
Converting a Key Indicator Search
To convert a Key Indicator search to use MLTK, you have to first convert the corresponding Model Gen search. The Key Indicator search references the ML model name created by the Model Gen search.
As an example of converting a Key Indicator search, consider Risk - Median Risk Score as five lines.
Line | SPL |
---|---|
1. | | tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-24h@h latest=+0s by All_Risk.risk_object | stats median(accum_risk) as current_count | appendcols |
2. | [| tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-48h@h latest=-24h@h by All_Risk.risk_object | stats median(accum_risk) as historical_count] |
3. | | `get_ksi_fields(current_count, historical_count)` |
4. | | xsfindbestconcept current_count FROM median_object_risk_by_object_type_1d IN risk as current_count_qual |
5. | | xsfindbestconcept delta FROM percentile in default as delta_qual |
Line one
Line one starts by searching for data from the current day.
Line two
Line two starts by searching data from the previous day.
Line three
Line three calculates the delta as a percentage between current_count and historical_count (today's value and yesterday's value). So if yesterday's value was 100 and today's is 125, then the delta = 25% and the direction = increasing.
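The arithmetic behind that example can be checked with a few lines of Python. This sketch is not the definition of the `get_ksi_fields` macro. The function name delta_fields and the labels decreasing and unchanged are assumptions, and the sketch assumes a nonzero historical count.

```python
def delta_fields(current_count, historical_count):
    """Return the percentage change and its direction (historical_count must be nonzero)."""
    delta = (current_count - historical_count) / historical_count * 100
    if delta > 0:
        direction = "increasing"
    elif delta < 0:
        direction = "decreasing"  # Assumed label.
    else:
        direction = "unchanged"  # Assumed label.
    return delta, direction


# Yesterday's value was 100 and today's value is 125.
print(delta_fields(125, 100))  # (25.0, 'increasing')
```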
Line four
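Line four uses the XS xsfindbestconcept command to find the concept that best matches current_count in the median_object_risk_by_object_type_1d context of the risk container, and stores the result as current_count_qual.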
Line five
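Line five uses xsfindbestconcept to match delta against the percentile context in the default container and stores the result as delta_qual, the qualitative delta for the key indicator in the risk analysis dashboard.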
Consider the MLTK version of Risk - Median Risk Score as five lines.
Line | SPL |
---|---|
1. | | tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-24h@h latest=+0s by All_Risk.risk_object | stats median(accum_risk) as current_count | appendcols |
2. | [| tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-48h@h latest=-24h@h by All_Risk.risk_object | stats median(accum_risk) as historical_count] |
3. | | `get_ksi_fields(current_count, historical_count)` |
4. | | `mltk_findbest("app:median_object_risk_by_object_type_1d")` |
5. | | `get_percentage_qualitative(delta, delta_qual)` |
Lines one through three remain as-is. The last two lines are replaced with the MLTK equivalent:
- In line four, replace the xsfindbestconcept current_count with the approximate equivalent of the `mltk_findbest` macro. This is a macro that wraps the MLTK apply function. For each value, this macro tells you in which threshold range the value falls on the distribution curve. Notice that this macro doesn't need a field name for the specific field that you're applying it on. This is because the field is determined during the fit, so you only need to make sure that the field exists in the results when doing the apply.
- In line five, replace the xsfindbestconcept delta with the approximate equivalent of the `get_percentage_qualitative` macro. This applies a qualitative term to the delta between the current count and the historical count, such as extremely, moderately, or greatly. You will see these as indicators in the risk analysis dashboard.
You cannot rename current_count, because the model expects a field with that name in the results.
This documentation applies to the following versions of Splunk® Enterprise Security: 8.0.0, 8.0.1, 8.0.2