Splunk® Enterprise Security

Administer Splunk Enterprise Security

This documentation does not apply to the most recent version of Splunk® Enterprise Security. For documentation on the most recent version, go to the latest release.

Convert Extreme Searches to Machine Learning Toolkit in Splunk Enterprise Security

If you need to convert any locally modified XS searches to MLTK, use the following information to help guide your decisions.

Converting XS commands

The most common XS commands that have MLTK equivalents in ES follow.

xsWhere

The xsWhere command is approximately equivalent to the `mltk_apply` macro. Both apply data to a model, compare values against thresholds, and find outliers for a field. For each value, given the provided threshold, the macro tells you whether the value is an outlier. See Abnormally High Number of HTTP Method Events By Src - Rule in DA-ESS-NetworkProtection.
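As a hedged sketch of this pattern, reusing the model name and the `mltk_apply_upper` macro form that appear later in this topic (the threshold and field are illustrative), a search might count failures and then keep only the outliers above the medium threshold:

| tstats `summariesonly` count as failure from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src
`mltk_apply_upper("app:failures_by_src_count_1h", "medium", "failure")`

The macro applies the model to the failure field and filters the results to events whose value is above the given threshold.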

xsFindBestConcept

The xsFindBestConcept command is approximately equivalent to the `mltk_findbest` macro. They are almost the opposite of the xsWhere command and the `mltk_apply` macro. For each value, these tell you in which threshold range the value falls on the distribution curve. For example, the high range is between 0.05 and 0.01, and the extreme range is between 0.01 and 0.000000001. See Access - Total Access Attempts in DA-ESS-AccessProtection.
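As a hedged sketch, reusing the model name from the key indicator example later in this topic: after the search produces a field such as current_count, the macro reports which threshold range that value falls in:

| tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk by All_Risk.risk_object
| stats median(accum_risk) as current_count
| `mltk_findbest("app:median_object_risk_by_object_type_1d")`

Note that the macro takes only the model name; the field it evaluates is the one the model was fit on.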

xsCreateDDContext

The xsCreateDDContext command is approximately equivalent to the fit command. Both generate a new model each time the search is run. See Access - Authentication Failures By Source in SA-AccessProtection.
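A minimal sketch of a model gen search built with fit, using the field and model names from the example later in this topic; without the partial_fit=true parameter, each run discards the old model and builds a new one, which matches xsCreateDDContext behavior:

| tstats `summariesonly` count as failure from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h
| fit DensityFunction failure dist=norm into app:failures_by_src_count_1h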

xsUpdateDDContext

Each time xsUpdateDDContext runs, it combines the new training data with the existing model. There is no xsUpdateDDContext equivalent in MLTK at this time: no models or contexts are updated additively. All model-generation searches discard the old model and produce a new model based on the data retrieved in the dispatch window.

To accommodate this change, the dispatch windows of the model gen searches that were converted from xsUpdateDDContext XS searches have been lengthened, so that the models are generated from more data and are therefore more reliable.

Converting a Context Gen Search

As an example of converting a context gen search, consider Access - Authentication Failures By Source - Context Gen, broken into three lines.

Line 1: | tstats `summariesonly` count as failures from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h
Line 2: | stats median(failures) as median, min(failures) as min, count as count | eval max = median*2
Line 3: | xsUpdateDDContext app="SA-AccessProtection" name=failures_by_src_count_1h container=authentication scope=app | stats count

Line one
Line one starts by counting the authentication failures per hour:
| tstats `summariesonly` count as failures from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h

Line two
Line two contains | stats median(failures) as median, min(failures) as min, count as count | eval max = median*2, which puts the results of the search into the input format that the XS xsUpdateDDContext command requires. In some searches you see the `context_stats` macro used instead, such as `context_stats(web_event_count, http_method)`.

Line three
Line three uses the XS xsUpdateDDContext command to build a data-defined historical view context, puts it in an app context, and assigns it a name, a container, and a scope.

Now consider the MLTK version of the search, Access - Authentication Failures By Source - Model Gen, as two lines.

Line 1: | tstats `summariesonly` count as failure from datamodel=Authentication.Authentication where Authentication.action="failure" by Authentication.src,_time span=1h
Line 2: | fit DensityFunction failure partial_fit=true dist=norm into app:failures_by_src_count_1h

The steps for converting this search from a context gen search to a model gen search follow:

  1. Line one starts the same way for both searches, by counting the authentication failures per hour. Keep this when converting to MLTK.
  2. The fit command takes tables as inputs, thus it is not necessary to include
    | stats median(failures) as median, min(failures) as min, count as count | eval max = median*2
  3. In line two for the MLTK version of the search, do the following:
    1. Replace the XS command xsUpdateDDContext with the approximate equivalent of fit DensityFunction.
    2. Include the failure field that you're counting in the first part of the search.
    3. Include the partial_fit=true parameter to update the existing models with new data rather than building completely new models.
    4. Add the dist=norm parameter to represent the normal distribution bell curve of the density function.
    5. Use into for passing the data into the model.
    6. Keep the name from the original search because it is also the model name for MLTK.
      1. All MLTK model names should include the app: prefix, which properly saves the model into the shared application namespace.
      2. In this example, prefix the name "failures_by_src_count_1h" with app: so that it becomes app:failures_by_src_count_1h.

Converting a Correlation Search

As an example of converting a correlation search, consider Access - Brute Force Access Behavior Detected - Rule as four lines.

Line 1: | from datamodel:"Authentication"."Authentication"
Line 2: | stats values(tag) as tag,values(app) as app,count(eval('action'=="failure")) as failure,count(eval('action'=="success")) as success by src
Line 3: | search success>0
Line 4: | xswhere failure from failures_by_src_count_1h in authentication is above medium

Line one
Line one starts by searching the authentication data model:
| from datamodel:"Authentication"."Authentication"

Line two
Line two contains | stats values(tag) as tag,values(app) as app,count(eval('action'=="failure")) as failure,count(eval('action'=="success")) as success by src, which counts authentication failures and successes by source.

Line three
Line three searches for successes greater than 0.

Line four
Line four uses the XS xswhere command to match a concept within a specified context and determine how well a value fits it; in this case, whether failure from failures_by_src_count_1h in authentication is above medium.


Now consider the MLTK version of the search, Access - Brute Force Access Behavior Detected - Rule, as four lines.

Line 1: | from datamodel:"Authentication"."Authentication"
Line 2: | stats values(tag) as tag,values(app) as app,count(eval('action'=="failure")) as failure,count(eval('action'=="success")) as success by src
Line 3: | search success>0
Line 4: `mltk_apply_upper("app:failures_by_src_count_1h", "medium", "failure")`

The steps for converting this search to MLTK:

  1. Keep line 1 as-is.
  2. Keep line 2 as-is.
  3. Keep line 3 as-is.
  4. In line four, do the following:
    1. Replace the XS command xswhere with the approximate equivalent of the `mltk_apply_upper` macro.
      1. The macro wraps the MLTK apply function and filters the results based on whether the values are above or below a certain threshold.
    2. Include the argument for the model name app:failures_by_src_count_1h from the model gen search that builds the model.
    3. Include the argument for the qualitative_id of medium.
    4. Include the argument for the failure field that you're counting in the first part of the search.


Converting a Key Indicator Search

To convert a Key Indicator search to use MLTK, you have to first convert the corresponding Model Gen search. The Key Indicator search references the ML model name created by the Model Gen search.


As an example of converting a key indicator search, consider Risk - Median Risk Score as five lines.

Line 1: | tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-24h@h latest=+0s by All_Risk.risk_object | stats median(accum_risk) as current_count | appendcols
Line 2: [| tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-48h@h latest=-24h@h by All_Risk.risk_object | stats median(accum_risk) as historical_count]
Line 3: | `get_ksi_fields(current_count, historical_count)`
Line 4: | xsfindbestconcept current_count FROM median_object_risk_by_object_type_1d IN risk as current_count_qual
Line 5: | xsfindbestconcept delta FROM percentile in default as delta_qual

Line one
Line one starts by searching for data from the current day.

Line two
Line two starts by searching for data from the previous day.

Line three
Line three calculates the delta as a percentage between current_count and historical_count (today's value and yesterday's value). So if yesterday's value was 100 and today's is 125, then the delta = 25% and the direction = increasing.

Line four
Line four uses the XS xsfindbestconcept command to determine in which threshold range current_count falls within the median_object_risk_by_object_type_1d context.

Line five
Line five finds the delta percentage for the key indicator in the risk analysis dashboard.


Now consider the MLTK version of Risk - Median Risk Score as five lines.

Line 1: | tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-24h@h latest=+0s by All_Risk.risk_object | stats median(accum_risk) as current_count | appendcols
Line 2: [| tstats `summariesonly` sum(All_Risk.risk_score) as accum_risk from datamodel=Risk.All_Risk where earliest=-48h@h latest=-24h@h by All_Risk.risk_object | stats median(accum_risk) as historical_count]
Line 3: | `get_ksi_fields(current_count, historical_count)`
Line 4: | `mltk_findbest("app:median_object_risk_by_object_type_1d")`
Line 5: | `get_percentage_qualitative(delta, delta_qual)`

Lines one through three remain as-is. The last two lines are replaced with the MLTK equivalent:

  1. In line four, replace xsfindbestconcept current_count with its approximate equivalent, the `mltk_findbest` macro, which wraps the MLTK apply function. For each value, this macro tells you in which threshold range the value falls on the distribution curve. Notice that the macro doesn't need the name of the field that you're applying the model to, because the field is determined during the fit; you only need to make sure that the field exists in the results when doing the apply.
  2. In line five, replace xsfindbestconcept delta with its approximate equivalent, the `get_percentage_qualitative` macro. This applies a qualitative term, such as extremely, moderately, or greatly, to the delta between the current count and the historical count. You see these terms as indicators in the risk analysis dashboard.

Do not rename current_count, because that field name is expected in the results.

Last modified on 19 January, 2022

This documentation applies to the following versions of Splunk® Enterprise Security: 7.0.0

