Splunk® Machine Learning Toolkit

ML-SPL API Guide

Write a Python algorithm class

To add a custom algorithm to the Splunk Machine Learning Toolkit (MLTK), you must write a Python algorithm class. The class must implement certain methods so that it operates correctly with upstream processes. These methods are the entry points to the algorithm, and the data and options are passed to them as arguments.

Best practices

Follow these best practices when writing an algorithm class:

  • Assume invalid input.
  • If a parameter is passed in, check that it is valid.
  • If you require a particular field, for example _time, check that it is present and raise an error if it is not.
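
The practices above can be sketched as a small validation helper. This is a minimal illustration, not part of the toolkit: the function name validate_inputs and the parameter k are hypothetical, and raising RuntimeError is an assumption about how an algorithm might report a problem.

```python
import pandas as pd


def validate_inputs(df, options):
    """Raise a descriptive error if the data or parameters are unusable."""
    # Check for a required field before doing any work.
    if '_time' not in df.columns:
        raise RuntimeError('This algorithm requires the _time field.')

    # 'k' is a hypothetical parameter used only for illustration.
    # Parameter values arrive as strings, so convert before checking.
    params = options.get('params', {})
    if 'k' in params:
        try:
            k = int(params['k'])
        except (TypeError, ValueError):
            raise RuntimeError('Invalid value for k: expected an integer.')
        if k < 1:
            raise RuntimeError('Invalid value for k: must be at least 1.')


df = pd.DataFrame({'_time': [1, 2, 3], 'value': [0.1, 0.2, 0.3]})
validate_inputs(df, {'params': {'k': '3'}})  # valid input, no error raised
```

Calling a helper like this at the start of fit keeps the checks in one place and lets the search fail early with a readable message.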

Methods

Methods are the entry points to the custom algorithm. Refer to the following table for details about each method:

Method           Required               Arguments
__init__         Yes                    self, options
fit              Yes                    self, df, options
apply            Only for saved models  self, df, options
register_codecs  Only for saved models  (none)
partial_fit      No                     self, df, options
summary          No                     self, options
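
As a rough sketch of how these methods fit together, the following standalone class uses the signatures from the table. It does not import anything from MLTK, so it runs outside Splunk. The class name MeanPredictor, the attributes mean_ and count_, the default output field name, and the return values of each method are assumptions made for illustration. A real algorithm would follow the toolkit's custom algorithm template.

```python
import pandas as pd


class MeanPredictor(object):
    """Hypothetical algorithm that predicts the mean of the target field."""

    def __init__(self, options):
        # Read what the other methods need from the options dictionary.
        self.target = options['target_variable'][0]
        self.output_name = (options.get('output_name')
                            or 'predicted(%s)' % self.target)
        self.mean_ = 0.0
        self.count_ = 0

    def fit(self, df, options):
        # Learn the model state from the full set of search results.
        values = df[self.target].astype(float)
        self.count_ = len(values)
        self.mean_ = float(values.mean())

    def partial_fit(self, df, options):
        # Update the running mean with a new batch of results.
        values = df[self.target].astype(float)
        total = self.count_ + len(values)
        if total == 0:
            return
        self.mean_ = (self.mean_ * self.count_ + float(values.sum())) / total
        self.count_ = total

    def apply(self, df, options):
        # Add the prediction as a new field and return the results.
        out = df.copy()
        out[self.output_name] = self.mean_
        return out

    def summary(self, options):
        # Describe the fitted model as a one-row DataFrame.
        return pd.DataFrame({'mean': [self.mean_], 'count': [self.count_]})

    @staticmethod
    def register_codecs():
        # Takes no arguments, as in the table. Serialization setup for
        # saved models would go here; it is left empty in this sketch.
        pass


options = {'target_variable': ['y'], 'feature_variables': ['x']}
algo = MeanPredictor(options)
algo.fit(pd.DataFrame({'x': [1, 2], 'y': [2.0, 4.0]}), options)
result = algo.apply(pd.DataFrame({'x': [3]}), options)
```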

Arguments

Specify data and options as arguments. Refer to the following table for details about each argument:

Argument  Description
options   Options include:
  • args (list): A list of the fields used.
  • params (dict): Any parameter (key-value) pairs in the search.
  • feature_variables (list): The fields to be used as features.
  • target_variable (list): The target field for prediction.
  • algo_name (str): The name of the algorithm.
  • mlspl_limits (dict): mlspl.conf stanza properties that may be used in utility methods.

Other options that may exist depending on the search:

  • model_name (str): The name of the model being saved (into clause).
  • output_name (str): The name of the output (as clause).
df        A pandas DataFrame of the input data from the search results. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html.

Example:

The following example shows an options dictionary built from a search:

{
    'args': [u'sepal_width', u'petal*'],
    'params': {u'fit_intercept': u't'},
    'feature_variables': ['petal*'],
    'target_variable': ['sepal_width'],
    'algo_name': u'LinearRegression',
    'mlspl_limits': { ... },
}
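
Because parameter values arrive as strings (for example, 't' for fit_intercept), an algorithm usually needs to convert them before use. The following sketch shows one way to read such a dictionary. The helper parse_bool and the set of spellings it accepts are assumptions for illustration, not toolkit behavior, and mlspl_limits is left empty because its contents are elided in the example.

```python
def parse_bool(value):
    """Interpret a search parameter string as a boolean (hypothetical helper)."""
    text = str(value).strip().lower()
    if text in ('t', 'true', '1', 'yes', 'y'):
        return True
    if text in ('f', 'false', '0', 'no', 'n'):
        return False
    raise ValueError('Cannot interpret %r as a boolean' % (value,))


options = {
    'args': ['sepal_width', 'petal*'],
    'params': {'fit_intercept': 't'},
    'feature_variables': ['petal*'],
    'target_variable': ['sepal_width'],
    'algo_name': 'LinearRegression',
    'mlspl_limits': {},  # contents elided in the example
}

# Use .get() with defaults so a missing key does not raise KeyError.
params = options.get('params', {})
fit_intercept = parse_bool(params.get('fit_intercept', 't'))

# target_variable is a list, so take its first entry.
target = options['target_variable'][0]

# model_name and output_name exist only when the search uses the
# into and as clauses, so they may be absent.
model_name = options.get('model_name')
```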

Attributes

Inside the fit method, the search command can attach two attributes to self. Refer to the following table for details about each attribute:

Attribute               Description
self.feature_variables  (list) The wildcard-matched list of fields present in the search.
self.target_variable    (str) The name of the target field. This attribute is present only if the from clause is used.
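
The following sketch illustrates how a fit method might use these attributes. Because the example runs outside Splunk, it attaches the attributes by hand and expands the wildcard with the standard fnmatch module. That step is only a stand-in for what the search command does and is not the toolkit's implementation. The class name ColumnMeans and the attribute means_ are hypothetical.

```python
import fnmatch

import pandas as pd


class ColumnMeans(object):
    """Hypothetical algorithm that averages each feature field."""

    def __init__(self, options):
        self.means_ = {}

    def fit(self, df, options):
        # Use the attribute if it was attached; do not assume it exists.
        features = getattr(self, 'feature_variables', [])
        if not features:
            raise RuntimeError('No feature fields were matched.')
        missing = [f for f in features if f not in df.columns]
        if missing:
            raise RuntimeError('Missing fields: %s' % ', '.join(missing))
        self.means_ = dict((f, float(df[f].mean())) for f in features)


df = pd.DataFrame({
    'petal_length': [1.0, 3.0],
    'petal_width': [0.2, 0.4],
    'sepal_width': [3.0, 3.2],
})
options = {'feature_variables': ['petal*'], 'target_variable': ['sepal_width']}

algo = ColumnMeans(options)

# Stand-in for the search command: expand wildcards against the fields
# that are present, then attach the results to the algorithm instance.
algo.feature_variables = [
    col for col in df.columns
    if any(fnmatch.fnmatch(col, pat) for pat in options['feature_variables'])
]
algo.target_variable = options['target_variable'][0]

algo.fit(df, options)
```

Reading the attribute with getattr and a default avoids an AttributeError when the attribute has not been attached.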
Last modified on 06 February, 2024

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.4.0, 4.4.1, 4.4.2, 4.5.0, 5.0.0, 5.1.0, 5.2.0, 5.2.1, 5.2.2, 5.3.0, 5.3.1, 5.3.3, 5.4.0, 5.4.1

