Splunk® Machine Learning Toolkit

ML-SPL API Guide

Write a Python algorithm class

The algorithm class must implement certain methods to operate with upstream processes. These methods are the entry points to an algorithm, where the data and options are specified as arguments.

Best practices

Follow these best practices when writing algorithms:

  • Assume invalid input.
  • If a parameter is passed in, check that it is valid.
  • If you require a particular field, for example, _time, check for its presence and raise an error if it is missing.
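
The checks above could look like the following minimal sketch. The helper name validate_inputs, the max_iter parameter, its default of 100, and the choice of RuntimeError are illustrative assumptions and not part of the ML-SPL API. The function relies only on df.columns, which a pandas DataFrame provides.

```python
def validate_inputs(df, options, required_fields=('_time',)):
    """Raise a descriptive error if the input is unusable (hypothetical helper)."""
    # Assume invalid input: options may not contain 'params' at all.
    params = options.get('params', {})

    # Check that a passed-in parameter is valid. 'max_iter' is a made-up
    # parameter name; parameter values may arrive as strings.
    max_iter = 100  # hypothetical default
    if 'max_iter' in params:
        try:
            max_iter = int(params['max_iter'])
        except (TypeError, ValueError):
            raise RuntimeError(
                'Invalid value for max_iter: %r (expected an integer)'
                % (params['max_iter'],))
        if max_iter <= 0:
            raise RuntimeError('max_iter must be a positive integer')

    # Check that every required field is present in the data.
    missing = [f for f in required_fields if f not in df.columns]
    if missing:
        raise RuntimeError('Missing required field(s): %s' % ', '.join(missing))

    return max_iter
```

Calling a helper like this at the start of fit keeps the error messages close to the search that caused them.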

Methods

Methods are the entry points to the custom algorithm.

Method           Required               Arguments
__init__         Yes                    self, options
fit              Yes                    self, df, options
apply            Only for saved models  self, df, options
register_codecs  Only for saved models  (none)
partial_fit      No                     self, df, options
summary          No                     self, options
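
As a rough illustration of these signatures, the sketch below defines a plain Python class with all six methods. The class name ExampleAlgo, the row_count and columns attributes, and every return value are placeholders chosen for this sketch. A real algorithm would also build on the toolkit's own base classes, which are not shown here.

```python
class ExampleAlgo(object):
    """Skeleton with the six methods from the table (hypothetical class)."""

    def __init__(self, options):
        # Required. Keep whatever the algorithm needs from options.
        self.params = dict(options.get('params', {}))
        self.row_count = 0
        self.columns = []

    def fit(self, df, options):
        # Required. Placeholder logic: record the shape of the input.
        self.row_count = len(df)
        self.columns = list(df.columns)
        return df  # placeholder return value

    def apply(self, df, options):
        # Only for saved models. Placeholder: pass the data through unchanged.
        return df

    @staticmethod
    def register_codecs():
        # Only for saved models. Takes no arguments. Codec registration is
        # toolkit specific and intentionally left out of this sketch.
        pass

    def partial_fit(self, df, options):
        # Optional. Placeholder logic: fold a new batch into the state.
        self.row_count += len(df)

    def summary(self, options):
        # Optional. Placeholder: report what has been learned so far.
        return {'row_count': self.row_count, 'columns': self.columns}
```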

Arguments

Specify data and options as arguments.

Argument  Description

options   Options include:
  • args (list): A list of the fields used.
  • params (dict): Any parameter (key-value) pairs in the search.
  • feature_variables (list): The fields to be used as features.
  • target_variable (list): The target field for prediction.
  • algo_name (str): The name of the algorithm.
  • mlspl_limits (dict): mlspl.conf stanza properties that may be used in utility methods.

  Other options that may exist depending on the search:

  • model_name (str): The name of the model being saved (into clause).
  • output_name (str): The name of the output (as clause).

df        A pandas DataFrame of the input data from the search results. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html.

Example:

The following example is a dictionary of information from the search:

{
    'args': [u'sepal_width', u'petal*'],
    'params': {u'fit_intercept': u't'},
    'feature_variables': ['petal*'],
    'target_variable': ['sepal_width'],
    'algo_name': u'LinearRegression',
    'mlspl_limits': { ... },
}
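
A dictionary like this one might be unpacked as in the sketch below. The helper name read_options, the accepted true/false spellings, and the default of 't' for fit_intercept are assumptions made for illustration. The mlspl_limits key is left out of the sample because its contents are elided in the example.

```python
def read_options(options):
    """Pull commonly used values out of an options dictionary (hypothetical helper)."""
    params = options.get('params', {})

    # The parameter value in the example is a string ('t'), so convert it
    # explicitly. The accepted spellings are an assumption of this sketch.
    raw = str(params.get('fit_intercept', 't')).lower()
    if raw in ('t', 'true', '1'):
        fit_intercept = True
    elif raw in ('f', 'false', '0'):
        fit_intercept = False
    else:
        raise ValueError('Invalid value for fit_intercept: %r' % (raw,))

    return {
        'fit_intercept': fit_intercept,
        'features': options.get('feature_variables', []),
        'target': options.get('target_variable', []),
        # These keys exist only when the search has an into or as clause,
        # so read them with .get() and expect None otherwise.
        'model_name': options.get('model_name'),
        'output_name': options.get('output_name'),
    }


example = {
    'args': ['sepal_width', 'petal*'],
    'params': {'fit_intercept': 't'},
    'feature_variables': ['petal*'],
    'target_variable': ['sepal_width'],
    'algo_name': 'LinearRegression',
}
print(read_options(example))
```

Reading optional keys with .get() avoids a KeyError when the search has no into or as clause.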

Attributes

Inside the fit method, the search command can attach two attributes to self.

Attribute               Type  Description
self.feature_variables  list  The wildcard-matched list of fields present in the search.
self.target_variable    str   The name of the target field. This attribute is present only if the from clause is used.
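
The sketch below illustrates both attributes under stated assumptions. expand_wildcards uses Python's fnmatch module to show what wildcard matching a pattern such as petal* against field names produces; it is an illustration, not the toolkit's own matching code. get_fit_fields is a hypothetical helper that reads the attributes defensively, because self.target_variable exists only when the from clause is used.

```python
import fnmatch


def expand_wildcards(patterns, columns):
    """Return the columns that match at least one wildcard pattern."""
    return [
        column for column in columns
        if any(fnmatch.fnmatchcase(column, pattern) for pattern in patterns)
    ]


def get_fit_fields(algo):
    """Read the attributes the search command may attach (hypothetical helper)."""
    # Fall back to defaults instead of raising AttributeError.
    features = getattr(algo, 'feature_variables', [])
    target = getattr(algo, 'target_variable', None)  # None: no from clause
    return features, target


columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
print(expand_wildcards(['petal*'], columns))
```

Inside fit, a call such as get_fit_fields(self) lets the algorithm branch on whether a target field was supplied.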

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 4.4.0, 4.4.1

