Splunk® Machine Learning Toolkit

ML-SPL API Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Write an algorithm class

The algorithm class must implement certain methods to operate with upstream processes. These methods are the entry points to an algorithm, where the data and options are specified as arguments.

Methods

Method            Required                Arguments
__init__          yes                     self, options
fit               yes                     self, df, options
apply             only for saved models   self, df, options
register_codecs   only for saved models   (none)
partial_fit       no                      self, df, options
summary           no                      self, options

Arguments

Argument Description
options A dictionary of information from the search, for example:
{
    'args': [u'sepal_width', u'petal*'],
    'params': {u'fit_intercept': u't'},
    'feature_variables': ['petal*'],
    'target_variable': ['sepal_width'],
    'algo_name': u'LinearRegression',
    'mlspl_limits': { ... },
}

This dictionary of options includes:

- args (list) - a list of the fields used
- params (dict) - any parameter (key-value) pairs in the search
- feature_variables (list) - fields to be used as features
- target_variable (list) - the target field for prediction
- algo_name (str) - the name of the algorithm
- mlspl_limits (dict) - mlspl.conf stanza properties that may be used in utility methods

Other keys that may exist depending on the search:

- model_name (str) - the name of the model being saved ('into' clause)
- output_name (str) - the name of the output ('as' clause)
df A pandas DataFrame of the input data from the search results. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html.
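As an illustration of reading the options dictionary described above, the following minimal sketch pulls out the common keys and converts a string parameter to a boolean. The helpers parse_bool and read_options are hypothetical names invented for this example, the set of accepted truthy strings and the default of 't' for fit_intercept are assumptions, and the empty mlspl_limits value is only a placeholder.

```python
def parse_bool(value):
    """Hypothetical helper: interpret a search parameter string as a boolean."""
    text = str(value).strip().lower()
    # Assumed spellings; the toolkit may accept a different set
    if text in ('t', 'true', '1', 'yes', 'y'):
        return True
    if text in ('f', 'false', '0', 'no', 'n'):
        return False
    raise ValueError('Cannot interpret %r as a boolean' % value)


def read_options(options):
    """Hypothetical helper: collect the commonly used keys from options."""
    params = options.get('params', {})
    return {
        'features': options.get('feature_variables', []),
        'target': options.get('target_variable', []),
        # Parameters arrive as strings, so convert them before use
        'fit_intercept': parse_bool(params.get('fit_intercept', 't')),
        # These keys exist only when the search has an 'into' or 'as' clause
        'model_name': options.get('model_name'),
        'output_name': options.get('output_name'),
    }


options = {
    'args': ['sepal_width', 'petal*'],
    'params': {'fit_intercept': 't'},
    'feature_variables': ['petal*'],
    'target_variable': ['sepal_width'],
    'algo_name': 'LinearRegression',
    'mlspl_limits': {},  # placeholder; real contents come from mlspl.conf
}

settings = read_options(options)
print(settings['fit_intercept'])  # prints True
```

Using dict.get for model_name and output_name keeps the code from raising KeyError when the search has no 'into' or 'as' clause.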

Attributes

Inside the fit method, the search command can attach two attributes to self.

Attribute                Description
self.feature_variables   list - the wildcard-matched list of fields present in the search
self.target_variable     str - the name of the target field (present only if a from clause is used)
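To make "wildcard-matched" concrete, the sketch below expands a pattern such as 'petal*' against a list of field names. It uses Python's standard fnmatch module as a stand-in, and match_fields is a hypothetical name; this illustrates the idea only and is not the toolkit's own matching code, whose rules may differ (fnmatch, for instance, also gives special meaning to ? and [ ]).

```python
from fnmatch import fnmatchcase


def match_fields(patterns, available_fields):
    """Return the fields matching any pattern, preserving field order."""
    return [
        field for field in available_fields
        if any(fnmatchcase(field, pattern) for pattern in patterns)
    ]


columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
print(match_fields(['petal*'], columns))  # prints ['petal_length', 'petal_width']
```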

BaseAlgo class

A custom algorithm template is provided below.

from base import BaseAlgo


class CustomAlgoTemplate(BaseAlgo):
    def __init__(self, options):
        # Option checking & initializations here
        pass

    def fit(self, df, options):
        # Fit an estimator to df, a pandas DataFrame of the search results
        pass

    def partial_fit(self, df, options):
        # Incrementally fit a model
        pass

    def apply(self, df, options):
        # Apply a saved model
        # Modify df, a pandas DataFrame of the search results
        return df

    @staticmethod
    def register_codecs():
        # Add codecs to the codec manager
        pass

Using the template above in a search, as in the example below, reflects the input data back to the search.

| fit CustomAlgoTemplate *
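For comparison with the empty template, here is a minimal sketch of a filled-in algorithm that learns each feature's mean in fit and subtracts it in apply. The class name ColumnMeanCenter and the centered_ output prefix are invented for this example, and the BaseAlgo defined here is only a stand-in so the code runs outside Splunk. The sketch also assumes that feature_variables holds exact field names (no wildcards) and that fit may return the output DataFrame directly.

```python
import pandas as pd


class BaseAlgo(object):
    """Stand-in for the toolkit's BaseAlgo so this sketch runs outside Splunk."""
    pass


class ColumnMeanCenter(BaseAlgo):
    """Hypothetical algorithm: subtract each feature's mean learned in fit."""

    def __init__(self, options):
        # Option checking: this sketch is unsupervised, so reject a target field
        if options.get('target_variable'):
            raise RuntimeError('ColumnMeanCenter does not support a from clause')
        # Assumption: names are already concrete field names, not wildcards
        self.feature_variables = options.get('feature_variables', [])
        self.means = None

    def fit(self, df, options):
        # Learn one mean per feature column
        self.means = df[self.feature_variables].mean()
        return self.apply(df, options)

    def apply(self, df, options):
        # Work on a copy so the input DataFrame is left unchanged
        out = df.copy()
        for field in self.feature_variables:
            out['centered_' + field] = out[field] - self.means[field]
        return out


df = pd.DataFrame({'a': [1.0, 3.0], 'b': [10.0, 20.0]})
algo = ColumnMeanCenter({'feature_variables': ['a', 'b']})
print(algo.fit(df, {}))
```

Keeping the learned state (self.means) on the instance is what would let a saved model reuse it in apply without refitting.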

These methods are all described in detail in the BaseAlgo class in $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/base.py.

Last modified on 03 July, 2019

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 3.2.0, 3.3.0, 3.4.0, 4.0.0, 4.1.0, 4.2.0, 4.3.0

