Splunk® Machine Learning Toolkit

ML-SPL API Guide

This documentation does not apply to the most recent version of Splunk® Machine Learning Toolkit. For documentation on the most recent version, go to the latest release.

Support Vector Regressor

This example covers the following tasks:

  • using the BaseAlgo and a mixin class
  • converting parameters
  • using the register_codecs method


In this example, you will add scikit-learn's Support Vector Regressor algorithm to the Splunk Machine Learning Toolkit. See the scikit-learn documentation for details on the Support Vector Regressor class.

The custom algorithm inherits from two classes: BaseAlgo and RegressorMixin. Because the mixin already implements the fit and apply methods, you only need to define the __init__ and register_codecs methods.

This example uses the ML-SPL API available in the Splunk Machine Learning Toolkit version 2.2.0 and later. Verify your Splunk Machine Learning Toolkit version before using this example.

Steps

Do the following:

  1. Register the algorithm in __init__.py.
    Modify the __init__.py file located in $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos to register your algorithm by adding its name to the __all__ list:
    __all__ = [
        "SVR",
        "LinearRegression",
        "Lasso",
        ...
        ]
    
  2. Create the Python file in the algos folder. For this example, create $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos/SVR.py and add the following imports:
    from sklearn.svm import SVR as _SVR
     
    from base import BaseAlgo, RegressorMixin
    from util.param_util import convert_params
    
  3. Define the class.
    Inherit from both RegressorMixin and BaseAlgo.

    When inheriting from multiple classes, list RegressorMixin first. BaseAlgo raises an error for any method that is not implemented, and Python resolves methods from left to right across the base classes. Because fit and apply are defined in RegressorMixin, that class must come first.
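The effect of base class order can be seen in a minimal, self-contained sketch. BaseAlgoSketch and RegressorMixinSketch are hypothetical stand-ins written for this illustration; they are not the toolkit's classes and only mimic the behavior described above.

```python
class BaseAlgoSketch(object):
    """Hypothetical stand-in for BaseAlgo: fit raises unless overridden."""

    def fit(self, df, options):
        raise NotImplementedError("fit is not implemented")


class RegressorMixinSketch(object):
    """Hypothetical stand-in for RegressorMixin: supplies a working fit."""

    def fit(self, df, options):
        return "fit from mixin"


class MixinFirst(RegressorMixinSketch, BaseAlgoSketch):
    """Mixin listed first, so its fit is found before the base class's."""


class BaseFirst(BaseAlgoSketch, RegressorMixinSketch):
    """Base class listed first, so its raising fit shadows the mixin's."""


def try_fit(algo):
    """Return the result of fit, or the name of the exception it raised."""
    try:
        return algo.fit(None, {})
    except NotImplementedError as exc:
        return type(exc).__name__


mixin_first_result = try_fit(MixinFirst())
base_first_result = try_fit(BaseFirst())

# The method resolution order shows which class Python consults first.
mro_names = [cls.__name__ for cls in MixinFirst.__mro__]
```

With the mixin first, fit resolves to the mixin's implementation. With the base class first, the same call raises NotImplementedError.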

    class SVR(RegressorMixin, BaseAlgo):
        """Predict numeric target variables via scikit-learn's SVR algorithm."""
    
  4. Define the __init__ method.
    • Use the RegressorMixin handle_options method to check for the feature and target variables.
    • Use convert_params to convert the parameters passed in the options to the types the estimator expects.
    • RegressorMixin also expects the estimator attribute to be set on self.
        def __init__(self, options):
            self.handle_options(options)
    
            params = options.get('params', {})
            out_params = convert_params(
                params,
                floats=['C', 'gamma'],
                strs=['kernel'],
                ints=['degree'],
            )
    
            self.estimator = _SVR(**out_params)
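To make the parameter conversion step concrete, here is a rough sketch of what such a helper might do. convert_params_sketch is a hypothetical stand-in, not the toolkit's convert_params. It assumes that parameter values arrive as strings and that names outside the declared lists should be rejected; neither assumption is confirmed by this page.

```python
def convert_params_sketch(params, floats=None, ints=None, strs=None):
    """Hypothetical stand-in for convert_params (not the toolkit's code).

    Casts each value in params to the type its name is listed under and
    rejects names that are not listed at all.
    """
    # Map every declared parameter name to the function that casts it.
    casters = {}
    for names, caster in ((floats, float), (ints, int), (strs, str)):
        for name in names or []:
            casters[name] = caster

    out_params = {}
    for name, raw_value in params.items():
        if name not in casters:
            raise ValueError("Unexpected parameter: %s" % name)
        try:
            out_params[name] = casters[name](raw_value)
        except ValueError:
            raise ValueError("Invalid value for %s: %r" % (name, raw_value))
    return out_params


# Assumed input shape: every value is still a string.
raw_params = {'C': '0.5', 'kernel': 'rbf', 'degree': '3'}
out_params = convert_params_sketch(
    raw_params,
    floats=['C', 'gamma'],
    strs=['kernel'],
    ints=['degree'],
)
```

Only parameters that were actually supplied appear in the result, so an unset gamma is left to the estimator's own default when the dictionary is unpacked into the constructor.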
    
  5. Define the register_codecs method.
    • To apply the model to new data, you must save the model.
    • RegressorMixin already defines the fit and apply methods. To save a model, you must also define the register_codecs method.
    • Register two things to serialize:
      • the custom algorithm class
      • the imported scikit-learn SVR class
        @staticmethod
        def register_codecs():
            from codec.codecs import SimpleObjectCodec
            from codec import codecs_manager
            codecs_manager.add_codec('algos.SVR', 'SVR', SimpleObjectCodec)
            codecs_manager.add_codec('sklearn.svm.classes', 'SVR', SimpleObjectCodec)
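The idea behind registering a codec per module and class name can be sketched with a toy registry. Every name below (SimpleObjectCodecSketch, CodecRegistrySketch, ToyModel) is hypothetical and exists only for this illustration. The toolkit's codecs_manager may work differently, and its add_codec takes three arguments rather than the four used here.

```python
import json


class SimpleObjectCodecSketch(object):
    """Hypothetical codec: stores an object's type and its __dict__."""

    @classmethod
    def encode(cls, obj):
        return {
            '__type': [type(obj).__module__, type(obj).__name__],
            'dict': dict(obj.__dict__),
        }

    @classmethod
    def decode(cls, payload, target_class):
        # Bypass __init__ and restore the saved attributes directly.
        new_obj = target_class.__new__(target_class)
        new_obj.__dict__.update(payload['dict'])
        return new_obj


class CodecRegistrySketch(object):
    """Hypothetical registry keyed by (module name, class name)."""

    def __init__(self):
        self._codecs = {}

    def add_codec(self, module_name, class_name, codec, target_class):
        self._codecs[(module_name, class_name)] = (codec, target_class)

    def dumps(self, obj):
        key = (type(obj).__module__, type(obj).__name__)
        if key not in self._codecs:
            raise KeyError("No codec registered for %s.%s" % key)
        codec, _ = self._codecs[key]
        return json.dumps(codec.encode(obj))

    def loads(self, text):
        payload = json.loads(text)
        codec, target_class = self._codecs[tuple(payload['__type'])]
        return codec.decode(payload, target_class)


class ToyModel(object):
    def __init__(self, coefficient):
        self.coefficient = coefficient


registry = CodecRegistrySketch()
registry.add_codec(ToyModel.__module__, 'ToyModel', SimpleObjectCodecSketch, ToyModel)

saved = registry.dumps(ToyModel(coefficient=2.5))   # object to string
restored = registry.loads(saved)                    # string back to object
```

In this sketch the lookup key is built from the object's own module and class name, so an object whose type was never registered cannot be saved. That is one plausible reason to register both the custom class and the wrapped scikit-learn class.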
    

    In most cases, SimpleObjectCodec is all you need. If the algorithm has circular references or unusual properties, you may need to write your own codec. A codec defines how to serialize (save) Python objects to strings and deserialize (load) them back. Here is an example of a custom codec needed for a subcomponent in the DecisionTreeClassifier algorithm.

    from codec.codecs import BaseCodec
    
    
    class TreeCodec(BaseCodec):
        @classmethod
        def encode(cls, obj):
            import sklearn.tree
            assert type(obj) == sklearn.tree._tree.Tree
    
            init_args = obj.__reduce__()[1]
            state = obj.__getstate__()
    
            return {
                '__mlspl_type': [type(obj).__module__, type(obj).__name__],
                'init_args': init_args,
                'state': state
            }
    
        @classmethod
        def decode(cls, obj):
            import sklearn.tree
    
            init_args = obj['init_args']
            state = obj['state']
    
            t = sklearn.tree._tree.Tree(*init_args)
            t.__setstate__(state)
    
            return t
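The same encode and decode pattern can be tried without scikit-learn. In the sketch below, ToyTree is a hypothetical stand-in for an object that needs constructor arguments plus separately restored state, and ToyTreeCodec mirrors the shape of TreeCodec without inheriting from BaseCodec. Neither is part of the toolkit.

```python
class ToyTree(object):
    """Hypothetical stand-in for a type that needs init args plus state."""

    def __init__(self, n_features, n_outputs):
        self.n_features = n_features
        self.n_outputs = n_outputs
        self.nodes = []

    def __reduce__(self):
        # (callable, init args, state): the tuple shape used by pickle.
        return (ToyTree, (self.n_features, self.n_outputs), self.__getstate__())

    def __getstate__(self):
        return {'nodes': list(self.nodes)}

    def __setstate__(self, state):
        self.nodes = list(state['nodes'])


class ToyTreeCodec(object):
    """Mirrors TreeCodec: capture init args and state, then rebuild."""

    @classmethod
    def encode(cls, obj):
        assert type(obj) is ToyTree
        return {
            '__mlspl_type': [type(obj).__module__, type(obj).__name__],
            'init_args': obj.__reduce__()[1],
            'state': obj.__getstate__(),
        }

    @classmethod
    def decode(cls, obj):
        # Rebuild with the original constructor arguments, then restore state.
        tree = ToyTree(*obj['init_args'])
        tree.__setstate__(obj['state'])
        return tree


original = ToyTree(4, 1)
original.nodes.extend([0, 1, 2])

encoded = ToyTreeCodec.encode(original)
restored = ToyTreeCodec.decode(encoded)
```

Splitting the object into constructor arguments and state is what lets the decoder rebuild a type whose attributes cannot simply be copied from a dictionary.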
    

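    In DecisionTreeClassifier.py, the register_codecs method registers TreeCodec alongside SimpleObjectCodec. The snippet uses codecs_manager without importing it, so it must already be imported in that module: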

        @staticmethod
        def register_codecs():
            from codec.codecs import SimpleObjectCodec, TreeCodec
            codecs_manager.add_codec('algos.DecisionTreeClassifier', 'DecisionTreeClassifier', SimpleObjectCodec)
            codecs_manager.add_codec('sklearn.tree.tree', 'DecisionTreeClassifier', SimpleObjectCodec)
            codecs_manager.add_codec('sklearn.tree._tree', 'Tree', TreeCodec)
    

    Finished example

    from sklearn.svm import SVR as _SVR
    
    from base import BaseAlgo, RegressorMixin
    from util.param_util import convert_params
    
    
    class SVR(RegressorMixin, BaseAlgo):
        """Predict numeric target variables via scikit-learn's SVR algorithm."""
    
        def __init__(self, options):
            self.handle_options(options)
    
            params = options.get('params', {})
            out_params = convert_params(
                params,
                floats=['C', 'gamma'],
                strs=['kernel'],
                ints=['degree'],
            )
    
            self.estimator = _SVR(**out_params)
    
        @staticmethod
        def register_codecs():
            from codec.codecs import SimpleObjectCodec
            from codec import codecs_manager
            codecs_manager.add_codec('algos.SVR', 'SVR', SimpleObjectCodec)
            codecs_manager.add_codec('sklearn.svm.classes', 'SVR', SimpleObjectCodec)
    
Last modified on 06 June, 2017

This documentation applies to the following versions of Splunk® Machine Learning Toolkit: 2.2.0, 2.2.1

