Splunk App for Data Science and Deep Learning commands

The Splunk App for Data Science and Deep Learning (DSDL) integrates advanced custom machine learning and deep learning systems with the Splunk platform. The app extends the Splunk platform with prebuilt Docker containers for TensorFlow and PyTorch, and additional data science, NLP, and classical machine learning libraries.

DSDL uses familiar SPL syntax from the Splunk Machine Learning Toolkit (MLTK), including the ML-SPL commands fit, apply, and summary, to train and operationalize models in a container environment such as Docker, Kubernetes, or OpenShift.

Any processes that occur outside of the Splunk platform ecosystem and that leverage third-party infrastructure such as Docker, Kubernetes, or OpenShift are out of scope for Splunk platform support and troubleshooting.

Using the fit command

You can use the fit command to train a machine learning model on data within the Splunk platform. This command sends data and parameters to a container environment and saves the resulting model with the specified name.

When you run the fit command without additional parameters, training is performed based on the code in the associated notebook. When you include mode=stage, data is transferred to the container for development in JupyterLab without running full training.

Syntax

| fit MLTKContainer algo=<algorithm> [mode=stage] <feature_list> into app:<model_name>

Parameters

Parameter Description
algo Specifies the notebook name or algorithm in the container environment.
mode When set to stage, the command sends data to the container but does not perform training.
<feature_list> Defines the feature fields to include in training.
into app:<model_name> Saves the model or staged data in DSDL under the specified name.

Examples

The following example stages data without training so you can work iteratively in JupyterLab:

| fit MLTKContainer mode=stage algo=my_notebook into app:barebone_template

The following example sends additional fields, including the _time field, the feature_* fields, and the i field, to the barebone_template notebook for iterative development in JupyterLab:

| fit MLTKContainer mode=stage algo=barebone_template _time feature_* i into app:barebone_template
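
When your notebook code is ready, run the same search without mode=stage to perform a full training run. The following is a minimal sketch that generates a small sample dataset and trains on it; in practice you would replace the generated events with your own base search:

| makeresults count=10
| streamstats c as i
| eval s = i%3
| eval feature_{s}=0
| foreach feature_* [eval <<FIELD>>=random()/pow(2,31)]
| fit MLTKContainer algo=barebone_template _time feature_* i into app:barebone_template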

Using the apply command

Use the apply command to generate model predictions on new data within Splunk. This step can be automated through scheduled searches or integrated into dashboards and alerts for real-time monitoring.

Syntax

| apply <model_name>

Parameters

Parameter Description
model_name The name of the model to apply. This corresponds to the name used in the into app: clause when the model was trained.

Example

The following example runs inference on a dataset using the model named barebone_template. You can follow this with the score command to evaluate model performance, such as accuracy metrics for classification or R-squared for regression:

| apply barebone_template
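
For example, the following sketch evaluates classification accuracy after inference. The actual field s and the predicted field predicted_s are placeholders, because the output field names depend on your notebook code:

| apply barebone_template
| score accuracy_score s against predicted_s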

Using the summary command

Use the summary command to retrieve model metadata and configuration details, including hyperparameters and feature lists. This command helps track or inspect the exact parameters used in training.

Syntax

| summary <model_name>

Parameters

Parameter Description
model_name The name of the model for which to retrieve metadata.

Examples

The following example retrieves metadata for app:barebone_template, including training configurations and feature names:

| summary app:barebone_template

Using commands in a workflow

The following is a workflow of how you can use the different ML-SPL commands available in DSDL:

The workflow moves through data exploration, data staging and preparation, model training, model inference, model evaluation, and summary and monitoring, as described in the following subsections.

Data exploration

You can identify and refine your data in Splunk using SPL.
If you have configured a Splunk access token, you can pull data interactively with SplunkSearch.SplunkSearch().
For data exploration with JupyterLab you can push a sample using | fit MLTKContainer mode=stage ... .
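
For example, the following sketch shapes events with SPL and then stages only the aggregated fields. The index, sourcetype, and aggregation are placeholders for your own exploration search:

index=_internal sourcetype=splunkd earliest=-60m
| stats count by component
| fit MLTKContainer mode=stage algo=barebone_template count into app:barebone_template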

Data staging and preparation

You can use | fit MLTKContainer mode=stage ... to transfer data and metadata to the container.
Clean and configure the staged dataset within JupyterLab.
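
For example, the following sketch fills missing values and renames fields in SPL before transferring the data to the container. The source and field names are placeholders:

index=main sourcetype=sensor_data
| fields _time temperature humidity
| fillnull value=0 temperature humidity
| rename temperature as feature_temperature, humidity as feature_humidity
| fit MLTKContainer mode=stage algo=barebone_template _time feature_* into app:barebone_template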

Model training

Model names must not include spaces.

You can use | fit MLTKContainer algo=<algorithm> ... on your final training data.
Leverage GPUs if needed. Monitor progress in JupyterLab.
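
For example, additional key=value options on the fit command are passed to the container along with the data and metadata, so a training run might look like the following sketch. The epochs and batch_size values are illustrative and only take effect if your notebook code reads them from the passed parameters:

| fit MLTKContainer algo=barebone_template epochs=100 batch_size=32 feature_* i into app:barebone_template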

Model inference

You can use | apply <model_name> to generate predictions.
Integrate with Splunk dashboards, alerts, or scheduled searches for operational use.
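
For example, a scheduled search or alert might apply the model to recent events and keep only results above a threshold. The index, sourcetype, prediction field, and threshold are placeholders:

index=main sourcetype=sensor_data earliest=-15m
| apply barebone_template
| where predicted_feature_0 > 0.9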

Model evaluation

You can use | score for classification or regression metrics.
Return to JupyterLab to refine or retrain the model if needed.
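
For example, the following sketch computes an R-squared metric for a regression model. The actual and predicted field names are placeholders that depend on your notebook output:

| apply barebone_template
| score r2_score feature_0 against predicted_feature_0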

Summary and monitoring

You can use | summary <model_name> to review metadata and configurations.
Set the model permissions to Global if the model needs to be served from a dedicated container.
Monitor performance in Splunk dashboards or alerts and detect potential drift.
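
For example, a dashboard panel might track the average prediction over time to surface potential drift. The index, sourcetype, and prediction field are placeholders:

index=main sourcetype=sensor_data earliest=-7d
| apply barebone_template
| timechart span=1d avg(predicted_feature_0) as avg_prediction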

Pull data using Splunk REST API

You can interactively search Splunk from JupyterLab, provided your container can connect to the Splunk REST API and you have configured a valid Splunk auth token in the DSDL setup page. This is useful for quickly testing different SPL queries, exploring data in a Pandas DataFrame, or refining your search logic without leaving Jupyter.

This approach returns raw search results only. No metadata or parameter JSON is generated. If you need structured metadata, use the fit command with mode=stage to push data to the container.

Complete the following steps:

  1. Generate a Splunk token:
    1. In the Splunk platform, go to Settings and then Tokens to create a new token.
    2. Copy the generated token for use in DSDL.
  2. (Optional) Set up Splunk access in Jupyter:
    1. In DSDL, go to Configuration and then Setup.
    2. Locate Splunk Access Settings and enter your Splunk host and the generated token. This makes the Splunk REST API available within your container environment.

      The default management port is 8089.

  3. Now that Splunk access is configured, you can pull data interactively in JupyterLab:
    from dsdlsupport import SplunkSearch
    
    # Option A: Open an interactive search box
    search = SplunkSearch.SplunkSearch()
    
    # Option B: Use a predefined query
    search = SplunkSearch.SplunkSearch(
        search='| makeresults count=10 \n'
               '| streamstats c as i \n'
               '| eval s = i%3 \n'
               '| eval feature_{s}=0 \n'
               '| foreach feature_* [eval <<FIELD>>=random()/pow(2,31)] \n'
               '| fit MLTKContainer mode=stage algo=barebone_template _time feature_* i into app:barebone_template')
    
    # Option C: Use a referenced query
    example_query = ('| makeresults count=10 \n'
                     '| streamstats c as i \n'
                     '| eval s = i%3 \n'
                     '| eval feature_{s}=0 \n'
                     '| foreach feature_* [eval <<FIELD>>=random()/pow(2,31)] \n'
                     '| fit MLTKContainer mode=stage algo=barebone_template _time feature_* i into app:barebone_template')
    
    search = SplunkSearch.SplunkSearch(search=example_query)
    
    # Run the search and then retrieve the results
    df = search.as_df()
    df.head()
    
    

Push data using the fit command

You can send data from the Splunk platform to the container using the following command:

| fit MLTKContainer mode=stage ...

This writes both the dataset and relevant metadata, such as feature lists and parameters, to the container environment as CSV and JSON files. This approach is well suited for building or modifying a notebook in JupyterLab, while referencing a known dataset structure and configuration.

Example

The following example uses the fit command to send data from the Splunk platform to a container:

| fit MLTKContainer mode=stage algo=barebone_template _time feature_* i into app:barebone_template

# Retrieve the staged data and parameters in the notebook
import json
import pandas as pd

def stage(name):
    # The staged dataset is written to data/<name>.csv and its metadata to data/<name>.json
    with open("data/" + name + ".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/" + name + ".json", 'r') as f:
        param = json.load(f)
    return df, param

df, param = stage("barebone_template")
