Splunk® App for Data Science and Deep Learning

Use the Splunk App for Data Science and Deep Learning

About the Splunk App for Data Science and Deep Learning

The Splunk App for Data Science and Deep Learning is a free app you can download from Splunkbase.

The Splunk App for Data Science and Deep Learning (DSDL), formerly known as the Deep Learning Toolkit (DLTK), lets you integrate advanced custom machine learning and deep learning systems with the Splunk platform. The app extends the Splunk Machine Learning Toolkit (MLTK) with prebuilt Docker containers for TensorFlow, PyTorch, and a collection of data science, NLP, and classical machine learning libraries.

Using the predefined JupyterLab notebook workflows, you can build, test, and operationalize customized models with the Splunk platform. You can leverage GPUs for compute-intensive training tasks and deploy models on CPU- or GPU-enabled containers.

The following image shows a high-level workflow you can follow when using DSDL.
The image shows a high-level workflow using DSDL. The named stages of the workflow are: frame the problem, get data and explore, create models, evaluate and understand, and present and operationalize.

The Splunk App for Data Science and Deep Learning is not a turnkey solution, but a way to create custom machine learning models. When using DSDL, it helps to have domain knowledge as well as experience with Splunk Search Processing Language (SPL), the Splunk platform, the Splunk Machine Learning Toolkit (MLTK), Python, and JupyterLab notebooks.

Splunk App for Data Science and Deep Learning features

The following features are included with the Splunk App for Data Science and Deep Learning:

  • More than 30 examples that showcase different deep learning and machine learning algorithms for classification, regression, forecasting, clustering, graph analytics, and NLP. These examples can inform how to tackle advanced data science use cases in the areas of IT operations, security, application development, IoT, and business analytics.
  • Rapid model development workflows leveraging JupyterLab notebooks, letting you address advanced modeling use cases that are not possible with MLTK alone.
  • Familiar SPL syntax from MLTK, including the ML-SPL commands fit, apply, and summary.
  • An extension of MLTK functionality that lets you develop your own custom analytics with computationally intensive workloads relying on any open source Python library, and operationalize your custom-defined models on the Splunk platform, including dashboards and alerts.
  • The ability to connect your Splunk search head to container environments including Docker, Kubernetes, and OpenShift, each with optional GPU support.
  • Access to prebuilt containers, including Golden Image CPU and GPU variants with TensorFlow, PyTorch, and Dask.

Requirements for the Splunk App for Data Science and Deep Learning

To run the Splunk App for Data Science and Deep Learning successfully, the following requirements must be met:

Splunk App for Data Science and Deep Learning roles

The Splunk App for Data Science and Deep Learning ships with the following two roles:

Role name Description
mltk_container_user This role inherits from the default Splunk platform user role and extends it with two capabilities:
  • List container models as visible on the container dashboard.
  • Start or stop containers.
mltk_container_admin This role inherits capabilities from mltk_container_user and adds the following capability:
  • Access the setup page and make configuration changes.

To learn more about managing Splunk platform users and roles, see Create and manage roles with Splunk Web in the Securing Splunk Enterprise manual.

Splunk App for Data Science and Deep Learning navigation

See the following table for an overview of the available menu tabs and their functions:

Menu tab name Description
Content Use the Content tab for an overview of DSDL contents. From this page you can navigate to the container configuration pages, any of the 30+ app examples, the Neural Network Designer Assistant workflow, and the app operations pages where you can view container statuses and app benchmarks.
Configuration Includes the following pages:
  Setup: This page confirms that all prerequisites (MLTK and PSC) are installed and lists the installed versions. A diagram shows a simplified functional architecture of DSDL and how it interfaces with the different components. Use the Setup page to connect DSDL to a Docker, Kubernetes, or OpenShift based container environment. This setup step is required before you can use DSDL with your own data.
  Containers: An overview of your active containers. Use this page to manage your development and production containers.
  Container Image Builder: This page offers a preview of new DSDL functionality. Use it to automatically build DSDL-compatible container images.
Examples Choose from over 30 examples, organized by type, including classification, regression, forecasting, clustering, natural language processing (NLP), graph analytics, and data mining. Each example links to a Splunk dashboard that shows a specific technique applied to sample data, and each dashboard is backed by a Jupyter notebook that defines how the technique is implemented.
Assistants Includes the following guided workflows:
  Neural Network Designer: Define a dataset with a Splunk search, train a binary neural network classifier, and evaluate the results.
  Deep Learning Text Summarization: Develop deep-learning-based text summarization solutions from your text data.
  Deep Learning Text Classification: Develop deep-learning-based text classification solutions.
Operations Includes the following pages:
  Operations Overview: A snapshot of your DSDL operations. Metrics shown can be opened as their own search or exported.
  Container Status: Panels with an overview of your active containers, ML-SPL command statistics, and an error log.
  Runtime Benchmarks: Informational benchmarks for a better understanding of the runtime behavior of DSDL for typical dataset sizes.
Other Includes the following pages:
  Dashboards: View dashboards included with DSDL or other Splunk platform products. You can also create new dashboards.
  Datasets: View and manage your existing datasets. Click a dataset name to view its contents.
  Reports: Review and optionally edit reports included with DSDL or other Splunk platform products.
Documentation Open the DSDL documentation in a new browser tab.
Search Open a new Splunk search.

Key terms in the Splunk App for Data Science and Deep Learning

DSDL offers an open, plug-in architecture for any algorithm, runtime, or execution environment. See the following key terms to gain familiarity with the DSDL structure.

Key term Description
Model When you run an algorithm on a dataset, you create a model. Models are typically created to detect patterns in your data.
Algorithm An algorithm operates like a program, mapping input data to output data. Use an algorithm to train a model or apply a pre-trained model on new data. In DSDL an algorithm always depends on a specific runtime.
Runtime The framework for an algorithm that typically uses a certain set of libraries. In DSDL, to execute an algorithm you must deploy it with a specific runtime into an environment.
Environment The infrastructure or service that executes the algorithm and serves the model.
Golden Image Available from the Container Image dropdown menu on the Containers page. The Golden Image for CPU and GPU contains most of the recent popular libraries, including TensorFlow, PyTorch, and others. Other images prebuilt for specific libraries, such as Spark, River, or Rapids, are also available.
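To make the model, algorithm, and runtime terms concrete: in DSDL an algorithm is typically a set of Python functions defined in a Jupyter notebook that the app calls when you run fit and apply. The following is a minimal sketch of such a notebook module, modeled loosely on the app's barebone example. The exact function names and signatures can vary between DSDL versions, and the "value" column and mean-threshold logic are placeholders of ours, not part of the app.

```python
# Sketch of a DSDL-style notebook module (assumption: interface based on the
# app's barebone example; verify against the notebook template in your version).
import pandas as pd

def init(df, param):
    # Create an empty model structure; "mean" is a placeholder statistic.
    return {"mean": None}

def fit(model, df, param):
    # Train: learn the mean of a hypothetical "value" column.
    model["mean"] = df["value"].mean()
    return {"message": "model trained"}

def apply(model, df, param):
    # Inference: flag rows whose value exceeds the learned mean.
    return pd.DataFrame({"above_mean": (df["value"] > model["mean"]).astype(int)})

def summary(model):
    # Metadata returned when you run the | summary ML-SPL command.
    return {"version": {"model_type": "mean_threshold_sketch"}}
```

From SPL, such a notebook would be invoked roughly like `| fit MLTKContainer algo=my_notebook value into app:my_model` (the exact syntax is an assumption here; check the app's bundled examples for the precise form in your version).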

Splunk App for Data Science and Deep Learning restrictions

The DSDL model-building workflow includes processes that occur outside of the Splunk platform ecosystem, leveraging third-party infrastructure such as Docker, Kubernetes, OpenShift, and custom Python code defined in JupyterLab. Any third-party infrastructure processes are out of scope for Splunk platform support or troubleshooting.

See the following table for DSDL app limitations and restrictions:

App limitation or restriction Description
Docker, Kubernetes, and OpenShift environments The architecture supports only Docker, Kubernetes, and OpenShift as target container environments.
No indexer distribution Data is processed on the search head and sent to the container environment. Data cannot be processed in a distributed manner, such as streaming data in parallel from indexers to one or many containers. However, all advantages of search in a distributed Splunk platform deployment still apply.
Security protocols Data is sent from a search head to a container over the HTTPS protocol. Splunk administrators must take steps to secure the setup of DSDL and the container environment accordingly.
Atomic container model Models created using the Splunk App for Data Science and Deep Learning (DSDL) are atomic in the sense that each model is served by exactly one container.
Global model sharing Models must be shared globally if they are to be served from a dedicated container. To do so, set the model permission to Global.
Naming convention Model names must not include whitespace for model configuration to work properly.
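Because model names must not contain whitespace, it can help to normalize user-supplied names before passing them to a model configuration. A minimal sketch follows; the helper name is ours for illustration and is not part of DSDL.

```python
import re

def sanitize_model_name(name: str) -> str:
    """Replace runs of whitespace with underscores so the name is safe to use
    as a DSDL model name. Illustrative helper only; not part of the app."""
    return re.sub(r"\s+", "_", name.strip())
```

For example, `sanitize_model_name("my model v2")` yields `my_model_v2`.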
Last modified on 18 September, 2023

This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.1.1, 5.1.2
