Splunk® App for Data Science and Deep Learning

Use the Splunk App for Data Science and Deep Learning

Splunk App for Data Science and Deep Learning workflow

The Splunk App for Data Science and Deep Learning (DSDL) lets you integrate advanced custom machine learning and deep learning systems with the Splunk platform. You can build, test, and operationalize customized models that leverage GPUs for compute-intensive training tasks.

After installing and configuring DSDL, you can follow these high-level steps to use the app for your own business case:

  1. Launch a new development container
  2. Open your preferred third-party tool
  3. Build and iterate the model
  4. Monitor and manage DSDL

Launch a new development container

Take the following steps to launch a new development container in DSDL:

  1. From the Configuration tab, select Containers. Come back to this page at any time to see the number and status of your containers, and to stop or start containers.

    Containers can take a few seconds to start and stop.

  2. Make a selection from the Container Image drop-down menu. There are several pre-built images for specific libraries, including Spark, River or Rapids.
  3. Select a value from the GPU runtime drop-down menu. The GPU runtime menu is populated based on the chosen Container Image.
  4. Select a value from the Cluster target drop-down menu.
  5. Select Start to create the development container.

Open your preferred third-party tool

After the container is running, you can open the third-party tool of your choice. Options include JupyterLab, TensorBoard, MLflow, and Spark UI. Selecting the third-party tool opens a new browser tab.

This image shows the Containers page of DSDL. In this image, one container has been successfully set up. You can now select from the available third-party tools to further explore your data, experiment, and make models. The on-screen options of JupyterLab, TensorBoard, MLflow, and Spark UI are highlighted.

Build and iterate the model

Use your preferred third-party tool to load your dataset, choose algorithms and parameters, and build, test, and iterate your machine learning or deep learning model. For more detailed steps, see Develop a model using JupyterLab.
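
The build, test, and iterate loop described above can be sketched in a notebook cell as follows. This is a minimal illustration that uses only the Python standard library. The synthetic dataset, the function names (make_dataset, fit, predict, accuracy), and the z-score threshold parameter are all hypothetical and are not part of DSDL.

```python
import random
import statistics

def make_dataset(n=200, seed=0):
    """Synthetic stand-in for a dataset you would load in your tool of choice."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_anomaly = rng.random() < 0.1
        value = rng.gauss(80.0, 5.0) if is_anomaly else rng.gauss(20.0, 5.0)
        rows.append((value, int(is_anomaly)))
    return rows

def fit(train_rows):
    """Build: learn the mean and spread of the training values."""
    values = [v for v, _ in train_rows]
    return {"mean": statistics.fmean(values), "stdev": statistics.pstdev(values)}

def predict(model, value, z_threshold):
    """Flag a value as anomalous (1) when its z-score exceeds the threshold."""
    z = (value - model["mean"]) / model["stdev"]
    return int(z > z_threshold)

def accuracy(model, rows, z_threshold):
    """Test: fraction of held-out rows that are labeled correctly."""
    hits = sum(predict(model, v, z_threshold) == label for v, label in rows)
    return hits / len(rows)

rows = make_dataset()
split = int(len(rows) * 0.7)
train, test = rows[:split], rows[split:]

model = fit(train)

# Iterate: try several parameter values and keep the best-scoring one.
results = {t: accuracy(model, test, t) for t in (0.5, 1.0, 2.0, 3.0)}
best_threshold = max(results, key=results.get)

for threshold, score in results.items():
    print(f"z_threshold={threshold}: accuracy={score:.3f}")
print("best:", best_threshold)
```

Keeping the held-out rows separate from the training rows means each parameter choice is scored on data the model has not seen.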

Monitor and manage DSDL

DSDL includes the following pages from which you can monitor and manage your containers and deployment of the app:

DSDL page | Description
Configuration > Containers | An overview of your development and production containers. The dashboard refreshes every 5 seconds. Choose to stop or start any of your containers from this page.
Operations > Operations Overview | A visual overview of DSDL app operations, including your total container images.
Operations > Container Status | An in-depth set of dashboard panels, including container activity logs, duration statistics for the fit, apply, and summary commands, and an error counter. Note: You need access to the _internal index to see information on the Container Status page.
Operations > Runtime Benchmarks | Informative dashboards on the runtime behavior of DSDL for different dataset sizes. The available benchmarks only profile single-instance DSDL deployments that do not use any parallelization or distribution strategies. Use the benchmarks as a baseline for algorithms operating on small to medium-sized datasets. Note: Runtime behavior can vary widely with the algorithm and the dataset size, so investigate it on a case-by-case basis.
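
To investigate runtime behavior for your own algorithm, you can time it against a few dataset sizes in a notebook cell. The following is a minimal sketch that uses only the Python standard library. The run_algorithm function is a hypothetical placeholder workload, not a DSDL function, and the sizes are arbitrary. It does not reproduce the Runtime Benchmarks dashboards.

```python
import random
import time

def run_algorithm(values):
    """Hypothetical stand-in workload: sort the values and take the median."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def benchmark(sizes, repeats=3, seed=0):
    """Return the best wall-clock time in seconds for each dataset size."""
    rng = random.Random(seed)
    timings = {}
    for n in sizes:
        data = [rng.random() for _ in range(n)]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_algorithm(data)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings

timings = benchmark([1_000, 10_000, 100_000])
for n, seconds in timings.items():
    print(f"{n:>7} rows: {seconds * 1000:.2f} ms")
```

Taking the best of several repeats reduces noise from other activity on the host. Replace run_algorithm with your own training or inference call to see how it scales.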

Example use case with JupyterLab

To become familiar with the Splunk App for Data Science and Deep Learning, you can explore the Notebook examples and see how the related pre-built models can be viewed in the Splunk platform.

Perform the following steps to explore one of the Jupyter examples:

  1. From the Configuration > Containers page, select the JupyterLab button.
  2. The JupyterLab interface opens in a new tab. Log in to JupyterLab with the default password of Splunk4DeepLearning.
  3. From the notebooks menu, select the drift_detection.ipynb Notebook. This image shows the list view in JupyterLab of all the pre-built Notebooks provided in the DSDL app. The Notebook named drift_detection.ipynb is highlighted.
  4. In this example Notebook or any of the other pre-built Notebooks, you can interactively run the code cells and add cells for any other code you want to test or develop. This can also include visualizations or any other functionality available in Python libraries or JupyterLab.
  5. Navigate back to the tab for your Splunk platform instance and select Examples > Data Mining > Example for Bayesian Online Change Point Detection.
  6. Select Submit.
    This image shows the Splunk platform interface. The Example for Bayesian Online Change Point Detection has been selected. At the top of the resulting screen, the button labeled Submit is highlighted.
  7. On this dashboard you can see how the Drift Detection algorithm is working on the sample data. On the Raw Data panel you can see some sample data of ping events with the numeric field rtt extracted. This field contains measures of ping round-trip times. The Drift Detection algorithm is applied to this target variable.
  8. From the Example for detecting drift in network round trip time measurements panel, select the Open in Search icon. This opens a new tab where you can view the underlying SPL.
    This image shows the top panel on this DSDL Example. A magnifying glass icon is highlighted. Clicking this icon opens the data in a new Splunk search.
  9. The SPL shows how the Drift Detection algorithm is integrated into the search pipeline. The first six lines perform all the necessary data preprocessing. The | fit MLTKContainer statement passes the dataframe to the container, which runs the algorithm named by algo=drift_detection with the given parameters. An additional column named drift is added to the search results and marks each detected drift with a binary 0 or 1.
    This image shows the SPL code for the Example for Bayesian Online Change Point Detection.
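
To make the idea of the added drift column concrete, the following sketch flags sudden changes in a sequence of rtt values and appends a binary drift field to each row. It is an illustration only: it uses a simple trailing-window z-score as a stand-in, not the Bayesian Online Change Point Detection algorithm from the example Notebook. The add_drift_column function, its window and z_threshold parameters, and the sample values are hypothetical.

```python
import statistics

def add_drift_column(rows, field="rtt", window=5, z_threshold=3.0):
    """Return copies of the rows with an extra binary 'drift' key.

    Stand-in logic: a value is marked 1 when it deviates from the mean of
    the previous `window` values by more than `z_threshold` standard deviations.
    """
    out = []
    history = []
    for row in rows:
        value = float(row[field])
        drift = 0
        if len(history) >= window:
            recent = history[-window:]
            mean = statistics.fmean(recent)
            stdev = statistics.pstdev(recent)
            # Guard against a zero spread in a perfectly flat window.
            if abs(value - mean) > z_threshold * max(stdev, 1e-9):
                drift = 1
        history.append(value)
        out.append({**row, "drift": drift})
    return out

# Hypothetical round-trip times with a level shift from about 10 to about 40.
events = [{"rtt": v} for v in [10, 11, 10, 12, 11, 10, 11, 40, 41, 39, 40, 41, 40]]
flagged = add_drift_column(events)
for row in flagged:
    print(row)
```

In this sample, only the first value after the level shift is marked with drift=1, because the trailing window quickly absorbs the new level.
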
Last modified on 11 December, 2023

This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.0.0, 5.1.0, 5.1.1