Model governance and security in the Splunk App for Data Science and Deep Learning

The Splunk App for Data Science and Deep Learning (DSDL) lets you train advanced machine learning models in containerized environments. However, enterprise-grade machine learning might require model governance, secure container management, and strict access controls so that data, models, and container images meet compliance and operational standards.

Learn how to handle model governance and security in your DSDL environment and how DSDL automatically persists notebooks and models to protect against ephemeral container storage losses.

Overview

DSDL extends the Splunk Machine Learning Toolkit (MLTK) with container-based execution. MLTK offers access controls for model artifacts, and DSDL adds external container images and Jupyter notebooks, which also need to be governed.

DSDL supports the following model governance features:

  • Model permissions in the Splunk platform at the global, app, and user levels.
  • Automatic sync of notebooks and models to the Splunk instance, which prevents data loss in ephemeral container volumes or NFS shares.
  • Container image security, including private registries, image scanning, restricted GPU usage, and custom TLS certificates.
  • TLS encryption for traffic between the Splunk platform and containers.
  • Model lifecycle management, including versioning, auditing, redeploying, and rolling back models.

Model lifecycle and sharing

Review the following steps in the lifecycle of a DSDL model.

Model creation and training

Run the following to create and train a new model:

 | fit MLTKContainer algo=my_notebook ... into app:MyModel

DSDL spins up a container, executes training, and saves model artifacts under app:MyModel.

The model is stored in the container environment during training, but references appear in the Splunk platform.

Model permissions

The following permissions are available with your models:

| Permission | Description |
|---|---|
| App context | By default, DSDL recognizes model names such as app:MyModel. |
| Sharing | Splunk knowledge object sharing can be set to User, App, or Global. |
| User | Visible only to the model creator. |
| App | Shared by users of the same Splunk app. |
| Global | Visible across the Splunk platform. Suitable for widely used HPC or production models. |
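The following Python sketch illustrates how the three sharing levels decide who can see a model. It is a simplified model for illustration, not DSDL or Splunk code: the ModelObject record, the parse_model_ref and is_visible helpers, and the lowercase sharing labels are hypothetical, and real Splunk permissions also apply role-based read and write access on top of sharing.

```python
from dataclasses import dataclass

SHARING_LEVELS = {"user", "app", "global"}  # assumed lowercase labels


@dataclass(frozen=True)
class ModelObject:
    """Hypothetical record describing a saved model knowledge object."""
    name: str     # for example "MyModel"
    owner: str    # user who ran | fit ... into app:MyModel
    app: str      # Splunk app the model was saved in
    sharing: str  # one of SHARING_LEVELS


def parse_model_ref(ref: str) -> str:
    """Strip the app: prefix used in ML-SPL, for example app:MyModel -> MyModel."""
    prefix = "app:"
    return ref[len(prefix):] if ref.startswith(prefix) else ref


def is_visible(model: ModelObject, user: str, current_app: str) -> bool:
    """Simplified visibility check; ignores role-based read permissions."""
    if model.sharing not in SHARING_LEVELS:
        raise ValueError(f"unknown sharing level: {model.sharing!r}")
    if model.sharing == "global":
        return True  # visible in every app context
    if model.sharing == "app":
        return current_app == model.app  # any user in the same app
    # "user": only the owner, and only in the app where the model was saved
    return user == model.owner and current_app == model.app


if __name__ == "__main__":
    model = ModelObject(parse_model_ref("app:MyModel"), "alice", "search", "app")
    print(is_visible(model, "bob", "search"))  # True
    print(is_visible(model, "bob", "my_app"))  # False
```

In this sketch, a model that alice shares at the App level in the search app is visible to bob only while he works in the search app.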

Model retraining or versioning

To retrain a model, re-run the same command with new data or parameters. This overwrites the existing artifacts:

| fit MLTKContainer algo=my_notebook ... into app:MyModel

To keep a separate version, such as MyModel_v2, specify a new name in the into app: clause.

Keep your ML-SPL searches and .ipynb notebooks in Git so that you can easily revert if a new version performs worse.
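If you keep several versions side by side, a small helper can pick the next unused name so that a retrain never overwrites an existing version. This Python sketch is hypothetical: DSDL does not require the _vN suffix, which only follows the MyModel_v2 naming example in this section, and next_version_name is an illustrative helper, not part of DSDL.

```python
import re

# Matches names such as MyModel_v2; the _vN convention is an assumption.
_VERSIONED = re.compile(r"^(?P<base>.+?)_v(?P<num>\d+)$")


def next_version_name(name: str, existing: set) -> str:
    """Return the next unused versioned model name.

    An unversioned name such as MyModel is treated as version 1,
    so its successor is MyModel_v2.
    """
    match = _VERSIONED.match(name)
    if match:
        base, number = match.group("base"), int(match.group("num")) + 1
    else:
        base, number = name, 2
    # Skip versions that already exist so that no artifacts are overwritten.
    while f"{base}_v{number}" in existing:
        number += 1
    return f"{base}_v{number}"


if __name__ == "__main__":
    print(next_version_name("MyModel", {"MyModel_v2"}))  # MyModel_v3
```

You could then use the returned name in the into app: clause, for example into app:MyModel_v3.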

Automatic notebook to model sync

DSDL automatically persists your notebooks and model files onto the Splunk platform instance. This prevents data loss if ephemeral or NFS volumes go offline and lets new containers retrieve the same notebooks and models.

The sync relies on internal scripts, such as SyncHandler, that kill orphaned containers and reconcile model stanzas.
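The reconciliation idea can be sketched in Python as two set comparisons plus a content-hash check. This is a hypothetical illustration of the concept, not the actual SyncHandler implementation: the function names, the use of container IDs as stanza keys, and the SHA-256 change detection are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ReconcilePlan:
    """Hypothetical result of comparing config stanzas with running containers."""
    orphaned_containers: list = field(default_factory=list)  # running, no stanza
    stale_stanzas: list = field(default_factory=list)        # stanza, no container


def plan_reconcile(stanza_ids: set, running_ids: set) -> ReconcilePlan:
    """Containers without a stanza are orphans; stanzas without a container are stale."""
    return ReconcilePlan(
        orphaned_containers=sorted(running_ids - stanza_ids),
        stale_stanzas=sorted(stanza_ids - running_ids),
    )


def content_hash(data: bytes) -> str:
    """SHA-256 digest used here to detect changed notebook or model files."""
    return hashlib.sha256(data).hexdigest()


def files_to_sync(container_files: dict, splunk_files: dict) -> list:
    """Return file names whose Splunk-side copy is missing or out of date.

    Both arguments map a relative file name to its content hash.
    """
    return sorted(
        name for name, digest in container_files.items()
        if splunk_files.get(name) != digest
    )


if __name__ == "__main__":
    plan = plan_reconcile({"c1", "c2"}, {"c2", "c3"})
    print(plan.orphaned_containers, plan.stale_stanzas)  # ['c3'] ['c1']
    print(files_to_sync(
        {"notebooks/my_notebook.ipynb": content_hash(b"v2")},
        {"notebooks/my_notebook.ipynb": content_hash(b"v1")},
    ))  # ['notebooks/my_notebook.ipynb']
```

In this sketch, orphaned containers are candidates to stop, and stale stanzas are candidates to clean up or relaunch.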

Container image security

Review the following options to secure your container images.

Private registry and air-gapped images

You can use a private Docker registry or an air-gapped approach. Push the golden-cpu, golden-gpu, or custom images to your internal registry. Then, in DSDL, go to Setup and then Container Settings, and specify the private registry URL so that DSDL pulls images from it.

If your environment has no internet access, use docker save and docker load, or bulk_build.sh. Keep a separate Git or artifact repository with your Dockerfiles and pinned requirements.
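The following Python sketch builds the command sequence for moving one image into an internal registry. The docker save -o, docker load -i, docker tag, and docker push commands are standard Docker CLI usage. The registry host registry.example.internal, the image name splunk/mltk-container-golden-cpu:5.2.0, and the helper names are placeholders and assumptions; check the image names in your own DSDL configuration.

```python
def qualified_image(registry: str, repository: str, tag: str) -> str:
    """Build a registry-qualified image reference, for example host/repo:tag."""
    return f"{registry.rstrip('/')}/{repository}:{tag}"


def airgap_commands(source_image: str, tarball: str, target_image: str) -> list:
    """Return the docker CLI steps for moving one image into an internal registry."""
    return [
        # 1. On a host with internet access: export the image to a tar archive.
        ["docker", "save", "-o", tarball, source_image],
        # 2. Copy the archive into the air-gapped network, then import it.
        ["docker", "load", "-i", tarball],
        # 3. Retag the image for the internal registry and push it there.
        ["docker", "tag", source_image, target_image],
        ["docker", "push", target_image],
    ]


if __name__ == "__main__":
    # Placeholder names: replace with your registry and the image you use.
    src = "splunk/mltk-container-golden-cpu:5.2.0"
    dst = qualified_image("registry.example.internal",
                          "splunk/mltk-container-golden-cpu", "5.2.0")
    for cmd in airgap_commands(src, "golden-cpu.tar", dst):
        print(" ".join(cmd))
```

Each step could be run with subprocess.run(cmd, check=True) on the appropriate host. The push step assumes that you have already logged in to the internal registry.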

Image scanning and hardening

Use scripts from splunk-mltk-container-docker or tools such as Trivy to detect known Common Vulnerabilities and Exposures (CVEs). Remove unneeded packages to keep images minimal, and patch OS-level vulnerabilities in your base images, such as Debian or Red Hat UBI, regularly.
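A scan is most useful when it can block a build. This Python sketch reads a Trivy JSON report and lists findings at blocking severities. It assumes the layout produced by trivy image --format json (a top-level Results list whose entries can carry a Vulnerabilities list). The severity policy, helper names, and sample report are illustrative only.

```python
import json

# Severities that block promotion of an image; adjust to your policy.
BLOCKING_SEVERITIES = frozenset({"CRITICAL", "HIGH"})


def blocking_findings(report: dict, blocking=BLOCKING_SEVERITIES) -> list:
    """Return sorted (CVE ID, package, severity) tuples at blocking severities.

    Assumes the layout of `trivy image --format json` output.
    """
    findings = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if severity in blocking:
                findings.append((vuln.get("VulnerabilityID", "?"),
                                 vuln.get("PkgName", "?"), severity))
    return sorted(findings)


def load_report(path: str) -> dict:
    """Read a report written by, for example, trivy image --format json --output report.json IMAGE."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Made-up sample report for illustration.
    sample = {"Results": [{"Target": "example", "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "zlib", "Severity": "LOW"},
    ]}]}
    print(blocking_findings(sample))  # [('CVE-0000-0001', 'openssl', 'HIGH')]
```

For a simple pass or fail gate without custom code, Trivy can also return a non-zero exit status itself, for example with trivy image --severity HIGH,CRITICAL --exit-code 1 followed by the image name.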

GPU resource restrictions

In Kubernetes or OpenShift, define resource requests and limits so that only authorized machine learning tasks can claim GPUs. On single-host Docker, pass --gpus or --runtime=nvidia to control GPU usage.
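The following Python sketch builds the two kinds of GPU settings described above. It assumes that the NVIDIA device plugin exposes GPUs to Kubernetes as the nvidia.com/gpu resource. The helper names and the CPU and memory defaults are placeholders, not DSDL settings.

```python
def gpu_container_resources(gpus: int, cpu: str = "2", memory: str = "8Gi") -> dict:
    """Build a Kubernetes container "resources" block that claims `gpus` NVIDIA GPUs.

    Assumes the NVIDIA device plugin exposes GPUs as nvidia.com/gpu.
    The CPU and memory defaults are placeholders.
    """
    if gpus < 0:
        raise ValueError("gpus must be zero or positive")
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    if gpus:
        # GPUs go under limits; Kubernetes uses the limit as the request by default.
        resources["limits"]["nvidia.com/gpu"] = gpus
    return resources


def docker_gpu_args(devices=None) -> list:
    """Build docker run arguments for all GPUs or only the listed device indexes."""
    if devices is None:
        return ["--gpus", "all"]
    # Docker expects the quoted form "device=0,1" when listing several devices.
    return ["--gpus", '"device=' + ",".join(str(d) for d in devices) + '"']


if __name__ == "__main__":
    print(gpu_container_resources(1)["limits"])
    print(docker_gpu_args([0, 1]))
```

Limits on individual containers do not stop a user from requesting GPUs. To cap how many GPUs a namespace can claim, Kubernetes also supports a ResourceQuota on requests.nvidia.com/gpu.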

Embedding custom certificates for production HTTPS

In production, you need trusted HTTPS on container endpoints. DSDL images can include your own TLS certificates instead of the default self-signed certificates. The splunk-mltk-container-docker repository includes a certificates folder that demonstrates how to embed custom certificates.

For development environments, a self-signed certificate can suffice. For production, you might want your organization's CA-signed certificate.

Follow these steps:

  1. Clone the repo:
    git clone https://github.com/splunk/splunk-mltk-container-docker
  2. Place your certificates in the certificates directory, named dltk.key (private key) and dltk.pem (certificate).
  3. (Optional) Generate a self-signed certificate for testing. Set the CN to the hostname that clients use to reach the container:
    openssl req -x509 -nodes -days 3650 -newkey rsa:2048 \
        -keyout dltk.key -out dltk.pem \
        -subj "/CN=<container-hostname>"
    
    Or create your own CA-signed certificate and key in PEM format with the same filenames.
  4. Build your container image with the build script:
    ./build.sh golden-cpu-custom splunk/ 5.2.0
    
    The Docker build automatically copies dltk.key and dltk.pem into /dltk/.jupyter/. This sets up the container to serve HTTPS endpoints with your certificate.

Make sure that the filenames remain dltk.key and dltk.pem, or update the Dockerfile references so that the container recognizes your files. By default, the container uses only these exact filenames at runtime.
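A quick pre-build check can catch missing or misnamed files before you run the build script. This Python sketch is a hypothetical helper, not part of the splunk-mltk-container-docker repository. check_cert_dir only looks for the two expected filenames and a plausible PEM header in each. verify_key_matches_cert uses the standard ssl.SSLContext.load_cert_chain call, which raises ssl.SSLError if the key does not belong to the certificate.

```python
import ssl
from pathlib import Path

# File name -> text expected in that file's PEM header.
EXPECTED_HEADERS = {
    "dltk.pem": "-----BEGIN CERTIFICATE-----",
    "dltk.key": "PRIVATE KEY-----",  # matches PKCS#8 and traditional RSA key headers
}


def check_cert_dir(cert_dir: str) -> list:
    """Return a list of problems with the certificates directory (empty if none)."""
    problems = []
    for name, header in EXPECTED_HEADERS.items():
        path = Path(cert_dir) / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        text = path.read_text(encoding="ascii", errors="replace")
        if header not in text:
            problems.append(f"{name} has no PEM block containing {header!r}")
    return problems


def verify_key_matches_cert(cert_dir: str) -> None:
    """Raise ssl.SSLError if dltk.key is not the private key for dltk.pem."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=str(Path(cert_dir) / "dltk.pem"),
                            keyfile=str(Path(cert_dir) / "dltk.key"))
```

For example, you could run check_cert_dir("certificates") from the repository root before ./build.sh, and call verify_key_matches_cert("certificates") only when that list is empty.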

Roles, capabilities, and container access

Review the following for information on roles and permissions in DSDL.

DSDL roles and capabilities

DSDL offers the following container-related capabilities:

  • configure_mltk_container: Manage container settings (Observability tokens, cert configs).
  • list_mltk_container: List containers on the container dashboard.
  • control_mltk_container: Start/stop containers from DSDL UI.

Consider granting configure_mltk_container only to Splunk admins, control_mltk_container to data science roles, and list_mltk_container for general use.
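Splunk roles enable capabilities through authorize.conf stanzas of the form [role_name] followed by capability = enabled lines. The following Python sketch renders that layout for the split suggested above. The role names are hypothetical, and the sketch omits role inheritance and other role settings. Review the output before you copy anything into a local authorize.conf.

```python
import configparser
import io

# Hypothetical role names; the capability names are the DSDL capabilities in this section.
ROLE_CAPABILITIES = {
    "role_dsdl_admin": ["configure_mltk_container", "control_mltk_container",
                        "list_mltk_container"],
    "role_dsdl_data_scientist": ["control_mltk_container", "list_mltk_container"],
    "role_dsdl_user": ["list_mltk_container"],
}


def render_authorize_conf(role_capabilities: dict) -> str:
    """Render authorize.conf-style stanzas that enable each role's capabilities."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    for role, capabilities in role_capabilities.items():
        config[role] = {capability: "enabled" for capability in capabilities}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_authorize_conf(ROLE_CAPABILITIES))
```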

Model permissions

By default, only the model creator sees the model. For HPC or large production usage, set model sharing to "Global."

Secure HEC, Observability, and container endpoints

If you log partial training data through the HTTP Event Collector (HEC), handle your HEC tokens carefully. If Observability is enabled, protect your Observability access token. For production-level TLS in the container, embed custom certificates as described in Embedding custom certificates for production HTTPS.

Auditing and traceability

Review the following options for model auditing and traceability.

Use _internal logs for model creation

Use _internal logs to track who trained which model and when. When you run | fit ... into app:MyModel, logs appear in the _internal index, including information about container staging. For example:

index=_internal "mltk-container" "into=app:MyModel"

Use model summary and metadata

Running | summary MyModel returns model information such as hyperparameters and creation time. You can build a "model catalog" from this information, or store these events in a dedicated Splunk index for extended auditing.
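One way to feed a dedicated catalog index is to send one HEC event per trained model. This Python sketch only builds the payload and headers. The time, index, source, sourcetype, and event keys follow the standard HEC event format, and the Authorization header uses the Splunk token scheme. The index name model_catalog, the source and sourcetype values, the event fields, and the SPLUNK_HEC_TOKEN environment variable are hypothetical. The index must exist, and the token must be allowed to write to it.

```python
import json
import os
import time


def catalog_event(model: str, owner: str, app: str, params: dict,
                  index: str = "model_catalog") -> dict:
    """Build an HEC event payload describing one trained model.

    The index, source, sourcetype, and event field names are hypothetical.
    """
    return {
        "time": time.time(),
        "index": index,
        "source": "dsdl_model_catalog",
        "sourcetype": "dsdl:model_catalog",
        "event": {"model": model, "owner": owner, "app": app, "params": params},
    }


def hec_headers() -> dict:
    """Read the HEC token from the environment instead of hardcoding it."""
    token = os.environ.get("SPLUNK_HEC_TOKEN")
    if not token:
        raise RuntimeError("set SPLUNK_HEC_TOKEN before sending events")
    return {"Authorization": f"Splunk {token}", "Content-Type": "application/json"}


if __name__ == "__main__":
    payload = catalog_event("MyModel", "alice", "search", {"epochs": 10})
    print(json.dumps(payload, indent=2))
    # To send: POST json.dumps(payload) to
    # https://<splunk-host>:8088/services/collector/event
    # with the headers from hec_headers(), over verified TLS.
```

Keeping the token out of code and notebooks follows the guidance on handling HEC tokens carefully.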

Use notebook versioning in Git

DSDL automatically syncs notebooks to the Splunk platform, but you can also store .ipynb files in Git for collaboration and rollback.

TLS and data encryption

Review the following table for information on Transport Layer Security (TLS) and data encryption in model governance and security:

| Option | Description |
|---|---|
| TLS from Splunk to container | Development containers might use self-signed certificates. Production containers must have properly signed certificates. With single-host Docker, the container endpoints handle TLS themselves. With Kubernetes, an Ingress often handles TLS termination. |
| GPU data in transit | Data sent from the Splunk platform is still encrypted with TLS when the container uses GPUs. GPU usage does not affect encryption, but GPU containers often use ephemeral volumes that can lose data. The automatic sync to the Splunk platform mitigates this risk. |
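When your own scripts call a container endpoint over HTTPS, they should verify the certificate instead of disabling checks. This Python sketch uses the standard ssl.create_default_context call, which turns on certificate verification and hostname checking. It does not describe how DSDL itself connects to containers, and container_client_context is a hypothetical helper name.

```python
import ssl
from typing import Optional


def container_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context for calling a container endpoint with verification on.

    ca_file: PEM bundle to trust, for example your organization's CA certificate.
    None falls back to the system's default trust store.
    """
    # create_default_context enables certificate verification and hostname checks.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = container_client_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

You could pass the context to urllib.request.urlopen(url, context=ctx). Setting verify_mode to ssl.CERT_NONE or check_hostname to False defeats the purpose of the certificate and should stay out of production code.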

Automatic notebook and model sync

Containers are ephemeral by default. If ephemeral volumes or NFS shares go down, you risk losing code or trained models. The DSDL internal "sync" scripts store notebooks and models on the Splunk instance. If containers vanish or fail, you can re-launch them and retrieve the same notebooks/models.

SyncHandler and related scripts kill orphaned containers, reconcile stanzas with the containers that are actually running, and re-sync ephemeral data. This protects your environment from data loss and lets you focus on the machine learning workflow rather than on container lifecycle details.

Governance and security guidelines

Review the following guidelines for model governance and security:

  • Least privilege: Restrict advanced container management capabilities to admin or power users.
  • Minimal images: Add only the libraries you need.
  • Notebook and model sync plus Git: Rely on DSDL's automatic sync to avoid ephemeral data loss, and store .ipynb files in Git for version control.
  • Scan container images: Use Trivy or the built-in scripts from splunk-mltk-container-docker.
  • Custom certificates: For production HTTPS in containers, place dltk.key and dltk.pem in certificates/. For development, openssl req -x509 is a quick way to generate a self-signed pair.
    • For CA-signed certificates, use the same filenames so that the container's Dockerfile picks them up.
  • Observability: If Observability is turned on in DSDL, container endpoints are automatically instrumented with OpenTelemetry (OTel). Confirm your endpoint, token, and service name.

Troubleshooting model governance and security

See the following table for issues you might experience and how to resolve them:

| Issue | Cause | How to investigate |
|---|---|---|
| "model not found: MyModel" | The model is private or in a different app context. | Adjust the model sharing or check the container logs. You can also search _internal for "mltk-container" references to your model. |
| HPC node can't pull image | Private registry or TLS error. | Recheck your Docker or Kubernetes credentials, or the registry references in images.conf. |
| Observability instrumentation not active on endpoints | Observability is turned off, or the token in DSDL under Setup, and then Observability Settings, is invalid. | Revisit the Setup page. The container might need a restart to pick up the new configuration. |
| Notebooks vanish after container restarts | The ephemeral volume was wiped or the NFS share is unavailable. | The automatic Splunk-side sync should restore them. Search _internal for "mltk-container" sync errors. |
| "Invalid certificate" on container endpoint | The container uses a self-signed or misnamed certificate, or lacks your organization's CA. | Place your real certificate and key in certificates/dltk.pem and certificates/dltk.key, then rebuild the container. Review the Docker logs for TLS load errors. |
Last modified on 28 July, 2025

This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.2.1

