Install and configure the Splunk App for Data Science and Deep Learning in an air-gapped environment
For security or compliance reasons, your organization might restrict outbound internet connections and operate an air-gapped environment. The Splunk App for Data Science and Deep Learning (DSDL) can still run in these environments if you manually manage container images, install Python dependencies locally, and complete Splunk configuration steps offline.
In a typical DSDL setup, the following components are required:
- Splunk Enterprise or Splunk Cloud with the Machine Learning Toolkit (MLTK) app and the Python for Scientific Computing (PSC) add-on.
- A container environment such as Docker, Kubernetes, or OpenShift to run external code.
- Internet access to pull official DSDL images, such as splunk-mltk-container-golden-image-cpu, from Docker Hub.
In an air-gapped environment you must complete the following in order to successfully set up and run DSDL:
- Manually transfer container images to your private registry or local Docker host. See Offline container image management.
- Manually install MLTK, PSC, and other dependencies on Splunk. See Offline installation of MLTK, PSC, and DSDL.
- Access example notebooks offline. See Offline access of example notebooks.
- Manage sending traces to Splunk Observability Cloud. See Observability and telemetry in an air-gapped environment.
- Manage models offline. See Offline model management.
DSDL in an air-gapped environment guidelines
Review these guidelines if you plan to use DSDL in an air-gapped environment:
Guideline | Description |
---|---|
Document the transfer workflow | Keep a written record of the process for how images are built, scanned, saved, and loaded in the offline environment. |
Use Git locally | If you want versioning for your notebooks, store them in an on-premises Git server, or use secure copy protocol (SCP) based workflows. See https://git-scm.com/. |
Minimize image bloat | Large images can be tedious to move offline. Only install the libraries you truly need. |
Test in a staging environment | Validate each new container version or MLTK code change in a staging area before transferring to production. |
Turn off features that rely on external network calls | You might want to remove or turn off features that rely on external network calls such as downloading notebook examples from GitHub, or certain tokens that require calling out to Splunk Observability tools. |
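The "Use Git locally" guideline can be sketched with a bare repository on an on-premises host. The paths, identity, and notebook name below are illustrative stand-ins, not DSDL defaults:

```shell
# Create a bare repository on the on-premises Git server (path is illustrative).
git init --bare /tmp/notebooks.git

# On a workstation inside the air-gapped network: clone, add a notebook, push.
git clone /tmp/notebooks.git /tmp/notebooks-work
cd /tmp/notebooks-work
echo '{"cells": []}' > example.ipynb
git add example.ipynb
git -c user.name=analyst -c user.email=analyst@local commit -m "Add example notebook"
git push origin HEAD
```

The same flow works over SSH to a real internal Git host; only the remote URL changes.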
Offline container image management
See the following sections for steps to manage offline containers:
- Build or pull images on a connected host
- Transfer to the air-gapped environment
- Update available images in Splunk
Build or pull images on a connected host
Complete these steps to build or pull images on a connected host:
- On a machine with internet access, choose 1 of these options:
  - Pull the official images from Docker Hub. For example, phdrieger/mltk-container-golden-image-cpu:5.1.0.
  - Use scripts from [splunk-mltk-container-docker](#) to build custom images. For example, build.sh or bulk_build.sh.
- If your security policy requires it, scan the images for vulnerabilities. For example, if using Trivy, use scan_container.sh.
- Save the images to .TAR files.
Example:
docker save phdrieger/mltk-container-golden-image-cpu:5.1.0 -o golden_cpu_5.1.0.tar
Example if using your own custom images:
docker save myregistry.local/golden-cpu-custom:5.2.0 -o golden_cpu_custom_5.2.0.tar
Transfer to the air-gapped environment
Complete these steps to transfer images to the air-gapped environment:
- Copy the .TAR files into the offline environment with a USB drive, secure network copy, or similarly secure option.
- Load the images into your local Docker or private registry as shown in the following example:
docker load -i golden_cpu_custom_5.2.0.tar
- (Optional) If you want multiple hosts to pull the images, tag and push the images to an internal registry.
Example:
docker tag golden-cpu-custom:5.2.0 registry.local/golden-cpu-custom:5.2.0
docker push registry.local/golden-cpu-custom:5.2.0
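Because image archives are large and often travel over removable media, verifying a checksum after the copy helps catch corruption before you run docker load. A minimal sketch; the file below is a placeholder standing in for real docker save output:

```shell
# On the connected host: record a checksum next to each image archive.
# This placeholder file stands in for real 'docker save' output.
echo "demo image payload" > /tmp/golden_cpu_custom_5.2.0.tar
sha256sum /tmp/golden_cpu_custom_5.2.0.tar > /tmp/golden_cpu_custom_5.2.0.tar.sha256

# In the air-gapped environment, after copying both files: verify, then load.
sha256sum -c /tmp/golden_cpu_custom_5.2.0.tar.sha256
```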
Update available images in Splunk
In DSDL, you must tell Splunk which images are available.
Complete these steps to update the available images in Splunk:
- If you built images from [splunk-mltk-container-docker](#), generate an images.conf snippet using the build.sh scripts. The snippet is typically placed in the $SPLUNK_HOME/etc/apps/mltk-container/local/images.conf file.
- Edit the file manually, referencing your new offline image tags.
Example:
[my_custom_image]
repo = registry.local/
image = golden-cpu-custom:5.2.0
runtime = none
short_name = Golden CPU Custom
- Confirm that your container environment references the same local registry or Docker tags so that DSDL can pull them.
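One way to cross-check that images.conf and your registry agree is to list the image references the file declares and compare them against docker images output on the container host. The stanza below mirrors the example above and is written to a scratch path so the sketch runs anywhere:

```shell
# Write a demo images.conf stanza (mirrors the example above) to a scratch path.
cat > /tmp/demo_images.conf <<'EOF'
[my_custom_image]
repo = registry.local/
image = golden-cpu-custom:5.2.0
runtime = none
short_name = Golden CPU Custom
EOF

# Print each full image reference (repo + image) declared in the file,
# ready to compare against 'docker images' on the container host.
awk -F' = ' '$1 == "repo" {r = $2} $1 == "image" {print r $2}' /tmp/demo_images.conf
# -> registry.local/golden-cpu-custom:5.2.0
```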
Offline installation of MLTK, PSC, and DSDL
See the following table for how to complete an offline installation of required components:
Component | Description |
---|---|
MLTK and PSC | Download the Splunk Machine Learning Toolkit (MLTK) app and Python for Scientific Computing (PSC) add-on SPL packages from Splunkbase on a connected machine. Then transfer the packages to your offline Splunk instance using removable media or secure file copy, and install the packages in Splunk. |
DSDL | Download the Splunk App for Data Science and Deep Learning (DSDL) SPL package from Splunkbase or from your custom fork. Then transfer the package to your offline Splunk instance using removable media or secure file copy, and install the package in Splunk. |
Additional Python dependencies | If your Jupyter notebooks use custom libraries, you must ensure those libraries are included in the container image. If you also rely on PSC for local Splunk usage, confirm that you've installed the correct PSC version offline. |
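An SPL package is a gzipped tarball, so one offline install path is extracting it directly into $SPLUNK_HOME/etc/apps and restarting Splunk. The sketch below uses a scratch directory in place of a real $SPLUNK_HOME and builds a local stand-in package, since the real SPL files come from Splunkbase; the app directory name is illustrative:

```shell
# Scratch directory standing in for a real $SPLUNK_HOME.
SPLUNK_HOME=/tmp/splunk_demo
mkdir -p "$SPLUNK_HOME/etc/apps"

# Build a stand-in for an SPL package downloaded from Splunkbase.
# (An SPL file is a gzipped tarball containing the app directory.)
mkdir -p /tmp/stage/Splunk_ML_Toolkit/default
tar -czf /tmp/mltk.spl -C /tmp/stage Splunk_ML_Toolkit

# Offline install: extract the package into etc/apps, then restart Splunk.
tar -xzf /tmp/mltk.spl -C "$SPLUNK_HOME/etc/apps"
ls "$SPLUNK_HOME/etc/apps"
```

On a real instance you can also install the file through Splunk Web (Manage Apps, then Install app from file), which does not require shell access.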
Offline access of example notebooks
See the following table for how to access example notebooks offline:
Type of notebook | How to access offline |
---|---|
Built-in example notebooks | DSDL ships with example notebooks under $SPLUNK_HOME/etc/apps/<DSDL_app>/notebooks/. You can use the notebooks offline if they're included in the default distribution. |
Additional examples from GitHub | If you want extra examples from [splunk-mltk-container-docker/notebooks](#) or third-party sources, you must manually download them on a connected machine, then place them in your offline environment. DSDL's default scripts attempt to fetch updates from GitHub, so disable or remove those steps if you're fully offline. |
Observability and telemetry in an air-gapped environment
If you're offline, the container cannot automatically send OpenTelemetry (OTel) traces to Splunk Observability Cloud. You have the following options:
- Skip Observability usage entirely in air-gapped mode.
- Use a custom Observability Gateway on your intranet if you have a specialized network architecture.
You can still collect container logs in the _internal index or by using local Docker logging drivers. HPC or dev logs can be manually forwarded to Splunk if you have local connections.
Offline model management
See the following table for how to manage models offline:
Model function | How to manage offline |
---|---|
Automatic notebook and model sync | DSDL continues to store notebooks and models on the Splunk instance, unaffected by being offline. If ephemeral containers vanish, you can relaunch them and retrieve the same code. |
Versioning | Even offline, you can keep your .IPYNB code in a local Git server or a secure copy protocol (SCP) workflow. This ensures you can track and revert changes. |
Container security | With no external network access, scanning might be done on a staging machine. Ensure that you replicate the same images in production. Trivy or other scanners can run offline if you maintain local Common Vulnerabilities and Exposures (CVE) databases. |
Example offline workflow
The following is an example workflow when using DSDL with Docker in an air-gapped environment:
- Prepare: On a connected host, build or pull your desired images.
- Scan: Use scan_container.sh or a third-party product such as Trivy. See https://trivy.dev/latest/.
- Save: Use docker save <your_image> -o <your_file>.tar.
- Transfer: Copy the .TAR files to the offline environment using a USB drive or another secure copy protocol (SCP) option.
- Load: Use docker load -i <your_file>.tar.
- Configure: Update the $SPLUNK_HOME/etc/apps/mltk-container/local/images.conf file, or update using the DSDL container setup, referencing your local Docker tags.
- Install: Install the MLTK, PSC, and DSDL SPL packages offline in Splunk.
- Use: Run | fit MLTKContainer algo=... referencing your offline container image. Notebooks and models remain in Splunk's local storage across ephemeral container cycles.
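The steps above can be collected into a single transfer script. The sketch writes the script to a file and only syntax-checks it, because docker and trivy are typically unavailable in the shell where you draft it; the image name, file names, and scan step are illustrative:

```shell
# Write the connected-host half and the offline half of the workflow as one script.
cat > /tmp/dsdl_offline_workflow.sh <<'EOF'
#!/bin/sh
set -eu
IMAGE="phdrieger/mltk-container-golden-image-cpu:5.1.0"
TARBALL="golden_cpu_5.1.0.tar"

# --- On the connected host ---
docker pull "$IMAGE"                       # 1. Prepare
trivy image "$IMAGE"                       # 2. Scan (policy-dependent)
docker save "$IMAGE" -o "$TARBALL"         # 3. Save
sha256sum "$TARBALL" > "$TARBALL.sha256"   #    checksum for the transfer step

# --- After copying both files into the air-gapped environment ---
sha256sum -c "$TARBALL.sha256"             # 4. Verify the transfer
docker load -i "$TARBALL"                  # 5. Load
EOF

# Syntax-check the script without running it.
sh -n /tmp/dsdl_offline_workflow.sh
```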
Troubleshoot DSDL in an air-gapped environment
Issue | Likely cause | Where to check |
---|---|---|
Cannot pull image: not found | Image not loaded or incorrectly tagged in the offline Docker registry. | Run docker images to confirm the image is present. Confirm that images.conf references the correct repo or tag. |
Splunk Observability not working | Container endpoints can't reach Splunk Observability Cloud. | Observability is typically unavailable in a fully offline context unless a custom local gateway is set up. |
"No module named X" in your Jupyter code | Custom Python library not built into the container image. | Rebuild, or add the library to requirements_files/ before running docker build or build.sh. |
HPC nodes can't see your local registry | No local registry credentials or misconfigured insecure-registry flags. | Check the Docker daemon config on HPC nodes, or add a CA certificate if your local registry uses TLS with a custom CA. |
Missing examples or notebooks | The default DSDL examples are present, but advanced examples from GitHub are not included in your offline environment. | Manually download the advanced examples from GitHub on an internet-connected host and place them in $SPLUNK_HOME/etc/apps/dsdlt-app/notebooks/ or a container volume. |
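For the registry visibility issue, one common fix on Docker hosts is declaring the internal registry in /etc/docker/daemon.json. The registry host name and port below are illustrative; where possible, prefer a trusted CA certificate over the insecure flag:

```json
{
  "insecure-registries": ["registry.local:5000"]
}
```

Restart the Docker daemon after editing daemon.json so the setting takes effect. If the registry serves TLS with a private CA, placing the CA certificate at /etc/docker/certs.d/registry.local:5000/ca.crt lets you avoid insecure-registries entirely.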
This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.2.0