Configure Kubernetes integration for the Splunk App for Data Science and Deep Learning
Integrate the Splunk App for Data Science and Deep Learning (DSDL) with a Kubernetes environment to run data science workloads in a scalable and secure manner. Kubernetes provides container orchestration to manage and deploy containerized applications across a cluster of machines. This integration is suitable for production environments where performance, reliability, and security are critical.
For Kubernetes documentation, see https://kubernetes.io/docs/home/.
Prerequisites
The following prerequisites must be met to configure a Kubernetes integration for DSDL:
- Splunk Enterprise installed and running.
- Splunk Machine Learning Toolkit (MLTK) and Python for Scientific Computing (PSC) installed on the Splunk Enterprise instance.
- DSDL installed on the Splunk Enterprise instance.
- Access to a Kubernetes cluster with appropriate permissions.
- The Kubernetes command-line tool (`kubectl`) configured to interact with your Kubernetes cluster.
- Network connectivity between the Splunk Enterprise instance and the Kubernetes cluster.
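As a quick check of the `kubectl` prerequisite, a sketch like the following confirms connectivity and basic permissions. The namespace is a placeholder; substitute the one you plan to use for DSDL:

```
# Confirm that kubectl can reach the cluster and that your credentials
# are allowed to manage pods (substitute your target namespace).
kubectl cluster-info
kubectl auth can-i create pods --namespace default
```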
Kubernetes configuration guidelines
Consider the following guidelines if you are configuring Kubernetes integration for DSDL:
Guideline | Description |
---|---|
Secure authentication | Use secure authentication, such as certificates or bearer tokens, with limited RBAC privileges. |
Transport layer security (TLS) | Ensure that the Kubernetes API server uses TLS, and set any external DSDL endpoints to use SSL. |
Permissions | Assign minimal permissions to manage pods and resources. |
Monitor and scale | Use Splunk Observability or cluster metrics to watch resource usage and scale as needed. |
Component updates | Keep Kubernetes, DSDL, and related components updated to their latest and compatible versions to benefit from security fixes and performance improvements. |
Set up a Kubernetes cluster
Before integrating with DSDL, set up a Kubernetes cluster that meets the following requirements:
Requirement | Details |
---|---|
Kubernetes version | Version 1.16 or higher is needed for compatibility with DSDL. |
Networking | Provide connectivity between Splunk Enterprise and the cluster. Configure network plugins such as Calico and Flannel as needed. |
Load balancer or Ingress controller | Expose services externally for production use if required. |
Persistent storage | Configure dynamic PVC provisioning if you plan to store model artifacts or data externally. |
Role-based access control (RBAC) | DSDL does not automate RBAC creation. You can manually assign RBAC details in the DSDL setup page. |
Configuration steps
Complete the following steps to configure a Kubernetes cluster:
- Install the Kubernetes cluster:
  - Use `kubeadm` or a managed provider such as Amazon EKS, Red Hat OpenShift, GKE, or AKS to install the cluster.
  - Ensure that all nodes can communicate with each other and the control plane.
- Configure the network plugin:
  - Choose a plugin that matches your cluster's version.
  - Install it using directions from the plugin's documentation.
- Set up persistent storage:
  - Install a storage provisioner such as NFS, Ceph, or AWS EBS.
  - Create a StorageClass for dynamic provisioning.
  - Test by creating and binding a sample PersistentVolumeClaim (PVC). A sample PVC manifest appears in the Storage configuration section later on this page.
- Install an ingress controller or load balancer:
  - Use an ingress controller such as the NGINX Ingress Controller, or configure a load balancer such as AWS Elastic Load Balancing (ELB).
  - Enable SSL/TLS termination if you need secure external access.
- Verify cluster functionality:
  - Deploy a simple test application, as shown in the sketch after these steps.
  - Confirm that services and the ingress controller or load balancer configurations work as expected.
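As one way to perform the verification step, the following sketch deploys and exposes a throwaway application. The deployment name and image are illustrative:

```
# Deploy and expose a throwaway test application.
kubectl create deployment hello --image=nginx
kubectl expose deployment hello --port=80 --type=NodePort

# Confirm that the pod reaches the Running state and note the assigned NodePort.
kubectl get pods -l app=hello
kubectl get svc hello

# Clean up when done.
kubectl delete svc,deployment hello
```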
Configure DSDL for Kubernetes
After the Kubernetes cluster is set up, you can configure DSDL in Splunk Enterprise to deploy and manage your containerized data science workloads:
- In DSDL, go to Configuration and then Setup.
- Select Kubernetes and enter your cluster details.
- Choose the Service type.
- Provide a hostname, for example `dsdl.apps.<cluster-domain>`, if you want a custom route.
- Test and save: DSDL attempts to deploy containers in your Kubernetes project.
Automatic deployment
After saving the necessary details on the DSDL setup page in Splunk Enterprise, DSDL automatically triggers the deployment of its containers and any required Kubernetes resources. This includes creating pods, persistent volumes, and services according to your configuration.
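To confirm that the automatic deployment succeeded, you can inspect the resources DSDL created. A minimal sketch, assuming the `dsdl-namespace` namespace described later on this page:

```
# List the pods, services, and PVCs that DSDL created (namespace is a placeholder).
kubectl get pods,svc,pvc -n dsdl-namespace

# Inspect events if a pod does not reach the Running state.
kubectl describe pod <pod-name> -n dsdl-namespace
```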
Authentication modes
DSDL supports multiple authentication methods for connecting to your Kubernetes cluster. Choose the mode that best suits your environment and security requirements:
Authentication mode | Details | When to use |
---|---|---|
Certificate and Key | Obtain certificates from a trusted CA rather than self-signing certificates. | Use for high-security environments where you have properly signed certificates. Use if you prefer mutual TLS over other mechanisms. |
User Token | Use a bearer token associated with a Kubernetes service account. | Use when you want a simple setup without managing certificates. Service accounts can have minimal or limited permissions through RBAC. |
User Login | Use a username and password for basic authentication. | Use for simple testing or development. Not suitable for production due to weaker security. |
Service Account (In-Cluster) | Use a service account automatically when Splunk Enterprise runs inside the same Kubernetes cluster. | Use if Splunk Enterprise is itself deployed on Kubernetes. Use for in-cluster authentication for DSDL tasks. |
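For the User Token mode, the following sketch shows one way to create a service account with limited RBAC privileges and mint a bearer token for the DSDL setup page. The namespace, account, and role names, as well as the verb and resource lists, are assumptions to adapt to your cluster; the `kubectl create token` command requires Kubernetes 1.24 or later:

```
# Create an isolated namespace and a service account for DSDL (names are placeholders).
kubectl create namespace dsdl
kubectl create serviceaccount dsdl-sa -n dsdl

# Grant only the permissions needed to manage workloads in that namespace.
kubectl create role dsdl-role -n dsdl \
  --verb=get,list,watch,create,update,delete \
  --resource=pods,services,deployments,persistentvolumeclaims
kubectl create rolebinding dsdl-binding -n dsdl \
  --role=dsdl-role --serviceaccount=dsdl:dsdl-sa

# Mint a bearer token to paste into the DSDL setup page (Kubernetes 1.24+).
kubectl create token dsdl-sa -n dsdl
```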
Service types
Choose how DSDL services such as notebooks and API endpoints are exposed within Kubernetes:
Service type | Details | When to use |
---|---|---|
LoadBalancer | Specify Namespace and StorageClass in the DSDL configuration. | Use for direct external access on cloud providers such as AWS, Azure, and GCP that support external load balancers. |
NodePort | Provide internal and external hostnames if needed. | Use for internal or test environments where you bind a high port on each node. Use for quick, local testing without an ingress. |
Ingress | Provide an ingress host pattern, for example `dsdl.apps.<cluster-domain>`. | Use when you want advanced routing, TLS termination, or path-based rules. An ingress controller must be installed in your cluster. |
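For the Ingress service type, a manifest with TLS termination might look like the following sketch, assuming Kubernetes 1.19 or later and an NGINX ingress controller. The hostname, namespace, TLS secret, and backend service name are placeholders; port 5000 matches the default DSDL API port listed in the firewall section:

```
# A sketch of an Ingress with TLS termination in front of the DSDL API.
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dsdl-ingress
  namespace: dsdl-namespace
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - dsdl.apps.example.com
      secretName: dsdl-tls
  rules:
    - host: dsdl.apps.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dsdl-api
                port:
                  number: 5000
EOF
```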
Namespace and resource management guidelines
See the following guidelines for namespace and resource management when using Kubernetes:
- Use `dsdl-namespace` to create a new namespace and isolate the DSDL workloads.
- Specify the Namespace on the DSDL Configuration page.
- Set resource requests and limits. Ensure DSDL pods have enough CPU and memory if performing large-scale model training.
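One way to apply the requests-and-limits guideline is a LimitRange that sets defaults for every container in the DSDL namespace. This is a sketch; the sizing values are assumptions to tune against your actual model-training workloads:

```
# Default requests and limits for containers in the DSDL namespace (values are placeholders).
kubectl apply -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: dsdl-limits
  namespace: dsdl-namespace
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "1"
        memory: 2Gi
      default:
        cpu: "4"
        memory: 8Gi
EOF
```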
Storage configuration
Use persistent storage for storing models, logs, and data. Complete the following steps:
- Verify the StorageClass by running `kubectl get storageclass`.
- Specify the Storage Class in the DSDL configuration.
- Check that dynamic provisioning is working by creating sample PVCs.
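A minimal sketch for that check, assuming a StorageClass named `standard`; substitute a class from the `kubectl get storageclass` output:

```
# Create a sample PVC to verify dynamic provisioning.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dsdl-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
EOF

# The PVC should reach the Bound state; if it stays Pending, check the provisioner.
kubectl get pvc dsdl-test-pvc
```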
Certificate guidelines
See the following guidelines for certificate management when using Kubernetes:
- Self-signed certificates trigger browser warnings and introduce potential vulnerabilities. For external production use, obtain certificates from a public CA such as Let's Encrypt or DigiCert, or from an internal certificate authority (CA).
- Include certificates in your DSDL container. Place `dltk.pem` and `dltk.key` in the `/dltk/.jupyter/` location, or specify a custom path in the DSDL configuration.
- Enable hostname verification in DSDL. Set "Check Hostname" to "Enabled" on the DSDL setup page.
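As a sketch of the certificate workflow, the following generates a private key and a certificate signing request (CSR) to submit to your CA. The common name is a placeholder; use your DSDL hostname:

```
# Generate a private key and CSR for a CA-signed certificate.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout dltk.key -out dltk.csr \
  -subj "/CN=dsdl.apps.example.com"

# After your CA returns the signed certificate, save it as dltk.pem and place
# both dltk.pem and dltk.key in /dltk/.jupyter/ inside the DSDL container.
```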
Firewall considerations
DSDL requires certain ports to communicate with Kubernetes resources:
Component | Description |
---|---|
Kubernetes API | Port 6443. Outbound traffic from Splunk to manage cluster. |
DSDL API | Port 5000 or dynamically generated. Bidirectional traffic for the `fit`, `apply`, and `summary` commands. |
Splunk REST API | Port 8089. If container-based notebooks call back to Splunk. |
Splunk HTTP Event Collector (HEC) | Port 8088 for on-premises or port 443 for Splunk Cloud. Outbound traffic from notebooks and pods to Splunk for logs and results. |
Ensure that your firewall rules allow the necessary ports, especially for any dynamic assignments in development (DEV) mode, for example Jupyter on port 8888 or TensorBoard on port 6006.
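A few quick reachability checks can confirm that the firewall rules are in place. The hostnames below are placeholders, and the HEC health endpoint check should run from a pod inside the cluster:

```
# Run from the Splunk Enterprise host:
curl -k https://k8s-api.example.com:6443/version      # Kubernetes API server reachable?
curl -k https://dsdl.apps.example.com:5000/           # DSDL API endpoint reachable?

# Run from a pod in the cluster to verify the path back to Splunk HEC:
curl -k https://splunk.example.com:8088/services/collector/health
```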
Troubleshoot Kubernetes configuration
Issue | Troubleshoot |
---|---|
Authentication failures | Check that tokens, certificates, or user credentials are valid. Confirm RBAC roles and permissions in your cluster. |
Service exposure problems | Verify that the correct service type (NodePort, LoadBalancer, or Ingress) is configured. Check ingress controller logs or load balancer configuration if external access fails. |
Resource limitations | Pods can be stuck in "Pending" if there is insufficient CPU or memory, or a lack of storage. Scale your resources or adjust requests and limits. |
Networking issues | DNS resolution within the cluster might need debugging if Splunk cannot reach container endpoints. Check your cluster's network policy or plugin settings. |
Storage issues | PersistentVolumeClaims (PVCs) can remain in "Pending" if no suitable StorageClass is available. Review the provisioner logs for errors. |