Splunk® App for Data Science and Deep Learning

Use the Splunk App for Data Science and Deep Learning

Configure Kubernetes integration for the Splunk App for Data Science and Deep Learning

Integrate the Splunk App for Data Science and Deep Learning (DSDL) with a Kubernetes environment to run data science workloads in a scalable and secure manner. Kubernetes provides container orchestration to manage and deploy containerized applications across a cluster of machines. This integration is suitable for production environments where performance, reliability, and security are critical.

For Kubernetes documentation, see https://kubernetes.io/docs/home/.

Prerequisites

The following prerequisites must be met to configure a Kubernetes integration for DSDL:

  • Splunk Enterprise installed and running.
  • Splunk Machine Learning Toolkit (MLTK) and Python for Scientific Computing (PSC) installed on the Splunk Enterprise instance.
  • DSDL installed on the Splunk Enterprise instance.
  • Access to a Kubernetes cluster with appropriate permissions.
  • The Kubernetes command-line tool (kubectl) configured to interact with your Kubernetes cluster.
  • Network connectivity between the Splunk Enterprise instance and the Kubernetes cluster.
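As a quick sanity check, you can verify the last three prerequisites from the Splunk Enterprise host. This is a minimal sketch that assumes kubectl is already configured with a context for your cluster; replace api.<cluster-domain> with your API server hostname.

```shell
# Confirm kubectl can reach the cluster and the control plane is healthy.
kubectl cluster-info

# Confirm your credentials can manage the resources DSDL deploys.
kubectl auth can-i create pods
kubectl auth can-i create services

# Confirm network connectivity from Splunk Enterprise to the Kubernetes API.
curl -k https://api.<cluster-domain>:6443/version
```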

Kubernetes configuration guidelines

Consider the following guidelines if you are configuring Kubernetes integration for DSDL:

Guideline Description
Secure authentication Use secure authentication, such as client certificates or bearer tokens, with limited RBAC privileges.
Transport layer security (TLS) Ensure that the Kubernetes API server and any external DSDL endpoints use TLS.
Permissions Assign minimal permissions to manage pods and resources.
Monitor and scale Use Splunk Observability or cluster metrics to watch resource usage and scale as needed.
Component updates Keep Kubernetes, DSDL, and related components updated to their latest and compatible versions to benefit from security fixes and performance improvements.

Set up a Kubernetes cluster

Before integrating with DSDL, set up a Kubernetes cluster that meets the following requirements:

Requirement Details
Kubernetes version Version 1.16 or higher is needed for compatibility with DSDL.
Networking Provide connectivity between Splunk Enterprise and the cluster. Configure a network plugin such as Calico or Flannel as needed.
Load balancer or Ingress controller Expose services externally for production use if required.
Persistent storage Configure dynamic PVC provisioning if you plan to store model artifacts or data externally.
Role-based access control (RBAC) DSDL does not automate RBAC creation. You can manually assign RBAC details in the DSDL setup page.
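Because DSDL does not create RBAC objects for you, you typically prepare them before filling in the setup page. The following is an illustrative sketch of a minimal Role and RoleBinding: the namespace dsdl, the service account dsdl-sa, and the object names are example values, and the resource and verb lists might need widening for your workloads.

```yaml
# Illustrative minimal RBAC for a DSDL service account.
# The namespace "dsdl" and account "dsdl-sa" are example names.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dsdl-role
  namespace: dsdl
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "persistentvolumeclaims", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dsdl-rolebinding
  namespace: dsdl
subjects:
  - kind: ServiceAccount
    name: dsdl-sa
    namespace: dsdl
roleRef:
  kind: Role
  name: dsdl-role
  apiGroup: rbac.authorization.k8s.io
```

Apply the manifest with kubectl apply -f, and then reference the resulting service account on the DSDL setup page.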

Configuration steps

Complete the following steps to configure a Kubernetes cluster:

  1. Install the Kubernetes cluster:
    1. Use kubeadm or a managed provider such as Amazon EKS, Red Hat OpenShift, GKE, or AKS to install the cluster.
    2. Ensure that all nodes can communicate with each other and with the control plane.
  2. Configure a network plugin:
    1. Choose a plugin that is compatible with your cluster's version.
    2. Install it by following the plugin's documentation.
  3. Set up persistent storage:
    1. Install a storage provisioner such as NFS, Ceph, or AWS EBS.
    2. Create a StorageClass for dynamic provisioning.
    3. Test by creating and binding a sample PersistentVolumeClaim (PVC).
  4. Install an ingress controller or load balancer:
    1. Use an ingress controller such as the NGINX Ingress Controller, or configure a load balancer, for example AWS Elastic Load Balancing (ELB).
    2. Enable SSL/TLS termination if you need secure external access.
  5. Verify cluster functionality:
    1. Deploy a simple test application.
    2. Confirm that services and the ingress controller or load balancer configurations work as expected.
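Step 5 can be sketched as follows. The nginx image and the hello-test name are arbitrary examples, and the commands assume kubectl has a working context for the cluster.

```shell
# Deploy a simple test application.
kubectl create deployment hello-test --image=nginx
kubectl expose deployment hello-test --port=80 --type=NodePort

# Confirm the pod is Running and the service has an assigned node port.
kubectl get pods -l app=hello-test
kubectl get service hello-test

# Clean up after verification.
kubectl delete service hello-test
kubectl delete deployment hello-test
```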

Configure DSDL for Kubernetes

Once the Kubernetes cluster is set up, you can configure DSDL in Splunk Enterprise to deploy and manage your containerized data science workloads:

  1. In DSDL, go to Configuration and then Setup.
  2. Select Kubernetes and enter your cluster details.
  3. Choose the Service type.
  4. Provide a hostname, for example dsdl.apps.<cluster-domain>, if you want a custom route.
  5. Test and save. DSDL attempts to deploy containers in your Kubernetes namespace.

Automatic deployment

After saving the necessary details on the DSDL setup page in Splunk Enterprise, DSDL automatically triggers the deployment of its containers and any required Kubernetes resources. This includes creating pods, persistent volumes, and services according to your configuration.
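You can watch these resources come up from the command line. The namespace dsdl is a placeholder; use the namespace you entered on the setup page.

```shell
# Watch DSDL pods, services, and claims appear after saving the setup page.
kubectl -n dsdl get pods,services,pvc --watch
```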

Authentication modes

DSDL supports multiple authentication methods for connecting to your Kubernetes cluster. Choose the mode that best suits your environment and security requirements:

Authentication mode: Certificate and Key
Details:
  • Cluster base URL: https://api.<cluster-domain>:6443
  • Cluster certificate authority: Path to the CA certificate that signed the Kubernetes API server's certificate.
  • Client certificate and client key: Paths to your client certificate and private key.
Obtain certificates from a trusted CA rather than self-signing them.
When to use: Use for high-security environments where you have properly signed certificates, or if you prefer mutual TLS over other mechanisms.

Authentication mode: User Token
Use a bearer token associated with a Kubernetes service account.
Details:
  • Cluster base URL: https://api.<cluster-domain>:6443
  • User token: The bearer token for a Kubernetes service account.
  • Steps: Create the service account, bind appropriate roles, and retrieve the token using kubectl.
When to use: Use when you want a simple setup without managing certificates. Service accounts can be restricted to minimal permissions through RBAC.

Authentication mode: User Login
Use a username and password for basic authentication.
Details:
  • Cluster base URL: https://api.<cluster-domain>:6443
  • User name and password: Credentials for basic authentication.
  • (Optional) CA certificate: Required if you need TLS.
When to use: Use for simple testing or development. Not suitable for production due to weaker security.

Authentication mode: Service Account (In-Cluster)
Use the pod's service account automatically when Splunk Enterprise runs inside the same Kubernetes cluster.
Details:
  • Cluster base URL: Might be auto-discovered if in-cluster.
  • Namespace: The namespace containing the service account.
When to use: Use if Splunk Enterprise is itself deployed on Kubernetes and you want in-cluster authentication for DSDL tasks.
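The User Token steps can be sketched as follows. The names are illustrative, the rolebinding assumes a Role such as dsdl-role already exists, and `kubectl create token` requires Kubernetes 1.24 or later; on older clusters, read the token from the service account's associated secret instead.

```shell
# Create the service account and bind it to an existing role.
kubectl -n dsdl create serviceaccount dsdl-sa
kubectl -n dsdl create rolebinding dsdl-sa-binding \
  --role=dsdl-role --serviceaccount=dsdl:dsdl-sa

# Retrieve a bearer token for the account (Kubernetes 1.24+).
kubectl -n dsdl create token dsdl-sa
```

Paste the printed token into the User token field on the DSDL setup page.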

Service types

Choose how DSDL services such as notebooks and API endpoints are exposed within Kubernetes:

Service type: LoadBalancer
Details: Specify Namespace and StorageClass in the DSDL configuration.
When to use: Use for direct external access on cloud providers such as AWS, Azure, and GCP that support external load balancers.

Service type: NodePort
Details: Provide internal and external hostnames if needed.
When to use: Use for internal or test environments where you bind a high port on each node, or for quick, local testing without an ingress.

Service type: Ingress
Details: Ingress host pattern, for example *.example.com. Annotations: custom Ingress settings.
When to use: Use when you want advanced routing, TLS termination, or path-based rules. An ingress controller must be installed in your cluster.
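As an illustrative sketch, an Ingress for a DSDL service might look like the following. It assumes the NGINX ingress controller is installed and a TLS secret named dsdl-tls exists; the host name, namespace, backend service name, and port are placeholders, not values DSDL guarantees.

```yaml
# Example Ingress for a DSDL service; all names are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dsdl-ingress
  namespace: dsdl
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - dsdl.apps.example.com
      secretName: dsdl-tls
  rules:
    - host: dsdl.apps.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dsdl-service   # placeholder backend service name
                port:
                  number: 5000
```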

Namespace and resource management guidelines

See the following guidelines for namespace and resource management when using Kubernetes:

  • Create a dedicated namespace, such as dsdl-namespace, to isolate the DSDL workloads.
  • Specify the Namespace in the DSDL Configuration page.
  • Set resource requests and limits. Ensure DSDL pods have enough CPU and memory if performing large-scale model training.
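The guidelines above can be sketched as follows. The CPU and memory values are arbitrary starting points, not DSDL recommendations; size them to your training workloads.

```shell
# Create a dedicated namespace for DSDL workloads.
kubectl create namespace dsdl-namespace
```

```yaml
# Example resource requests and limits in a container spec.
# The values are illustrative; adjust them to your workloads.
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "4"
    memory: 8Gi
```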

Storage configuration

Use persistent storage for storing models, logs, and data. Complete the following steps:

  1. Verify the StorageClass:
    kubectl get storageclass
  2. Specify the Storage Class on the DSDL Configuration page.
  3. Check that dynamic provisioning is working by creating sample PVCs.
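Step 3 can be verified with a throwaway claim like the following. The StorageClass name standard and the namespace dsdl are placeholders for the values reported by `kubectl get storageclass` and your own namespace.

```yaml
# Sample claim to confirm that dynamic provisioning works.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dsdl-test-pvc
  namespace: dsdl
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # placeholder; use your StorageClass name
  resources:
    requests:
      storage: 1Gi
```

Apply it with kubectl apply -f, confirm it reaches the Bound state with kubectl get pvc -n dsdl, and then delete it.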

Certificate guidelines

See the following guidelines for certificate management when using Kubernetes:

  • Self-signed certificates trigger browser warnings and can introduce vulnerabilities. For external production use, obtain publicly signed certificates from a provider such as Let's Encrypt or DigiCert, or from an internal certificate authority (CA).
  • Include certificates in your DSDL container. Place dltk.pem and dltk.key in the /dltk/.jupyter/ location or specify a custom path in the DSDL configuration.
  • Enable Hostname Verification in DSDL. Set "Check Hostname" to "Enabled" in the DSDL setup page.
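For development and testing only, you can generate a self-signed pair with the expected file names; as noted above, production deployments should use CA-signed certificates. The host name dsdl.apps.example.com is a placeholder, and the -addext flag requires OpenSSL 1.1.1 or later.

```shell
# Generate a self-signed certificate and key for development use only.
# dsdl.apps.example.com is a placeholder host name.
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout dltk.key -out dltk.pem \
  -subj "/CN=dsdl.apps.example.com" \
  -addext "subjectAltName=DNS:dsdl.apps.example.com"

# Inspect the result before copying the files into /dltk/.jupyter/.
openssl x509 -in dltk.pem -noout -subject -enddate
```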

Firewall considerations

DSDL requires certain ports to communicate with Kubernetes resources:

Component Description
Kubernetes API Port 6443. Outbound traffic from Splunk Enterprise to manage the cluster.
DSDL API Port 5000 or dynamically generated. Bidirectional traffic for fit, apply, and summary commands.
Splunk REST API Port 8089. If container-based notebooks call back to Splunk.
Splunk HTTP Event Collector (HEC) Port 8088 for on-premises or port 443 for Splunk Cloud. Outbound traffic from notebooks and pods to Splunk for logs and results.

Ensure your firewall rules allow the necessary ports, especially for any dynamic assignments in development (DEV) mode, for example Jupyter on port 8888 or TensorBoard on port 6006.
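A quick way to spot-check these rules from the Splunk Enterprise host, assuming curl and nc are available and substituting your own host names:

```shell
# Kubernetes API server.
curl -k https://api.<cluster-domain>:6443/version

# DSDL API and Splunk management port (nc -z only tests TCP reachability).
nc -vz <container-host> 5000
nc -vz <splunk-host> 8089

# HTTP Event Collector health check.
curl -k https://<splunk-host>:8088/services/collector/health
```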

Troubleshoot Kubernetes configuration

Issue Troubleshoot
Authentication failures Check that tokens, certificates, or user credentials are valid. Confirm RBAC roles and permissions in your cluster.
Service exposure problems Verify the correct service type of NodePort, LoadBalancer, or Ingress. Check ingress controller logs or load balancer configuration if external access fails.
Resource limitations Pods can be stuck in "Pending" if there is insufficient CPU or memory, or a lack of storage. Scale your resources or adjust requests and limits.
Networking issues DNS resolution within the cluster might need debugging if Splunk cannot reach container endpoints. Check your cluster's network policy or plugin settings.
Storage issues PersistentVolumeClaims (PVCs) can remain in "Pending" if no suitable StorageClass is available. Review the provisioner logs for errors.
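These standard kubectl commands cover most of the checks in the table. The namespace dsdl, the pod name, and the service account identity are placeholders.

```shell
# Why is a pod Pending? Check the Events section at the bottom.
kubectl -n dsdl describe pod <pod-name>

# Recent cluster events, most recent last.
kubectl -n dsdl get events --sort-by=.metadata.creationTimestamp

# Container logs for a failing DSDL pod.
kubectl -n dsdl logs <pod-name>

# Check whether claims are Bound or Pending, and which StorageClass they use.
kubectl -n dsdl get pvc

# Verify RBAC for the identity that DSDL uses.
kubectl auth can-i create pods --as=system:serviceaccount:dsdl:dsdl-sa
```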

Last modified on 29 January, 2025

This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.2.0

