Configure the Splunk App for Data Science and Deep Learning

The Splunk App for Data Science and Deep Learning (DSDL) extends the Splunk platform to enable advanced analytics, machine learning, and deep learning capabilities by integrating with external containerized environments.

Before you can begin using DSDL, you must set up at least one container environment. DSDL requires this external containerized environment to execute resource-intensive computations, such as model training and inference. You can choose Docker or Kubernetes based on your performance and security needs.

You can also choose to set up the Splunk HTTP Event Collector (HEC) and the Splunk access token.

Prerequisites

Before configuring DSDL, both the Splunk Machine Learning Toolkit (MLTK) and the Python for Scientific Computing (PSC) add-on must be installed on your Splunk search head. See Install the Splunk App for Data Science and Deep Learning.

Configuration guidelines

Consider the following guidelines when making configuration changes to DSDL:

Item Details
SSL and TLS certificates
  • For production environments, provide your own certificates.
  • Configure certificate paths in the DSDL setup page for HTTPS connections.
Endpoint tokens and passwords
  • Set custom tokens and passwords for container endpoints such as API and Jupyter.
  • Rotate credentials regularly.
Access controls
  • Limit Splunk access and HEC tokens to necessary indexes and roles.
  • For Kubernetes, use Role-Based Access Control (RBAC) to manage permissions and enforce network policies.

Docker configuration

If you have a single-instance Splunk deployment, use Docker for development and testing purposes. Docker is ideal for scenarios where it runs side by side with the Splunk search head on the same machine.

Docker limitations

Consider the following limitations if choosing to use Docker:

  • Security: Docker integration does not support Transport Layer Security (TLS), which might not meet security requirements for production environments.
  • Scalability: Docker is less suitable for large-scale or production workloads as compared to Kubernetes.

Configuration steps

Complete the following steps to configure a connection to Docker:

  1. Set up the Docker environment. Install Docker on the host machine where the Splunk search head is running.

    Make sure that the Docker daemon is running and accessible.

  2. Configure Docker settings in Splunk:
    Setting Details
    Docker host
      • Linux, same machine: unix://var/run/docker.sock
      • Windows or TCP access: tcp://localhost:2375
      • Remote Docker daemon: tcp://remote.host.com:2375
    Endpoint URL
      • The hostname or IP address where containers will be accessible. For example, localhost.
    External URL
      • If different from the endpoint URL, specify how containers are accessed externally.
    Security
      • Communication is unencrypted. Limit Docker to trusted or local environments.
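
The Docker host formats listed above can be sanity-checked before you enter them on the setup page. The following is a minimal sketch that uses only the Python standard library. The function name describe_docker_host and the returned keys are hypothetical and are not part of DSDL.

```python
from urllib.parse import urlparse


def describe_docker_host(docker_host: str) -> dict:
    """Classify a Docker host string as a Unix socket or a TCP endpoint."""
    parsed = urlparse(docker_host)
    if parsed.scheme == "unix":
        # urlparse splits "unix://var/run/docker.sock" into netloc "var" and
        # path "/run/docker.sock"; rejoin them to recover the socket path.
        socket_path = "/" + (parsed.netloc + parsed.path).lstrip("/")
        return {"transport": "unix", "socket_path": socket_path, "local": True}
    if parsed.scheme == "tcp":
        host = parsed.hostname or ""
        return {
            "transport": "tcp",
            "host": host,
            "port": parsed.port,
            # Unencrypted TCP is best limited to the local machine.
            "local": host in ("localhost", "127.0.0.1", "::1"),
        }
    raise ValueError(f"Unsupported Docker host scheme: {docker_host!r}")
```

A result with "local" set to False indicates a remote daemon reached over unencrypted TCP, which is the case the Security setting warns about.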


Kubernetes configuration

Choose Kubernetes for production environments where scalability, high availability, and security are critical. Kubernetes allows you to orchestrate containers across multiple machines, improving resource utilization and reliability.

Kubernetes features

Using Kubernetes offers the following features:

  • Scalability: Scale resources on demand.
  • Security: Supports TLS communication and fine-grained access controls.
  • Flexibility: Compatible with various on-premises or cloud providers including EKS, OpenShift, GKE, and AKS.


Configuration steps

Complete the following steps to configure a connection to Kubernetes:

  1. Set up a Kubernetes cluster. You can deploy a Kubernetes cluster using your preferred platform. Options include but are not limited to Amazon Elastic Kubernetes Service (EKS), Red Hat OpenShift, Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), or your own on-premises deployment.

    Ensure that the cluster is accessible and properly configured for your environment.

  2. Configure Kubernetes settings in Splunk:
    Setting Description
    Authentication Mode
      • Cert & Key, User Token, AWS IAM, or other methods as supported by your cluster.
    Cluster Base URL
      • Typically https://<api-server-host>:6443.
    Credentials
      • Provide token or cert/key pair for secure communication.
    Service Type
      • LoadBalancer, NodePort, Route (OpenShift), or Ingress to expose DSDL containers.
    Namespace
      • Specify the Kubernetes namespace for DSDL container deployments.
  3. Container Deployment:
    1. Use provided Kubernetes manifests or Helm charts to deploy DSDL containers in your cluster.
    2. Adjust CPU and GPU resource limits, storage, and networking to match your requirements.
  4. Security:
    1. Enable TLS for secure communications.
    2. Use Role-Based Access Control (RBAC) to control permissions, and use network policies to restrict traffic.
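
Before entering the Kubernetes settings, it can help to check them for obvious mistakes. The following is a minimal sketch, assuming the settings are collected in a plain dictionary. The dictionary keys and the function name check_kubernetes_settings are hypothetical. Only the service types and the URL pattern come from the settings listed above.

```python
from urllib.parse import urlparse

# Service types taken from the settings list; the dictionary keys used
# below are hypothetical names chosen for this sketch.
SERVICE_TYPES = {"LoadBalancer", "NodePort", "Route", "Ingress"}


def check_kubernetes_settings(settings: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    url = urlparse(settings.get("cluster_base_url", ""))
    if url.scheme != "https" or not url.hostname:
        problems.append("Cluster Base URL should look like https://<api-server-host>:6443")
    if settings.get("service_type") not in SERVICE_TYPES:
        problems.append("Service Type must be one of: " + ", ".join(sorted(SERVICE_TYPES)))
    if not settings.get("namespace"):
        problems.append("Namespace is required")
    return problems
```

This check only catches formatting mistakes. It does not contact the cluster, so use Test & Save on the DSDL Configuration page to validate actual connectivity.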


Test and troubleshoot the Docker or Kubernetes configuration

After completing the setup, test the connection between Splunk and the external container environment.

  1. On the DSDL Configuration page, select Test & Save to validate connectivity with Docker or Kubernetes.
  2. Verify that DSDL can communicate with the external environment:
    1. Pull data: In a Jupyter Notebook, use SplunkSearch.SplunkSearch() to search data from Splunk.
    2. Push data: Use | fit MLTKContainer mode=stage ... in Splunk to send datasets to the container environment.

Troubleshoot the configuration

Try the following if you are experiencing issues with the configuration:

  • Check network connectivity and firewall settings.
  • Review Splunk logs and container logs for errors.
  • Verify that all tokens and credentials are correctly entered.
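
A basic network connectivity check can be scripted. The following is a minimal sketch that uses only the Python standard library to test whether a TCP port accepts connections. The function name can_connect is hypothetical, and a successful connection shows only that the port is reachable, not that credentials or certificates are valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, run can_connect("<splunk-host>", 8089) from inside the container to check the Splunk management port, or use port 8088 to check the HEC port.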

Configure HTTP Event Collector

Splunk HTTP Event Collector (HEC) allows external DSDL containers to send inference results and logs back to the Splunk platform. For more information on HEC, see Set up and use HTTP Event Collector in Splunk Web in the Splunk Enterprise manual.

HEC security considerations

Review the following security considerations if choosing to use HEC:

  • SSL and TLS: Set up SSL and TLS for HEC to secure data transmission.
  • Token permissions: Restrict the HEC token to necessary indexes and source types.
  • Firewall settings: Ensure that the HEC port is properly secured and not exposed to untrusted networks.

HEC configuration steps

Complete the following configuration steps:

  1. Enable HEC in Splunk:
    1. Go to Settings, then Data Inputs, and then HTTP Event Collector.
    2. Set All Tokens to Enabled.
    3. (Optional) If required, also set SSL to Enabled.
  2. Create a new HEC token:
    1. Provide a name, select a source type, and specify an index.
    2. Copy the generated token value.
  3. Configure DSDL for HEC:
    1. In the DSDL configuration, enable Splunk HEC.
    2. Provide the HEC token and endpoint URL. For example, https://<splunk-host>:8088.
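
To confirm that a new token works, you can send a test event to HEC from any machine that can reach the endpoint. The following is a minimal sketch that builds the request with the Python standard library. It relies on the standard HEC event endpoint /services/collector/event and the Authorization: Splunk <token> header. The function name build_hec_request is hypothetical, and DSDL itself might send data differently.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, index=None, sourcetype=None):
    """Build (but do not send) a POST request for the HEC event endpoint."""
    payload = {"event": event}
    if index:
        payload["index"] = index
    if sourcetype:
        payload["sourcetype"] = sourcetype
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the returned object to urllib.request.urlopen to send the event. If SSL is enabled with a self-signed certificate, the call fails certificate verification unless you supply an SSL context that trusts your certificate.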


Configure Splunk access token

The Splunk access token allows JupyterLab Notebooks or other container environments to connect to Splunk using the Splunk REST API, supporting interactive data pulls or staging commands.

Access token security considerations

Review the following security considerations if choosing to use the Splunk access token:

  • Token permissions: Assign minimal necessary permissions to the API token.
  • Secure storage: Use environment variables or secure methods to store tokens in notebooks.

Access token configuration steps

Complete the following steps:

  1. Create an API token in Splunk:
    1. Go to Settings, then Tokens, and then select Create New Token.
    2. Grant appropriate permissions and copy the generated token.
  2. Configure DSDL:
    1. In the DSDL setup page, enable Splunk Access for Jupyter.
    2. Provide the Splunk Access Token, host address, and management port. An example host address is host.docker.internal. The default management port is 8089.
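
The following is a minimal sketch of how a notebook could use the access token against the Splunk REST API while keeping the token out of the notebook source. It assumes the token is stored in an environment variable named SPLUNK_ACCESS_TOKEN, which is a hypothetical name, and it uses the search/jobs/export REST endpoint with a bearer token header. The function name build_search_request is also hypothetical.

```python
import os
import urllib.parse
import urllib.request


def build_search_request(host, token, search, port=8089):
    """Build (but do not send) a REST API request that exports search results."""
    body = urllib.parse.urlencode({"search": search, "output_mode": "json"})
    return urllib.request.Request(
        url=f"https://{host}:{port}/services/search/jobs/export",
        data=body.encode("utf-8"),
        headers={"Authorization": "Bearer " + token},
        method="POST",
    )


# SPLUNK_ACCESS_TOKEN is a hypothetical variable name; reading the token from
# the environment avoids hardcoding it in the notebook.
token = os.environ.get("SPLUNK_ACCESS_TOKEN", "")
request = build_search_request(
    "host.docker.internal", token, "search index=_internal | head 5"
)
```

Pass the request to urllib.request.urlopen to run the search. The token needs only the permissions required for the searches you plan to run.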
Last modified on 24 January, 2025

This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.2.0

