Splunk® App for Data Science and Deep Learning

Use the Splunk App for Data Science and Deep Learning

Firewall requirements for Splunk and Kubernetes communication

When the Splunk App for Data Science and Deep Learning (DSDL) integrates with Kubernetes, you must ensure that the necessary ports are open to allow communication between the Splunk search head, the Kubernetes environment, and DSDL services. Scoping these firewall rules correctly keeps data flowing while limiting exposure. This matters most in development (DEV) mode, where containers are assigned ports dynamically at runtime.

In production (PROD) mode, fewer ports are exposed, and external access is typically limited to the required endpoints. In DEV mode, you can open additional ports for Jupyter, TensorBoard, MLflow, and Spark for interactive development.

Kubernetes firewall rules summary

Use this summary to confirm required, recommended, or optional ports. Adjust for your specific environment and security policies:

Component | Required? | Port | Description
Kubernetes API | Yes | 6443 | Required for Splunk to manage pods.
Splunk REST API | No | 8089 | Optional if container-based calls to Splunk are needed.
DSDL API | Yes | 5000 or dynamic | Required for training and inference commands.
Splunk HEC | No | 443 or 8088 | Optional if streaming data or logs back into Splunk.
Jupyter | No | 8888 | Used in DEV mode or specific workflows; open only if needed.
MLflow | No | 6060 | Used in DEV mode or specific workflows; open only if needed.
Spark | No | 4040 | Used in DEV mode or specific workflows; open only if needed.
TensorBoard | No | 6006 | Used in DEV mode or specific workflows; open only if needed.

Firewall configuration for the Splunk search head

See the following table for information on traffic direction and port requirements for the Splunk search head:

Traffic direction | Port | Required? | Description
Outbound | 6443 | Required for Kubernetes use. | Kubernetes API server. Manage pods, resources.
Bidirectional | 8089 | Optional. Needed if containers call back to Splunk using REST. | Splunk REST API communication with containers.
Bidirectional | 5000 or dynamic | Required for DSDL operations. | DSDL API commands including fit, apply, and summary.
Inbound | 8088 for on-premises or 443 for Splunk Cloud. | Optional if using HEC ingestion. | Splunk HEC for receiving data from containers.

See the following table for further information on Splunk ports:

Port Description
Kubernetes API port
  • Splunk needs outbound access to port 6443 to manage Kubernetes resources.
  • Firewall rule: Outbound traffic on port 6443 to the Kubernetes API server.
Splunk management port
  • If containerized services in Kubernetes need to communicate with the Splunk REST API, open port 8089 in both directions.
  • Firewall rule: Bidirectional traffic on port 8089 if you enable Splunk's REST API calls from containers.
DSDL API port
  • By default, DSDL uses port 5000 for data science operations in DEV mode. In PROD mode, the DSDL container typically runs on port 5000 but can dynamically assign a different port.
  • Firewall rule: Bidirectional traffic on 5000 or the dynamically assigned port.
Splunk HEC port
  • For inbound traffic from Kubernetes (notebooks, inference scripts) to Splunk:
    • On-premises Splunk often uses 8088.
    • Splunk Cloud typically uses 443.
  • Firewall rule: Inbound traffic on the relevant port (8088 or 443).

Firewall configuration for the machine running Kubernetes

See the following table for information on traffic direction and port requirements for the machine running Kubernetes:

Traffic direction | Port | Required? | Description
Bidirectional | 8089 | Optional. Use if needed for container communication to the Splunk REST API. | REST API communication with Splunk.
Bidirectional | 5000 or dynamic | Required for DSDL operations. | DSDL API commands with Splunk.
Outbound | 443 for Splunk Cloud or 8088 for Splunk on-premises. | Optional. Use for HEC ingestion. | HEC for sending data to Splunk.
Inbound | 6443 | Required to manage cluster resources. | Kubernetes API access from Splunk.
Inbound | 8888 or dynamic | Required in DEV if Jupyter is used. | Jupyter Notebooks (DEV mode).
Inbound | 6060 or dynamic | Optional. Use with MLflow. | MLflow tracking (DEV mode).
Inbound | 4040 or dynamic | Optional. Use with Spark. | Spark monitoring (DEV mode).
Inbound | 6006 or dynamic | Optional. Use with TensorBoard. | TensorBoard (DEV mode).

See the following table for further information on both Splunk and service ports:

Port Description
Splunk management port
  • If containers require Splunk REST API access, keep 8089 open in both directions.
  • Firewall rule: Bidirectional traffic on port 8089.
DSDL API port
  • The container typically communicates with Splunk using port 5000. If you enable dynamic port assignment, open the assigned port.
  • Firewall rule: Bidirectional traffic on port 5000 or the chosen dynamic port.
Splunk HEC port
  • Containers in Kubernetes need outbound access to Splunk HEC if they send data or logs back to Splunk.
  • Firewall rule: Outbound traffic on port 443 for Splunk Cloud or port 8088 for Splunk on-premises.
Kubernetes API port
  • The Kubernetes API server must accept inbound connections from Splunk on port 6443 so that Splunk can manage or query Kubernetes resources.
  • Firewall rule: Inbound traffic on port 6443 from the Splunk search head.
Optional services (DEV mode)
  • Jupyter: Port 8888. Inbound if you use Jupyter notebooks.
  • MLflow: Port 6060. Inbound if MLflow experiment tracking is enabled.
  • Spark: Port 4040. Inbound if Spark monitoring is used.
  • TensorBoard: Port 6006. Inbound if deep learning visualization is needed.
  • Firewall rule: Inbound traffic on these ports or the dynamically assigned ones.
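
As a rough illustration of the outbound HEC traffic described in this section, the following Python sketch builds the kind of request a container might send to Splunk HEC. The host name, the token value, and the `build_hec_request` helper are hypothetical placeholders that do not come from DSDL. The `/services/collector/event` path and the `Authorization: Splunk <token>` header follow the usual HEC convention, but confirm both against your own deployment.

```python
import json
import urllib.request


def build_hec_request(host, port, token, event, use_tls=True):
    """Build (but do not send) a POST request for the Splunk HEC event endpoint.

    host, port, and token are caller-supplied placeholders. The port is
    typically 8088 for on-premises Splunk or 443 for Splunk Cloud.
    """
    scheme = "https" if use_tls else "http"
    url = f"{scheme}://{host}:{port}/services/collector/event"
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    req = build_hec_request(
        "splunk.example.com",
        8088,
        "00000000-0000-0000-0000-000000000000",
        {"message": "container started"},
    )
    print(req.get_method(), req.full_url)
    # To actually send the event, the outbound firewall rule must allow it:
    # urllib.request.urlopen(req, timeout=5)
```

Building the request separately from sending it makes it easy to inspect the target host and port before you adjust the outbound firewall rule.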

Development versus production usage

See the following table for firewall usage in development (DEV) versus production (PROD) mode:

Mode Description
Development (DEV) mode
  • Containers might expose multiple ports for Jupyter, MLflow, Spark, and TensorBoard.
  • The Splunk search head or the Kubernetes environment dynamically assigns ports, which appear in the DSDL UI.
  • These ports are convenient for interactive development and debugging.
Production (PROD) mode
  • Typically limits exposed ports to only the DSDL API: port 5000 or a single dynamic port.
  • Jupyter, TensorBoard, MLflow, and Spark interfaces are disabled or not exposed.
  • This reduces the attack surface and simplifies firewall management.
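
The difference between the two modes can be sketched as a small lookup. The following Python example is only an illustration: the `ports_for_mode` function and its parameters are hypothetical and not part of DSDL, the port numbers are the defaults listed in this topic, and dynamically assigned ports are not modeled.

```python
# Default ports taken from the tables in this topic. Dynamic ports are not modeled.
CORE_PORTS = {"Kubernetes API": 6443, "DSDL API": 5000}
DEV_ONLY_PORTS = {"Jupyter": 8888, "MLflow": 6060, "Spark": 4040, "TensorBoard": 6006}


def ports_for_mode(mode, use_rest=False, use_hec=False, hec_port=8088):
    """Return a {service: port} mapping to review for "DEV" or "PROD" mode."""
    mode = mode.upper()
    if mode not in ("DEV", "PROD"):
        raise ValueError(f"unknown mode: {mode!r}")
    ports = dict(CORE_PORTS)
    if use_rest:
        ports["Splunk REST API"] = 8089
    if use_hec:
        # 8088 for on-premises Splunk, 443 for Splunk Cloud.
        ports["Splunk HEC"] = hec_port
    if mode == "DEV":
        ports.update(DEV_ONLY_PORTS)
    return ports


if __name__ == "__main__":
    for name, port in sorted(ports_for_mode("PROD", use_hec=True).items()):
        print(f"{name}: {port}")
```

Treat the result as a checklist to compare against the ports shown in the DSDL UI, not as a complete rule set.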

Next steps

After you configure the firewall rules described in the previous sections, complete the following steps:

  1. Align ports with your environment:
    1. Verify which ports your containers actually use by checking the DSDL user interface or logs.
    2. Update firewall rules to cover both default and dynamically assigned ports.
  2. Implement network security best practices:
    1. Restrict ports to trusted networks where possible.
    2. Enable TLS (especially for Kubernetes in production) to safeguard data in transit.
    3. Limit the container's inbound connectivity if you only need to push data out to Splunk.
  3. Monitor and test:
    1. After configuring your firewall, run test searches, attempt container connections, and confirm that data can flow in both directions as needed.
    2. For ongoing monitoring, use Splunk Observability, container logs, and standard Splunk dashboards to confirm that communication remains stable.
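
One way to attempt the connection tests mentioned in the steps above is a plain TCP reachability check, sketched here in Python. The host name and the `port_is_reachable` and `check_targets` helpers are hypothetical. A successful TCP connection shows only that the firewall allows the port, not that the service behind it is healthy.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_targets(targets, timeout=3.0):
    """Map each label in a list of (label, host, port) tuples to a reachability result."""
    return {
        label: port_is_reachable(host, port, timeout)
        for label, host, port in targets
    }


if __name__ == "__main__":
    # Hypothetical host name; replace with your own endpoints.
    targets = [
        ("Kubernetes API", "k8s.example.com", 6443),
        ("DSDL API", "k8s.example.com", 5000),
    ]
    for label, reachable in check_targets(targets).items():
        print(f"{label}: {'reachable' if reachable else 'blocked or down'}")
```

Run the check from the machine whose outbound rule you are testing, for example from the Splunk search head for ports 6443 and 5000.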
Last modified on 24 January, 2025

This documentation applies to the following versions of Splunk® App for Data Science and Deep Learning: 5.2.0

