All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator that has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, after July 1, 2023, we will no longer provide support for versions of DSP prior to DSP 1.4.0. We advise all of our customers to upgrade to DSP 1.4.0 to continue receiving full product support from Splunk.
Troubleshoot your Splunk Data Stream Processor deployment
Use this information to troubleshoot issues relating to the Splunk Data Stream Processor (DSP) installation and deployment.
Support
To report bugs or receive additional support, do the following:
- Ask questions and get answers through community support at Splunk Answers.
- If you have a support contract, file a case using the Splunk Support Portal. See Support and Services.
- If you have a support contract, contact Splunk Customer Support.
- To get professional help with optimizing your Splunk software investment, see Splunk Services.
When contacting Splunk Customer Support, provide the following information:
| Information to provide | Notes |
|---|---|
| Pipeline ID | To view the ID of a pipeline, open the pipeline in DSP, then click the pipeline options icon and select Update pipeline metadata. |
| Pipeline name | N/A |
| DSP version | To view your DSP version, in the product UI, click the More Options icon and select About. |
| DSP diagnostic report | A DSP diagnostic report contains all DSP application logs as well as system and monitoring logs. The command creates a diagnostic report named dsp-report-<timestamp>.tar.gz in the working directory. |
| Summary of the problem and any additional relevant information | N/A |
[ERROR]: cannot allocate memory
DSP on RHEL 7 or CentOS 7 fails with a warning message similar to the following:
Warning FailedCreatePodContainer 5s (x2 over 16s) kubelet, 10.234.0.181 unable to ensure pod container exists: failed to create container for [kubepods burstable poded9bd025-c3e4-4ebb-a5b7-2a7adab9742d] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/poded9bd025-c3e4-4ebb-a5b7-2a7adab9742d: cannot allocate memory
Cause
This warning is caused by a bug in older RHEL 7/CentOS 7 kernels, such as v3.10.0-1127.19.1.el7, in combination with systemd v231 or earlier, where kernel-memory cgroups are not cleaned up properly. This manifests as a memory allocation error when new pods are created. For more information, see the Kubernetes bug report: Kubelet CPU/Memory Usage linearly increases using CronJob.
Solution
Upgrade systemd to v232 or later, or disable kernel memory accounting by setting cgroup.memory=nokmem.
Do the following steps to disable kernel memory accounting:
- Find the kernel version.
grubby --default-kernel
- Disable kernel memory accounting by adding cgroup.memory=nokmem to the kernel boot parameters.
grubby --args=cgroup.memory=nokmem --update-kernel /boot/<kernel_version>
- Reboot the host.
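After rebooting, you can confirm that the flag took effect by checking the running kernel's command line. The sketch below greps a sample command-line string for illustration; on a live host you would grep /proc/cmdline instead (the sample value is an assumption, not output from a real system).

```shell
# Sample kernel command line; on a real host, read /proc/cmdline instead.
cmdline='BOOT_IMAGE=/vmlinuz-3.10.0-1127.19.1.el7.x86_64 ro cgroup.memory=nokmem'

# Check whether kernel memory accounting is disabled on the booted kernel.
if printf '%s' "$cmdline" | grep -q 'cgroup.memory=nokmem'; then
  echo "kernel memory accounting disabled"
else
  echo "flag not present: re-run grubby and reboot"
fi
```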
[ERROR]: waiting for agents to join:
You may see this error while running the installer.
Cause
The DSP installer waits ten minutes for nodes to join your cluster. If your cluster does not have a minimum of three nodes after ten minutes have elapsed, the installer times out.
Solution
You must remove all nodes from k0s and start the installation process again.
- On the controller node, run sudo ./dsp leave --confirm to force the node to leave the k0s cluster.
- Make sure that you have all three nodes prepared, and then start the installation process over again.
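If several nodes joined before the timeout, the leave command has to be run on each of them. The dry-run sketch below only prints the command that would be run per node; the hostnames are placeholders, and running it for real would require SSH access to each node.

```shell
# Dry run: print the cleanup command for each node instead of executing it.
# Replace the placeholder hostnames with your actual node names.
for node in dsp-node1 dsp-node2 dsp-node3; do
  echo "ssh $node 'sudo ./dsp leave --confirm'"
done
```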
[ERROR]: The following pre-flight checks failed:
The DSP installer fails to complete because of pre-flight checks.
Cause
The DSP installer runs pre-flight checks to make sure that your system meets the minimum requirements for DSP. If your system does not meet those requirements, the installer quits the installation.
Solution
The installer returns which pre-flight checks failed. Using that information, double-check that you meet the mandatory Hardware and Software requirements for DSP. See Hardware and Software Requirements.
[ERROR]: The following pre-flight checks failed: XXGB available space left on /var/data, minimum of 175GB is required
The DSP installer fails to complete because there isn't enough space left on /var/data, even if another disk volume or partition is specified with --location.
Cause
The DSP installer runs a pre-flight check to make sure that your system has enough drive space on /var/data, even if you have used --location to install DSP on another drive or partition. If there isn't enough disk space on /var, the pre-flight check fails with a disk space error.
Solution
Add a symlink from your intended install location to /var/data. For example, if you want to use --location /data, then add the following symlink.
ln -s /data /var/data
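The sketch below exercises the same symlink pattern in a scratch directory so it can run anywhere without root; on a real host the source and target would be /data and /var/data as shown above, and readlink confirms where the link points.

```shell
# Scratch-directory stand-ins for /data and /var/data.
demo=$(mktemp -d)
mkdir -p "$demo/data"

# Same shape as: ln -s /data /var/data
ln -s "$demo/data" "$demo/var-data"

# Confirm the link resolves to the intended install location.
readlink "$demo/var-data"
```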
The DSP installer fails to complete due to clocks being out of sync
During DSP installation, the console returns the following error message.
Operation failure: servers ip-10-216-29-75 and ip-10-216-29-6 clocks are out of sync: Fri Sep 11 22:23:01.863 UTC and Fri Sep 11 22:23:02.562 UTC respectively, sync the times on servers before install, e.g. using ntp
Cause
The time difference between servers is greater than 300 milliseconds.
Solution
Synchronize the system clocks on each node. For most environments, Network Time Protocol (NTP) is the best approach. Consult the system documentation for the particular operating systems on which you are running the Splunk Data Stream Processor. If you are running DSP on an AWS EC2 environment, see "Setting the time for your Linux instance" in the Amazon Web Services documentation. If you are running DSP on a different environment, see "NTP" in the Debian documentation or the Chrony documentation.
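To see why the sample error above fires, compare its two clock readings against the 300 ms threshold. The sketch below takes the seconds-past-the-minute values from the sample message (22:23:01.863 and 22:23:02.562) and computes the skew; the threshold value is taken from the Cause section.

```shell
# Clock readings from the sample error, as seconds past the minute.
s1=1.863
s2=2.562

# Absolute skew, flagged if it exceeds the 300 ms (0.3 s) limit.
awk -v a="$s1" -v b="$s2" 'BEGIN {
  d = b - a; if (d < 0) d = -d;
  printf "skew=%.3fs %s\n", d, (d > 0.3 ? "OUT OF SYNC" : "ok");
}'
```

Here the skew is roughly 0.7 s, more than twice the allowed 300 ms, which is why the installer aborts.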
Network bridge driver loading issues
Depending on the system configuration, network bridge drivers may not be loaded. If they are not loaded, the installation fails at the /health phase. See the installation checklist.
Installation failure due to disabled Network Bridge Driver
The installation fails with the following error message:
[ERROR]: failed to execute phase "/health" planet is not running yet: &{degraded [{ 10.216.31.29 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.218 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.252 master healthy []}]} (planet is not running yet: &{degraded [{ 10.216.31.29 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.218 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.252 master healthy []}]})
Solution
Run the following commands on each node:
sysctl -w net.bridge.bridge-nf-call-iptables=1
echo net.bridge.bridge-nf-call-iptables=1 >> /etc/sysctl.d/10-bridge-nf-call-iptables.conf
Then, restart the installation process.
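To verify that the setting was persisted before restarting the installation, you can grep the sysctl drop-in file written above. The sketch uses a temp file so it runs anywhere without root; on a real node the file is /etc/sysctl.d/10-bridge-nf-call-iptables.conf, and sysctl net.bridge.bridge-nf-call-iptables would confirm the live value.

```shell
# Temp-file stand-in for /etc/sysctl.d/10-bridge-nf-call-iptables.conf.
conf=$(mktemp)
echo 'net.bridge.bridge-nf-call-iptables=1' >> "$conf"

# One matching line confirms the setting is persisted.
grep -c 'bridge-nf-call-iptables=1' "$conf"
```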
Unable to log in to the Splunk Cloud Services CLI
Logging in to the Splunk Cloud Services CLI results in the following error: failed to get session token: failed to get valid response from csrfToken endpoint: Get "https://<ip_addr>/csrfToken": x509: cannot validate certificate for <ip> because it doesn't contain any IP SANs
Cause
The Splunk Cloud Services CLI configuration file is incorrectly configured.
Solution
Make sure that your Splunk Cloud Services CLI settings are configured correctly, given the particular version of the Splunk Cloud Services CLI that you are using. See Configure the Splunk Cloud Services CLI.
DSP UI times out
The DSP UI appears to find the controller node but fails to load.
Cause
The controller node's IP address has been changed, and the DSP UI is trying to redirect your browser to a private IP. Such IP reassignments are common with various public cloud providers when servers are stopped.
Solution
Reconfigure the DSP UI redirect URL. See Configure the Data Stream Processor UI redirect URL.
My data is not making it into my pipeline
If data is not making it into your activated pipelines, check to see whether all the ingestion services are running in Kubernetes.
Cause
One of the ingestion services could be down.
Solution
Make sure that all of the ingest services are running. The ingest services are ingest-hec, ingest-s2s, and splunk-streaming-rest. To check whether these services are running, use the following command:
kubectl get pods -n dsp
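To spot an unhealthy ingest pod quickly, you can filter the kubectl output for anything not in the Running state. Without cluster access here, the sketch filters a captured sample of `kubectl get pods -n dsp` output; the pod names and statuses below are illustrative, not from a real deployment.

```shell
# Illustrative sample of `kubectl get pods -n dsp` output.
sample='ingest-hec-0             1/1   Running            0    4d
ingest-s2s-0             1/1   Running            0    4d
splunk-streaming-rest-0  0/1   CrashLoopBackOff   12   4d'

# Keep only pods that are not Running, surfacing the broken ingest service.
printf '%s\n' "$sample" | grep -v 'Running'
```

On a live deployment, `kubectl get pods -n dsp | grep -v Running` does the same filtering (the column-header line will also appear in the output).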
Use the Splunk App for DSP to monitor your DSP deployment
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6