All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator that has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, after July 1, 2023, we will no longer provide support for versions of DSP prior to DSP 1.4.0. We advise all of our customers to upgrade to DSP 1.4.0 to continue receiving full product support from Splunk.
Troubleshoot your Splunk Data Stream Processor deployment
Use this information to troubleshoot issues relating to the Splunk Data Stream Processor (DSP) installation and deployment.
Support
To report bugs or receive additional support, do the following:
- Ask questions and get answers through community support at Splunk Answers.
- If you have a support contract, file a case using the Splunk Support Portal. See Support and Services.
- If you have a support contract, contact Splunk Customer Support.
- To get professional help with optimizing your Splunk software investment, see Splunk Services.
When contacting Splunk Customer Support, provide the following information:
| Information to provide | Notes |
|---|---|
| Pipeline ID | To view the ID of a pipeline, open the pipeline in DSP, then click the pipeline options icon and select Update pipeline metadata. |
| Pipeline name | N/A |
| DSP version | To view your DSP version, in the product UI, click the More Options icon and select About. |
| DSP diagnostic report | A DSP diagnostic report contains all DSP application logs as well as system and monitoring logs. The command creates a diagnostic report named dsp-report-<timestamp>.tar.gz in the working directory. |
| Summary of the problem and any additional relevant information | N/A |
[ERROR]: cannot allocate memory
DSP on RHEL 7 or CentOS 7 fails with a warning message similar to the following:
Warning FailedCreatePodContainer 5s (x2 over 16s) kubelet, 10.234.0.181 unable to ensure pod container exists: failed to create container for [kubepods burstable poded9bd025-c3e4-4ebb-a5b7-2a7adab9742d] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/poded9bd025-c3e4-4ebb-a5b7-2a7adab9742d: cannot allocate memory
Cause
This warning is caused by a bug in older RHEL 7/CentOS 7 kernels, such as v3.10.0-1127.19.1.el7, in combination with systemd v231 or earlier, where kernel-memory cgroups are not cleaned up properly. This manifests as a memory allocation error when new pods are created. For more information, see the Kubernetes bug report: Kubelet CPU/Memory Usage linearly increases using CronJob.
Solution
Upgrade systemd to v232 or later, or disable kernel memory accounting by setting cgroup.memory=nokmem.
Do the following steps to disable kernel memory accounting:
- Find the kernel version.
grubby --default-kernel
- Disable kernel memory accounting by adding cgroup.memory=nokmem to the kernel boot parameters.
grubby --args=cgroup.memory=nokmem --update-kernel /boot/<kernel_version>
- Reboot the host.
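After rebooting, you can confirm that the flag took effect by checking the running kernel's command line. The sketch below greps a sample command-line string for illustration; on a live host you would grep /proc/cmdline instead (the sample value is an assumption, not output from a real system).

```shell
# Sample kernel command line; on a real host, read /proc/cmdline instead.
cmdline='BOOT_IMAGE=/vmlinuz-3.10.0-1127.19.1.el7.x86_64 ro cgroup.memory=nokmem'

# Check whether kernel memory accounting is disabled on the booted kernel.
if printf '%s' "$cmdline" | grep -q 'cgroup.memory=nokmem'; then
  echo "kernel memory accounting disabled"
else
  echo "flag not present: re-run grubby and reboot"
fi
```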
[ERROR]: waiting for agents to join:
You may see this error while running the installer.
Cause
The DSP installer waits ten minutes for nodes to join your cluster. If your cluster does not have a minimum of three nodes after ten minutes have elapsed, the installer times out.
Solution
You must remove all nodes from k0s and start the installation process again.
- On the controller node, run sudo ./dsp leave --confirm to force the node to leave the k0s cluster.
- Make sure that you have all three nodes prepared, and then start the installation process over again.
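If several nodes joined before the timeout, the leave command has to be run on each of them. The dry-run sketch below only prints the command that would be run per node; the hostnames are placeholders, and running it for real would require SSH access to each node.

```shell
# Dry run: print the cleanup command for each node instead of executing it.
# Replace the placeholder hostnames with your actual node names.
for node in dsp-node1 dsp-node2 dsp-node3; do
  echo "ssh $node 'sudo ./dsp leave --confirm'"
done
```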
[ERROR]: The following pre-flight checks failed:
The DSP installer fails to complete because of pre-flight checks.
Cause
The DSP installer runs pre-flight checks to make sure that your system meets the minimum requirements for DSP. If your system does not meet those requirements, the installer quits the installation.
Solution
The installer returns which pre-flight checks failed. Using that information, double-check that you meet the mandatory Hardware and Software requirements for DSP. See Hardware and Software Requirements.
[ERROR]: The following pre-flight checks failed: XXGB available space left on /var/data, minimum of 175GB is required
The DSP installer fails to complete because there isn't enough space left on /var/data, even if another disk volume or partition is specified with --location.
Cause
The DSP installer runs a pre-flight check to make sure that your system has enough drive space on /var/data, even if you have used --location to install DSP on another drive or partition. If there isn't enough disk space on /var, the pre-flight check fails with a disk space error.
Solution
Add a symlink from your intended install location to /var/data. For example, if you want to use --location /data, then add the following symlink.
ln -s /data /var/data
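The sketch below exercises the same symlink pattern in a scratch directory so it can run anywhere without root; on a real host the source and target would be /data and /var/data as shown above, and readlink confirms where the link points.

```shell
# Scratch-directory stand-ins for /data and /var/data.
demo=$(mktemp -d)
mkdir -p "$demo/data"

# Same shape as: ln -s /data /var/data
ln -s "$demo/data" "$demo/var-data"

# Confirm the link resolves to the intended install location.
readlink "$demo/var-data"
```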
The DSP installer fails to complete due to clocks being out of sync
During DSP installation, the console returns the following error message.
Operation failure: servers ip-10-216-29-75 and ip-10-216-29-6 clocks are out of sync: Fri Sep 11 22:23:01.863 UTC and Fri Sep 11 22:23:02.562 UTC respectively, sync the times on servers before install, e.g. using ntp
Cause
The time difference between servers is greater than 300 milliseconds.
Solution
Synchronize the system clocks on each node. For most environments, Network Time Protocol (NTP) is the best approach. Consult the system documentation for the particular operating systems on which you are running the Splunk Data Stream Processor. If you are running DSP on an AWS EC2 environment, see "Setting the time for your Linux instance" in the Amazon Web Services documentation. If you are running DSP on a different environment, see "NTP" in the Debian documentation or the Chrony documentation.
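To see why the sample error above fires, compare its two clock readings against the 300 ms threshold. The sketch below takes the seconds-past-the-minute values from the sample message (22:23:01.863 and 22:23:02.562) and computes the skew; the threshold value is taken from the Cause section.

```shell
# Clock readings from the sample error, as seconds past the minute.
s1=1.863
s2=2.562

# Absolute skew, flagged if it exceeds the 300 ms (0.3 s) limit.
awk -v a="$s1" -v b="$s2" 'BEGIN {
  d = b - a; if (d < 0) d = -d;
  printf "skew=%.3fs %s\n", d, (d > 0.3 ? "OUT OF SYNC" : "ok");
}'
```

Here the skew is roughly 0.7 s, more than twice the allowed 300 ms, which is why the installer aborts.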
Network bridge driver loading issues
Depending on the system configuration, network bridge drivers may not be loaded. If they are not loaded, the installation fails at the /health phase. See the installation checklist.
Installation failure due to disabled Network Bridge Driver
The installation fails with the following error message:
[ERROR]: failed to execute phase "/health" planet is not running yet: &{degraded [{ 10.216.31.29 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.218 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.252 master healthy []}]} (planet is not running yet: &{degraded [{ 10.216.31.29 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.218 master degraded [kubernetes requires net.bridge.bridge-nf-call-iptables sysctl set to 1, https://www.gravitational.com/docs/faq/#bridge-driver]} { 10.216.31.252 master healthy []}]})
Solution
Run the following commands on each node:
sysctl -w net.bridge.bridge-nf-call-iptables=1
echo net.bridge.bridge-nf-call-iptables=1 >> /etc/sysctl.d/10-bridge-nf-call-iptables.conf
Then, restart the installation process.
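To verify that the setting was persisted before restarting the installation, you can grep the sysctl drop-in file written above. The sketch uses a temp file so it runs anywhere without root; on a real node the file is /etc/sysctl.d/10-bridge-nf-call-iptables.conf, and sysctl net.bridge.bridge-nf-call-iptables would confirm the live value.

```shell
# Temp-file stand-in for /etc/sysctl.d/10-bridge-nf-call-iptables.conf.
conf=$(mktemp)
echo 'net.bridge.bridge-nf-call-iptables=1' >> "$conf"

# One matching line confirms the setting is persisted.
grep -c 'bridge-nf-call-iptables=1' "$conf"
```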
Unable to log in to the Splunk Cloud Services CLI
Logging in to the Splunk Cloud Services CLI results in the following error: failed to get session token: failed to get valid response from csrfToken endpoint: Get "https://<ip_addr>/csrfToken": x509: cannot validate certificate for <ip> because it doesn't contain any IP SANs
Cause
The Splunk Cloud Services CLI configuration file is incorrectly configured.
Solution
Make sure that your Splunk Cloud Services CLI settings are configured correctly, given the particular version of the Splunk Cloud Services CLI that you are using. See Configure the Splunk Cloud Services CLI.
DSP UI times out
The DSP UI appears to find the controller node but fails to load.
Cause
The controller node's IP address has been changed, and the DSP UI is trying to redirect your browser to a private IP. Such IP reassignments are common with various public cloud providers when servers are stopped.
Solution
Reconfigure the DSP UI redirect URL. See Configure the Data Stream Processor UI redirect URL.
My data is not making it into my pipeline
If data is not making it into your activated pipelines, check to see whether all the ingestion services are running in Kubernetes.
Cause
One of the ingestion services could be down.
Solution
Make sure that all of the ingest services are running. The ingest services are ingest-hec, ingest-s2s, and splunk-streaming-rest. To check whether these services are running, use the following command:
kubectl get pods -n dsp
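To spot an unhealthy ingest pod quickly, you can filter the kubectl output for anything not in the Running state. Without cluster access here, the sketch filters a captured sample of `kubectl get pods -n dsp` output; the pod names and statuses below are illustrative, not from a real deployment.

```shell
# Illustrative sample of `kubectl get pods -n dsp` output.
sample='ingest-hec-0             1/1   Running            0    4d
ingest-s2s-0             1/1   Running            0    4d
splunk-streaming-rest-0  0/1   CrashLoopBackOff   12   4d'

# Keep only pods that are not Running, surfacing the broken ingest service.
printf '%s\n' "$sample" | grep -v 'Running'
```

On a live deployment, `kubectl get pods -n dsp | grep -v Running` does the same filtering (the column-header line will also appear in the output).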
Use the Splunk App for DSP to monitor your DSP deployment
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6