Install the Data Stream Processor
To install the Splunk Data Stream Processor, download, extract, and run the installer on each node in your cluster. You must contact your Splunk representative to access the Data Stream Processor download page. The Data Stream Processor is installed from a Gravity package, which joins a group of cloud instances or hardware into a cluster and deploys Kubernetes on top of it. All Data Stream Processor services are deployed on top of Kubernetes. See the Gravity documentation for more information.
Extract and run the Data Stream Processor installer
Follow these steps to extract and run the DSP installer.
Prerequisites
- Before you install the Splunk Data Stream Processor, make sure that your system clocks are synchronized on each node. Consult the system documentation for the particular operating systems on which you are running the Splunk Data Stream Processor. For most environments, Network Time Protocol (NTP) is the best approach.
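For example, on Linux distributions that use systemd, you can quickly check whether a node's clock is synchronized (a convenience check, not a substitute for your operating system's documentation):
timedatectl status
Look for a line in the output indicating that the system clock is synchronized.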
- You must have IPv4 Forwarding enabled on each node. See IPv4 Forwarding in the Gravity documentation.
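As a quick check, you can verify that IPv4 forwarding is enabled on a node by reading the sysctl key that controls it (this assumes a standard Linux sysctl setup):
sysctl net.ipv4.ip_forward
If this returns net.ipv4.ip_forward = 0, enable forwarding as described in the Gravity documentation, for example by adding net.ipv4.ip_forward=1 to a file in /etc/sysctl.d/ and reloading with sysctl -p.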
- You must have system administrator (root) permissions to install Kubernetes and Gravity. Kubernetes leverages system components such as iptables and kernel modules that require root access. If you do not have root permissions, you can use the sudo command. Once installed, non-privileged DSP containers and services do not run as the root user, but rather as a service user that you specify. See Service User in the Gravity documentation and step 5 below for details on how to specify the service user.
Steps
- Download the Data Stream Processor installer on each node in your cluster.
- On each node in your cluster, extract the Data Stream Processor installer from the tarball.
tar xf <dsp-version>-linux-amd64.tar
- On the node that you want to be the master node, navigate to the extracted directory.
cd <dsp-version>
- If you are installing DSP with SELinux enabled, temporarily disable SELinux in your Linux OS. This disables SELinux until your Linux server is rebooted.
setenforce 0
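You can confirm the current SELinux mode before and after running this command with getenforce, which reports Enforcing, Permissive, or Disabled:
getenforce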
- From the extracted directory, run the DSP installer command. You must run this command with the --flavor=ha flag, but the install command supports several optional flags as well:
./install [--optional-flags] --flavor=ha
The install command supports the following flags:
- --accept-license: Automatically accepts the license agreement.
- --token <token>: A secure token that prevents rogue nodes from joining the cluster during installation. Your token must be at least six characters long.
- --service-uid <numeric>: Specifies the Service User ID. For information about how this is used, see Service User in the Gravity documentation. If not specified, a user named planet is created with user ID 1000. Note: The ./join command does not support the --service-uid or --service-gid flags. Instead, the worker nodes use whatever values are set on the master node with ./install.
- --service-gid <numeric>: Specifies the Service Group ID. For information about how this is used, see Service User in the Gravity documentation. If not specified, a group named planet is created.
- --pod-network-cidr <10.244.0.0/16>: The CIDR range from which Kubernetes allocates node subnets and pod IPs. Must be a minimum of /16 so that Kubernetes can allocate a /24 to each node. If not specified, defaults to 10.244.0.0/16.
- --service-cidr <10.100.0.0/16>: The CIDR range from which Kubernetes allocates service IPs. If not specified, defaults to 10.100.0.0/16.
- --mount=data:/data/gravity/pv --state-dir=/data/gravity: Changes the location where Gravity stores containers and state information. Defaults to /var/lib/gravity. This example stores data in the /data directory. Use these flags if you do not have enough disk space in /var to support 24 hours of data retention.
The installation may take up to 15 minutes. Keep this terminal window open.
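For example, an installation that automatically accepts the license, sets a join token, and assigns the service user ID might look like the following. The flag values shown here are illustrative; substitute your own:
./install --accept-license --token=<token> --service-uid=1000 --flavor=ha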
- If you are installing DSP on a CentOS or RHEL operating system, increase fs.inotify.max_user_watches to 1000000 in /etc/sysctl.d/99-sysctl.conf to prevent Gravity hosts from running out of inotify watches. If your Gravity hosts run out of inotify watches, Gravity throws an out-of-disk error.
- On each node, open the 99-sysctl.conf file in /etc/sysctl.d/.
- Add the following line to the file:
fs.inotify.max_user_watches=1000000
- Save your changes.
- From the command line of each node, type the following command:
sysctl -p /etc/sysctl.d/99-sysctl.conf
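To confirm that the new limit is active on a node, you can read the value back:
sysctl fs.inotify.max_user_watches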
Open the required Gravity ports and finish the installation
Gravity is a toolkit that allows developers to package their Kubernetes clusters and apps as a tarball. All Data Stream Processor services are deployed on top of Kubernetes. Complete the following steps to open the requisite Gravity ports and complete the DSP installation.
- Open the ports required by the Data Stream Processor. See Port configuration requirements for the list of ports.
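For example, on hosts that use firewalld, you can open a port with commands like the following. The port shown here is only the DSP UI port used later in this topic; open the full set of ports listed in the Port configuration requirements topic:
firewall-cmd --permanent --add-port=30000/tcp
firewall-cmd --reload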
- After the installation process has finished, the installer prints out a command that you must use to join the other nodes to the first master node. Copy the text after gravity join.
Wed Oct 2 23:59:56 UTC   Please execute the following join commands on target nodes:
Role     Nodes   Command
----     -----   -------
worker   2       gravity join <ip-address-master> --token=<token> --role=worker
- On each of the worker nodes, enter the following.
./join <ip-address-of-master> --token=<token> --role=worker
When a minimum of two nodes have joined your cluster, the installation continues and the following things occur:
- Checks that the system is running on a supported OS.
- Checks that the system passes pre-installation checks, such as meeting the minimum system requirements.
- Checks that the system is not already running Docker or other conflicting software.
- Checks that the system has the necessary running services and kernel modules.
- Installs Docker, Kubernetes, and other software dependencies such as SCloud.
- Prepares Kubernetes to run the Data Stream Processor.
- Installs the Data Stream Processor.
- Checks that the Data Stream Processor is ready for use.
- Invokes the application status hook. Any failures are tagged as "degraded" in gravity status.
Configure the Data Stream Processor UI redirect URL
By default, the Data Stream Processor uses the IPv4 address of eth0 to derive several properties required by the UI to function properly. This will work in many but not all cases.
In the case that the eth0 network is not directly accessible (for example, it exists inside a private AWS VPC) or is otherwise incorrect, use the configure-ui script to manually define the IP address or hostname that can be used to access DSP.
- From the master node, enter the following:
DSP_HOST=<ip-address-of-master-node> ./configure-ui
- Then, enter the following:
./deploy
- Navigate to the Data Stream Processor UI to verify your changes.
https://<DSP_HOST>:30000/
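As an optional first check, you can test reachability from the command line with curl. The -k flag skips certificate verification, which is useful here because the DSP certificate may not be trusted by your system:
curl -k https://<DSP_HOST>:30000/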
- On the login page, enter the following:
User: dsp-admin
Password: <the dsp-admin password generated from the installer>
If you are using the Google Chrome browser and encounter a "net::ERR_CERT_INVALID" error with no "Proceed Anyway" option when you click on Advanced, click anywhere on the background then type "thisisunsafe" to trust the certificate.
- (Optional) If you need to retrieve the dsp-admin password, enter the following on your master node:
./print-login
Check the status of your Data Stream Processor deployment
To check the status of your cluster, do the following:
- From a node, type the following.
gravity status
A response showing the current health of your cluster is displayed.
Cluster status:     active
Application:        dspbeta, version 0.1.5-503873
Join token:         cb8155ed37115fe4f70cd896e4a0eea5
Periodic updates:   Not Configured
Remote support:     Not Configured
Last completed operation:
    * operation_install (5b6cfae2-ee66-4789-a7e9-9d257d99cea9)
      started:      Fri Sep 27 16:02 UTC (3 days ago)
      completed:    Fri Sep 27 16:02 UTC (3 days ago)
Cluster endpoints:
    * Authentication gateway:
        - 172.31.24.68:32009
    * Cluster management URL:
        - https://172.31.24.68:32009
Cluster nodes:
    musingbanach888
    Masters:
        * ip-172-31-24-68 / 172.31.24.68 / worker
          Status:   healthy
Cluster configuration options
You can view or change the default configurations of your cluster by using the following commands.
- get-config
- set-config
- list-configs
- get-secret
- set-secret
- list-secrets
To set a new configuration or secret:
- To set a new configuration, type the following from a node.
./set-config <PROPERTY_NAME> <VALUE>
- To set a new secret, type the following from a node.
./set-secret <SECRET_NAME>
- Deploy the changes.
./deploy
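For example, to set a hypothetical property and then apply it, you might run the following. The property name and value shown here are illustrative only, not actual DSP settings:
./set-config EXAMPLE_PROPERTY example-value
./deploy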
To view existing cluster configurations:
- From a node, type the following to see the list of configurations, including values.
./list-configs
- From a node, type the following to see the list of secret keys.
./list-secrets
To view individual configurations or secrets:
- From a node, type the following to see an individual non-secret configuration property.
./get-config <PROPERTY_NAME>
- From a node, type the following to see an individual secret.
./get-secret <SECRET_NAME>
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.0.0