Install the Splunk Data Stream Processor
To install the Splunk Data Stream Processor (DSP), download, extract, and run the installer on each node. You must contact your Splunk representative to access the Splunk Data Stream Processor download page. The Splunk Data Stream Processor is installed from a Gravity package, which builds a Kubernetes cluster that DSP is installed and deployed onto. See the Gravity documentation for more information.
At a glance, the DSP installer does the following things:
- Checks that the system is running on a supported OS.
- Checks that the system passes pre-installation checks such as meeting the minimum system requirements.
- Checks that the system is not already running Docker or other conflicting software.
- Checks that the system has the necessary running services and kernel modules.
- Installs Docker, Kubernetes, and other software dependencies like SCloud. For more information about SCloud, see Get started with SCloud.
- Prepares Kubernetes to run the Splunk Data Stream Processor.
- Installs the Splunk Data Stream Processor.
- Checks that the Splunk Data Stream Processor is ready for use.
See What's in the installer directory? for information about the files and scripts that the installer tarball contains.
Extract and run the Splunk Data Stream Processor installer
Prerequisites
Prerequisite | Description |
---|---|
Your system meets the minimum hardware and software requirements for DSP. | See Hardware and Software requirements. |
The required ports are open. | See Port configuration requirements. |
The download link for the Splunk Data Stream Processor. | Contact Splunk Support. |
Disable SELinux. | To disable SELinux, run setenforce 0 on your nodes. |
Synchronize your system clocks. | Consult the system documentation for the particular operating system on which you are running the Splunk Data Stream Processor. For most environments, Network Time Protocol (NTP) is the best approach. |
You have system administrator (root) permissions. | You need administrator (root) permissions so Kubernetes can leverage system components like iptables and kernel modules. If you do not have root permissions, you can use the sudo command. |
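The SELinux and root prerequisites above can be checked before you start. The following is a minimal sketch, not part of the DSP installer: the function names are hypothetical, and it assumes that `getenforce` reports `Permissive` after you run `setenforce 0`.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers. These names are illustrative and are not
# shipped with the DSP installer.

# selinux_mode_ok MODE
# Succeeds when MODE (as printed by `getenforce`) is not "Enforcing".
selinux_mode_ok() {
  case "$1" in
    Permissive|Disabled) return 0 ;;
    *) return 1 ;;
  esac
}

# is_root_uid UID
# Succeeds when UID is 0, that is, when the caller is root.
is_root_uid() {
  [ "$1" -eq 0 ]
}

# preflight_report MODE UID
# Prints one line per prerequisite check.
preflight_report() {
  if selinux_mode_ok "$1"; then
    echo "selinux: ok ($1)"
  else
    echo "selinux: run 'setenforce 0' ($1)"
  fi
  if is_root_uid "$2"; then
    echo "root: ok"
  else
    echo "root: use sudo (uid $2)"
  fi
}

# Example on a real node (commented out so the sketch has no side effects):
# preflight_report "$(getenforce)" "$(id -u)"
```

The checks take their inputs as arguments so that you can try them without changing anything on the node.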
Steps
- Download the Splunk Data Stream Processor installer tarball on each node.
- On each node in your cluster, extract the Splunk Data Stream Processor installer from the tarball. In order for the DSP installation to complete, you must have the right number of nodes ready to join to form a cluster. The number of nodes depends on the installation flavor you select in step 4.
tar xf <dsp-version>-linux-amd64.tar
The DSP installer times out after 5 minutes, so if you don't have these nodes prepared, you need to start the installation process over again.
Do not untar and later run the installer from the /tmp folder.
- On one of the nodes that you want to be a master node, navigate to the extracted file.
cd <dsp-version>
- Determine which flavor of DSP you want to install. Each install flavor comes with a different set of profiles that you can assign to your nodes. The number in the install flavor name corresponds to the minimum number of master or control plane nodes that the flavor supports. Installation flavors are fixed at installation time, so select the flavor that accommodates the largest deployment that you expect to need.
Flavor | Available number of master nodes | Minimum cluster size | Recommended cluster size | Notes |
---|---|---|---|---|
ha3 | 3 | 3 | 5-14 | Recommended for small-sized deployments. With this flavor, there is no node separation between control and data planes, and therefore, the only profiles available for this flavor are master or worker. |
hacp5 | 5 | 10 | 15-50 | Recommended for medium-sized deployments. When it comes to operating a larger DSP deployment, you should consider how to manage the communication that needs to happen between the components that comprise the deployment. This flavor includes node separation for control and data planes to better manage this communication. See Control plane and data plane node profiles for more information. |
hacp9 | 9 | 18 | 50+ | Recommended for large-sized deployments. When it comes to operating a larger DSP deployment, you should consider how to manage the communication that needs to happen between the components that comprise the deployment. This flavor includes node separation for control and data planes to better manage this communication. See Control plane and data plane node profiles for more information. |
- From the extracted file directory, run the DSP install command. You must run this command with the --flavor=<flavor> flag, but the install command supports several optional flags as well. For a list of all available optional flags, see the Install flags section.
./install [--optional-flags] --flavor=<flavor>
The order of optional flags matters. The --accept-license and --location flags must be specified first, followed by any other desired flags. For example:
./install --accept-license --location=/opt/splunk/dsp [--other-optional-flags]
After the initial node has connected to the installer, the installer outputs a join command that you need to run on the other nodes in your cluster. Continue to the next section for instructions.
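The flag-ordering rule in the install step can be captured in a small wrapper. This is a sketch under stated assumptions: `build_install_cmd` is a hypothetical helper, it only prints the command rather than running it, and it places `--flavor` last as in the syntax line shown in the steps.

```shell
#!/bin/sh
# build_install_cmd FLAVOR LOCATION [EXTRA_FLAG...]
# Hypothetical helper (not part of the installer). Prints an ./install command
# with --accept-license and --location first, then any other flags, then
# --flavor. Pass an empty string as LOCATION to keep the default storage paths.
build_install_cmd() (
  if [ "$#" -lt 2 ]; then
    echo "usage: build_install_cmd FLAVOR LOCATION [EXTRA_FLAG...]" >&2
    exit 2
  fi
  flavor=$1
  location=$2
  shift 2

  cmd="./install --accept-license"
  if [ -n "$location" ]; then
    cmd="$cmd --location=$location"
  fi
  # Remaining arguments are passed through in the order given.
  for flag in "$@"; do
    cmd="$cmd $flag"
  done
  echo "$cmd --flavor=$flavor"
)
```

For example, `build_install_cmd ha3 /opt/splunk/dsp` prints `./install --accept-license --location=/opt/splunk/dsp --flavor=ha3`, which you can review before running it on the first master node.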
Join nodes to form the cluster and finish the installation
After running the Splunk Data Stream Processor installer, you must join the nodes together to form the cluster. Follow the instructions for your chosen installation flavor.
Install DSP using the ha3 flavor
- After initiating the installation process, the installer prints out a command that you must use to join the other nodes to the first master node. The node number corresponds to the minimum number of nodes that must have this role in the cluster. Copy the text after gravity join.
Wed Oct 2 23:59:56 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
master 2 gravity join <ip-address-master> --token=<token> --role=master
- From the working directory of the other nodes that you want to join the cluster, enter one of the following commands.
- To join this node to form the cluster, use this command:
./join <ip-address-of-master> --token=<token> --role=master
- To join this node to the cluster and change the location where Gravity stores container and state information, use one of the following commands. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention, use this command to override the default path used for storage.
  - If you used --location in the installation step, enter this:
  ./join <ip-address-of-master> --location=<path> --token=<token> --role=master
  - If you used --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity in the installation step, enter this:
  ./join <ip-address-of-master> --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity --token=<token> --role=master
- When you have the required three master nodes to form a cluster, the installation continues. The installation process might take up to 45 minutes. Keep this terminal window open. The following shows the output that is displayed when the installation continues.
Fri May 22 14:28:24 UTC Connecting to installer
Fri May 22 14:28:28 UTC Connected to installer
Fri May 22 14:28:28 UTC Successfully added "master" node on 10.202.6.81
Fri May 22 14:28:28 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
master 2 ./gravity join <ip-address-of-master> --token=<token> --role=master
Fri May 22 14:28:29 UTC Operation has been created
Fri May 22 14:28:57 UTC Successfully added "master" node on 10.202.2.195
Fri May 22 14:28:57 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
master 1 ./gravity join <ip-address-of-master> --token=<token> --role=master
Fri May 22 14:29:01 UTC Successfully added "master" node on 10.202.4.222
Fri May 22 14:29:01 UTC All agents have connected!
.....
- (Optional) To add additional nodes as workers to your cluster after the installation is complete, enter the following command from the working directory of the node you want to add:
./join <ip-address-master> --token=<token> --role=worker
While you can add additional master nodes with this command beyond the requirements of the flavor you selected, adding master nodes does not improve the high availability guarantees of the cluster, because services are still fixed to the original number of master nodes.
After these steps, Gravity continues with the installation. Once the installation has finished, Gravity outputs the login credentials to access the DSP UI as well as information about what services are now available as shown here:
Cluster is active
To log into DSP:
Hostname: https://localhost:30000
Username: dsp-admin
Password: bf2be8066757ffc8
NOTE: this is the original password created during cluster bootstrapping, and will not be updated if dsp-admin's password is changed
To see these login instructions again: please run ./print-login
The following services are installed:
SERVICE IP:PORT
DSP UI localhost:30000
Login UI localhost:30002
S2S Forwarder localhost:30001
API Gateway localhost:31000
* Please make sure your firewall ports are open for these services
* To see these services again: please run ./print-services
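The instruction to copy the text after gravity join and run it with ./join can be sketched as a small filter. This is an illustration only: `to_join_cmd` is a hypothetical helper, and it assumes the installer prints each join command on a single line, as in the sample output above.

```shell
#!/bin/sh
# to_join_cmd
# Hypothetical helper. Reads installer output on stdin and, for every line
# that contains "gravity join", prints the matching ./join command.
# Lines without "gravity join" are dropped.
to_join_cmd() {
  sed -n 's/^.*gravity join \(.*\)$/.\/join \1/p'
}

# Example (commented out): paste the installer output into a file first.
# to_join_cmd < installer-output.txt
```

The filter keeps the IP address, token, and role exactly as the installer printed them, so there is less room for copy errors when you move to the next node.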
Install DSP using the hacp5 flavor
- After initiating the installation process, the installer prints out a command that you must use to join the other nodes to the first master node. The node number corresponds to the minimum number of nodes that must have this role in the cluster. Copy the text after gravity join.
Sun Oct 11 19:39:13 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
controlplane 4 ./gravity join <ip-address-of-master> --token=b679ee083bec157d4cac5444390c0901 --role=controlplane
dataplane 5 ./gravity join <ip-address-of-master> --token=b679ee083bec157d4cac5444390c0901 --role=dataplane
- From the working directory of the other nodes that you want to join the cluster with the controlplane role, enter one of the following commands.
- Join this node to the cluster.
./join <ip-address-master> --token=<token> --role=controlplane
- To join this node to the cluster and change the location where Gravity stores container and state information, use one of the following commands. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention or you want to change the default location for other reasons, use this command to override the default path used for storage.
  - If you used --location in the installation step, enter this:
  ./join <ip-address-of-master> --location=<path> --token=<token> --role=controlplane
  - If you used --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity in the installation step, enter this:
  ./join <ip-address-of-master> --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity --token=<token> --role=controlplane
- From the working directory of the other nodes that you want to join the cluster with the dataplane role, enter one of the following commands.
- Join this node to the cluster.
./join <ip-address-master> --token=<token> --role=dataplane
- Join this node to the cluster and change the location where Gravity stores container and state information. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention or you want to change the default location for other reasons, use this command to override the default path used for storage.
  - If you used --location in the installation step:
  ./join <ip-address-of-master> --location=<path> --token=<token> --role=dataplane
  - If you used --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity in the installation step:
  ./join <ip-address-of-master> --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity --token=<token> --role=dataplane
- When you have the required control plane and data plane nodes in your cluster, the install continues. The installation process may take up to 45 minutes. Keep this terminal window open.
Mon Oct 12 02:14:53 UTC Connecting to installer
Mon Oct 12 02:15:00 UTC Connected to installer
Mon Oct 12 02:15:01 UTC Successfully added "controlplane" node on 10.202.0.215
Mon Oct 12 02:15:01 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
controlplane 4 ./gravity join <ip-address-of-master> --token=<token> --role=controlplane
dataplane 5 ./gravity join <ip-address-of-master> --token=<token> --role=dataplane
...
...
...
Mon Oct 12 02:17:19 UTC Successfully added "dataplane" node on 10.202.3.205
Mon Oct 12 02:17:19 UTC All agents have connected!
- (Optional) To add additional nodes as data planes to your cluster after the install is complete, enter the following command from the working directory of the node you want to add.
./join <ip-address-master> --token=<token> --role=dataplane
While you can technically add additional control plane nodes beyond a flavor's requirements with this command, simply adding control plane nodes does not improve the high availability guarantees of the cluster, because services will still be fixed to the original number of control plane nodes.
Once the install has finished, Gravity will output the login credentials to access the DSP UI as well as information about what services are now available.
To log into DSP:
Hostname: https://localhost:30000
Username: dsp-admin
Password: 5a5cbb86eb711d06
NOTE: this is the original password created during cluster bootstrapping, and will not be updated if dsp-admin's password is changed
To see these login instructions again: please run ./print-login
The following services are installed:
SERVICE IP:PORT
DSP UI localhost:30000
Login UI localhost:30002
S2S Forwarder localhost:30001
API Gateway localhost:31000
* Please make sure your firewall ports are open for these services
* To see these services again: please run ./print-services
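The join variants in the steps above differ only in the role and in the storage flags, which must match what you passed to the install command. The following sketch assembles the command for each case. `build_join_cmd` and its `location` and `mount` mode names are hypothetical, and the function only prints the command.

```shell
#!/bin/sh
# build_join_cmd MASTER_IP TOKEN ROLE [MODE PATH]
# Hypothetical helper. MODE selects the storage flags:
#   (omitted)  default Gravity paths
#   location   --location=PATH
#   mount      --mount=data:PATH/gravity/pv --state-dir=PATH/gravity
build_join_cmd() (
  ip=$1
  token=$2
  role=$3
  mode=${4:-default}
  path=${5:-}

  case "$mode" in
    default)  storage="" ;;
    location) storage=" --location=$path" ;;
    mount)    storage=" --mount=data:$path/gravity/pv --state-dir=$path/gravity" ;;
    *) echo "unknown storage mode: $mode" >&2; exit 2 ;;
  esac

  echo "./join $ip$storage --token=$token --role=$role"
)
```

For example, `build_join_cmd 10.0.0.5 <token> dataplane location /opt/splunk/dsp` prints the `--location` form of the data plane join command.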
Install DSP using the hacp9 flavor
- After initiating the installation process, the installer prints out a command that you must use to join the other nodes to the first master node. The node number corresponds to the minimum number of nodes that must have this role in the cluster. Copy the text after gravity join.
Sun Oct 11 19:39:13 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
controlplane 8 ./gravity join <ip-address-of-master> --token=<token> --role=controlplane
dataplane 9 ./gravity join <ip-address-of-master> --token=<token> --role=dataplane
- From the working directory of the other nodes that you want to join the cluster with the controlplane role, enter one of the following commands.
- Join this node to the cluster.
./join <ip-address-of-master> --token=<token> --role=controlplane
- To join this node to the cluster and change the location where Gravity stores container and state information, use one of the following commands. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention or you want to change the default location for other reasons, use this command to override the default path used for storage.
  - If you used --location in the installation step, enter this:
  ./join <ip-address-of-master> --location=<path> --token=<token> --role=controlplane
  - If you used --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity in the installation step:
  ./join <ip-address-of-master> --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity --token=<token> --role=controlplane
- From the working directory of the other nodes that you want to join the cluster with the dataplane role, enter one of the following commands.
- Join this node to the cluster.
./join <ip-address-master> --token=<token> --role=dataplane
- Join this node to the cluster and change the location where Gravity stores container and state information. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention or you want to change the default location for other reasons, use this command to override the default path used for storage.
  - If you used --location in the installation step, enter this:
  ./join <ip-address-of-master> --location=<path> --token=<token> --role=dataplane
  - If you used --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity in the installation step, enter this:
  ./join <ip-address-of-master> --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity --token=<token> --role=dataplane
- When you have the required control plane and data plane nodes in your cluster, the install continues. The installation process may take up to 45 minutes. Keep this terminal window open.
Mon Oct 12 02:14:53 UTC Connecting to installer
Mon Oct 12 02:15:00 UTC Connected to installer
Mon Oct 12 02:15:01 UTC Successfully added "controlplane" node on 10.202.0.215
Mon Oct 12 02:15:01 UTC Please execute the following join commands on target nodes:
Role Nodes Command
---- ----- -------
controlplane 8 ./gravity join <ip-address-of-master> --token=<token> --role=controlplane
dataplane 9 ./gravity join <ip-address-of-master> --token=<token> --role=dataplane
...
...
...
Mon Oct 12 02:17:19 UTC Successfully added "dataplane" node on 10.202.4.220
Mon Oct 12 02:17:19 UTC All agents have connected!
- (Optional) To add additional nodes as data planes to your cluster after the install is complete, enter the following command from the working directory of the node you want to add.
./join <ip-address-master> --token=<token> --role=dataplane
While you can technically add additional control plane nodes beyond a flavor's requirements with this command, simply adding control plane nodes does not improve the high availability guarantees of the cluster, because services will still be fixed to the original number of control plane nodes.
At this point, Gravity continues with the install. Once the install has finished, Gravity will output the login credentials to access the DSP UI as well as information about what services are now available.
Cluster is active
To log into DSP:
Hostname: https://localhost:30000
Username: dsp-admin
Password: bf2be8066757ffc8
NOTE: this is the original password created during cluster bootstrapping, and will not be updated if dsp-admin's password is changed
To see these login instructions again: please run ./print-login
The following services are installed:
SERVICE IP:PORT
DSP UI localhost:30000
Login UI localhost:30002
S2S Forwarder localhost:30001
API Gateway localhost:31000
* Please make sure your firewall ports are open for these services
* To see these services again: please run ./print-services
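Across the three flavors, the sample installer output shows how many additional nodes of each role must join after the first node connects. The sketch below records those counts so that you can prepare the right number of nodes before the installer times out. `required_joins` is a hypothetical helper, and the numbers are taken only from the sample output in this topic.

```shell
#!/bin/sh
# required_joins FLAVOR
# Hypothetical helper. Prints "ROLE COUNT" lines for the nodes that still have
# to join after the first node has connected to the installer. Counts are
# copied from the sample installer output for each flavor.
required_joins() {
  case "$1" in
    ha3)   printf '%s\n' "master 2" ;;
    hacp5) printf '%s\n' "controlplane 4" "dataplane 5" ;;
    hacp9) printf '%s\n' "controlplane 8" "dataplane 9" ;;
    *) echo "unknown flavor: $1" >&2; return 2 ;;
  esac
}
```

Adding the first node to these counts gives the minimum cluster sizes of 3, 10, and 18 nodes for ha3, hacp5, and hacp9.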
Configure the Splunk Data Stream Processor UI redirect URL
By default, the Splunk Data Stream Processor uses the IPv4 address of eth0 to derive several properties required by the UI to function properly. If the eth0 network is not directly accessible (for example, it exists inside a private AWS VPC) or is otherwise incorrect, use the configure-ui script to manually define the IP address or host name that can be used to access DSP.
- From the master node, enter the following:
DSP_HOST=<ip-address-of-master-node> ./configure-ui
- Next, enter the following:
./deploy
- Navigate to the Splunk Data Stream Processor UI to verify your changes.
https://<DSP_HOST>:30000/
- (Optional) DSP exposes four external network ports: 30000 for the DSP UI, 30002 for Authentication and Login, 31000 for the API Services, and 30001 for the Forwarders Service. By default, DSP uses self-signed certificates to connect to these services. To use your own SSL/TLS certificate to connect to these services, see Secure the DSP cluster with SSL/TLS certificates.
- On the login page, enter the following:
User: dsp-admin
Password: <the dsp-admin password generated from the installer>
If you are using the Firefox or Microsoft Edge browsers, you must trust the API certificate separately. Navigate to the host of your DSP instance at port 31000, for example, https://1.2.3.4:31000 and trust the self-signed certificate.
If you are using the Google Chrome browser and encounter a "net::ERR_CERT_INVALID" error with no Proceed Anyway option when you click Advanced, click anywhere on the background then type "thisisunsafe" to trust the certificate.
- (Optional) If you need to retrieve the dsp-admin password, enter the following on your master node:
./print-login
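After you set DSP_HOST, it can help to list the four external endpoints for that host so that you can check firewall rules and certificates. This is a sketch: `dsp_endpoints` is a hypothetical helper, the service names are taken from the installer output, and a URL scheme is shown only where this topic gives one.

```shell
#!/bin/sh
# dsp_endpoints HOST
# Hypothetical helper. Prints the four external DSP endpoints for HOST using
# the port numbers listed in this topic.
dsp_endpoints() {
  printf 'DSP UI: https://%s:30000/\n' "$1"
  printf 'Login UI: %s:30002\n' "$1"
  printf 'S2S Forwarder: %s:30001\n' "$1"
  printf 'API Gateway: https://%s:31000\n' "$1"
}

# Example (commented out): use the same value you passed as DSP_HOST.
# dsp_endpoints 1.2.3.4
```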
Check the status of your Splunk Data Stream Processor deployment
To check the status of your cluster, run the following command from the working directory of a node.
gravity status
A response showing the current health of your cluster is displayed.
$ gravity status
Cluster name: busyalbattani4291
Cluster status: active
Application: dsp, version 1.2.0-rc.20201011.1495613
Gravity version: 6.1.41 (client) / 6.1.41 (server)
Join token: 2a3c8b65df0084babde29787a292be80
Periodic updates: Not Configured
Remote support: Not Configured
Last completed operation:
    * 10-node install
      ID: c6c5a7a5-35ff-47b7-9d66-dab55b731c18
      Started: Mon Oct 12 02:15 UTC (46 minutes ago)
      Completed: Mon Oct 12 02:17 UTC (43 minutes ago)
Cluster endpoints:
    * Authentication gateway:
        - 10.202.0.215:32009
        - 10.202.6.118:32009
        - 10.202.0.86:32009
    * Cluster management URL:
        - https://10.202.0.215:32009
        - https://10.202.6.118:32009
        - https://10.202.0.86:32009
Cluster nodes:
    Masters:
        * ip-10-202-0-215 / 10.202.0.215 / controlplane
            Status: healthy
            Remote access: online
        * ip-10-202-6-118 / 10.202.6.118 / controlplane
            Status: healthy
            Remote access: online
        * ip-10-202-0-86 / 10.202.0.86 / controlplane
            Status: healthy
            Remote access: online
        * ip-10-202-6-200 / 10.202.6.115 / controlplane
            Status: healthy
            Remote access: online
        * ip-10-202-0-91 / 10.202.0.110 / controlplane
            Status: healthy
            Remote access: online
    Nodes:
        * ip-10-202-2-129 / 10.202.2.129 / dataplane
            Status: healthy
            Remote access: online
        * ip-10-202-0-194 / 10.202.0.194 / dataplane
            Status: healthy
            Remote access: online
        * ip-10-202-3-205 / 10.202.3.205 / dataplane
            Status: healthy
            Remote access: online
        * ip-10-202-2-195 / 10.202.2.195 / dataplane
            Status: healthy
            Remote access: online
        * ip-10-202-4-222 / 10.202.4.222 / dataplane
            Status: healthy
            Remote access: online
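If you want to check the status response in a script rather than by eye, a simple text filter is enough for a first pass. The sketch below assumes that `gravity status` prints one field per line, with a "Cluster status:" line and one "Status:" line per node. Both function names are hypothetical.

```shell
#!/bin/sh
# cluster_is_active
# Hypothetical helper. Reads `gravity status` output on stdin and succeeds
# when the cluster status is reported as active.
cluster_is_active() {
  grep -q 'Cluster status:[[:space:]]*active'
}

# count_unhealthy_nodes
# Hypothetical helper. Reads `gravity status` output on stdin and prints the
# number of per-node "Status:" lines whose value is not "healthy".
# "Cluster status:" is not counted because its "s" is lowercase.
count_unhealthy_nodes() {
  awk '/Status:/ { if ($0 !~ /Status:[ \t]*healthy/) bad++ } END { print bad + 0 }'
}

# Example on a cluster node (commented out):
# gravity status | count_unhealthy_nodes
```

A result of 0 from `count_unhealthy_nodes` together with a successful `cluster_is_active` corresponds to the healthy response shown above.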
Reference
Control plane and data plane node profiles
For smaller deployments, the ha3 flavor schedules various components of the DSP cluster to a single group of nodes. For clusters that have less traffic, this simpler choice is sufficient.
To scale to medium and large deployments that can ingest and process large amounts of data, you must increase the number of workers and the resources allocated to the pods that handle data ingestion and processing. When these workers consume large amounts of node resources, such as CPU, RAM, or bandwidth, they can negatively impact other services that share those resources on the same node. This is commonly known as the "noisy neighbor problem".
In order to alleviate this noisy neighbor problem, DSP provides the hacp5 and hacp9 deployment flavors. These flavors separate DSP services into two planes or groups of nodes:
- Control plane: The control plane contains the components that handle the coordination and logistics of the DSP deployment. An example of a control plane service is the service that activates a pipeline when requested by the user.
- Data plane: The data plane contains components that handle ingestion and processing of data in the DSP deployment. These services see very high traffic and consume lots of resources. While the majority of compute and resource consumption is in the data plane, because DSP has robust data delivery guarantees, the DSP cluster is relatively tolerant of data plane node failures. See data retention policies for more information about the Splunk Data Stream Processor data delivery guarantees.
When you select either the hacp5 or hacp9 installation flavors, you join nodes as either the control plane profile or the data plane profile, resulting in higher performance and reliability of DSP components.
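The recommended cluster sizes from the flavor table can be turned into a quick lookup. This is a sketch, not official sizing guidance: `recommend_flavor` is a hypothetical helper, and because the table lists 50 nodes under both hacp5 (15-50) and hacp9 (50+), the sketch assigns exactly 50 to hacp5 as an assumption. Flavors are fixed at installation time, so pass the largest node count you expect to reach.

```shell
#!/bin/sh
# recommend_flavor NODE_COUNT
# Hypothetical helper. Maps a planned cluster size to a flavor name using the
# recommended cluster sizes in the flavor table. Sizes from 3 to 14 map to
# ha3 because 3 is the minimum cluster size for that flavor.
recommend_flavor() {
  if [ "$1" -lt 3 ]; then
    echo "a DSP cluster needs at least 3 nodes" >&2
    return 1
  elif [ "$1" -le 14 ]; then
    echo ha3
  elif [ "$1" -le 50 ]; then
    echo hacp5
  else
    echo hacp9
  fi
}
```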
Install flags
The following table lists the flags you can use with the install command and a description of how to use them:
Flag | Description |
---|---|
--accept-license | Automatically accepts the license agreement that is printed upon completion of the installation. |
--location <path> | Changes the location where Gravity stores containers and state information. The --location flag mounts persistent volumes in data:<path>/data and stores state information in <path>/gravity. This flag is equivalent to using the --mount=data:<path>/data and --state-dir=<path>/gravity flags. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you use the --location flag during installation, use the same --location=<path> value when you join the other nodes to the cluster. |
--cluster <cluster_name> | Gives the DSP cluster a name. If you do not specify a cluster name, DSP automatically generates one for you. This is used by the Splunk App for DSP. |
--token <token> | A secure token that prevents rogue nodes from joining the cluster. Your token must be at least six characters long. |
--service-uid <numeric> | Specifies the Service User ID. For information about how this is used, see Service User in the Gravity documentation. If not specified, a user named planet is created with user id 1000. |
--service-gid <numeric> | Specifies the Service Group ID. For information about how this is used, see Service User in the Gravity documentation. If not specified, a group named planet is created. |
--pod-network-cidr <10.244.0.0/16> | The CIDR range Kubernetes allocates node subnets and pod IPs from. Must be a minimum of /16 so Kubernetes can allocate /24 to each node. If not specified, defaults to 10.244.0.0/16. |
--service-cidr <10.100.0.0/16> | The CIDR range Kubernetes allocates service IPs from. If not specified, defaults to 10.100.0.0/16. |
--mount=data:<mount-path> --state-dir=<state-path> | Changes the location where Gravity stores containers and state information. Replace <mount-path> and <state-path> with the paths that you'd like Gravity to use for storage. For example, if you wanted to mount persistent volumes in <mount-path>/bar/baz but store state information in <state-path>/quux, then run: ./install --mount=data:<mount-path>/bar/baz --state-dir=<state-path>/quux. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you use these flags during installation, use the same flags when you join the other nodes to the cluster. |
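A few of the constraints in the flags table are easy to check before you run the installer. The sketch below is illustrative only: all three function names are hypothetical, and it reads "a minimum of /16" as "a prefix length of 16 or less", which is an interpretation rather than something this topic states outright.

```shell
#!/bin/sh
# valid_token TOKEN
# Succeeds when TOKEN is at least six characters long.
valid_token() {
  [ "${#1}" -ge 6 ]
}

# valid_pod_cidr CIDR
# Succeeds when CIDR has a numeric prefix length of 16 or less, so that
# a /24 can be allocated to each node. Only the prefix length is checked.
valid_pod_cidr() {
  case "$1" in
    */*) prefix=${1##*/} ;;
    *) return 1 ;;
  esac
  case "$prefix" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$prefix" -le 16 ]
}

# expand_location PATH
# Prints the --mount and --state-dir flags that the table describes as
# equivalent to --location PATH.
expand_location() {
  echo "--mount=data:$1/data --state-dir=$1/gravity"
}
```

For example, `expand_location /opt/splunk/dsp` prints `--mount=data:/opt/splunk/dsp/data --state-dir=/opt/splunk/dsp/gravity`.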
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5