Splunk® Data Stream Processor

Install and administer the Data Stream Processor



On April 3, 2023, Splunk Data Stream Processor will reach its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Install the Data Stream Processor

To install the Splunk Data Stream Processor, download, extract, and run the installer on each node in your cluster. You must contact your Splunk representative to access the Data Stream Processor download page. The Data Stream Processor is installed from a Gravity package, which builds a Kubernetes cluster onto which DSP is then installed and deployed. See the Gravity documentation for more information.

At a glance, the DSP Installer does the following things:

  • Checks that the system is running on a supported OS.
  • Checks that the system passes pre-installation checks such as meeting the minimum system requirements.
  • Checks that the system is not already running Docker or other conflicting software.
  • Checks that the system has the necessary running services and kernel modules.
  • Installs Docker, Kubernetes, and other software dependencies like SCloud.
  • Prepares Kubernetes to run the Data Stream Processor.
  • Installs the Data Stream Processor.
  • Checks that the Data Stream Processor is ready for use.
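
For reference, a rough manual spot-check of a few of these conditions might look like the following. This is only an illustration and the installer's own checks are authoritative; exact commands vary by distribution.

    # Confirm the OS release that the node is running.
    cat /etc/os-release

    # Verify that Docker is not already installed and running on the node.
    systemctl is-active docker

    # Confirm that commonly required kernel modules are available.
    lsmod | grep -E 'br_netfilter|overlay'

    # Check free disk space under the default Gravity storage location.
    df -h /var/lib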

Extract and run the Data Stream Processor installer

Do the following steps to extract and run the DSP installer.

Prerequisites

  • Your system meets the minimum Hardware and Software requirements for DSP. See Hardware and Software requirements.
  • You do not have FIPS mode enabled on your operating system.
  • You have the required ports open. See Port configuration requirements.
  • If you are installing on RHEL or CentOS, you must temporarily disable SELinux on all of your nodes: setenforce 0
  • You have system administrator (root) permissions. You need administrator (root) permissions so Kubernetes can leverage system components like iptables and kernel modules. If you are not logged in as root, you can use the sudo command.
  • Make sure that your system clocks are synchronized on each node. Consult the system documentation for the particular operating systems on which you are running the Splunk Data Stream Processor. For most environments, Network Time Protocol (NTP) is the best approach. See the example check after this list.
  • Depending on the configuration of your environment, you might need to complete additional prerequisites before installing DSP. Talk to your administrator to see whether any of the steps listed in the additional installation considerations apply to you.
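
For example, on a systemd-based host you can confirm clock synchronization and the SELinux mode with commands like the following. This is a hedged illustration; use whichever NTP tooling (chrony or ntpd) your environment standardizes on.

    # Confirm that the system clock is being synchronized (look for "System clock synchronized: yes").
    timedatectl status

    # If you disabled SELinux with setenforce 0, confirm that the current mode is Permissive.
    getenforce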

Steps

  1. Download the Data Stream Processor installer on each node in your cluster.
  2. On each node in your cluster, extract the Data Stream Processor installer from the tarball.
    tar xf <dsp-version>-linux-amd64.tar

    In order for the DSP installer to complete, you must have at least 3 nodes ready to join the cluster. The DSP installer times out after 5 minutes, and if you do not have these nodes prepared, you may need to start the installation process over again.

  3. On the node that you want to be the master node, navigate to the extracted directory.
    cd <dsp-version>
  4. From the extracted directory, run the DSP installer command. You must run this command with the --flavor=ha flag. The install command also supports the following optional flags:
    ./install [--optional-flags] --flavor=ha 
    
    --accept-license
        Automatically accepts the license agreement printed upon completion.
    --token <token>
        A secure token that prevents rogue nodes from joining the cluster during installation. Your token must be at least six characters long.
    --service-uid <numeric>
        Specifies the Service User ID. For information about how this is used, see Service User in the Gravity documentation. If not specified, a user named planet is created with user ID 1000.
        Note: The ./join command does not support the --service-uid or --service-gid flags. Instead, the worker nodes use whatever values are set on the master node with ./install.
    --service-gid <numeric>
        Specifies the Service Group ID. For information about how this is used, see Service User in the Gravity documentation. If not specified, a group named planet is created.
        Note: The ./join command does not support the --service-uid or --service-gid flags. Instead, the worker nodes use whatever values are set on the master node with ./install.
    --pod-network-cidr <10.244.0.0/16>
        The CIDR range that Kubernetes allocates node subnets and pod IPs from. Must be a minimum of /16 so that Kubernetes can allocate a /24 to each node. If not specified, defaults to 10.244.0.0/16.
    --service-cidr <10.100.0.0/16>
        The CIDR range that Kubernetes allocates service IPs from. If not specified, defaults to 10.100.0.0/16.
    --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity
        Changes the location where Gravity stores containers and state information. Use these flags instead of the --location flag if you want to use different directories to store mount and state information. By default, Gravity uses /var/lib/gravity for storage. If you do not have enough disk space in /var to support 24 hours of data retention, use these flags to override the default storage path.

    If you use the --mount and --state-dir flags to change the location where Gravity stores containers and state information, you must use the flags both when installing and when joining the nodes.

    • Replace <mount-path> with the path that you'd like Gravity to use for storage. For example, to install everything in /opt/splunk/dsp, you would run: ./install --flavor=ha --mount=data:/opt/splunk/dsp/gravity/pv --state-dir=/opt/splunk/dsp/gravity. See the sketch after these steps for preparing the storage path.
    • /gravity/pv is an example subdirectory to hold the data received from all sources.
    • /gravity is an example subdirectory to hold cluster state information.
  5. Once the initial node has connected to the installer, the installer outputs a join command that you need to run on the other nodes in your cluster. Continue to the next section for the steps.
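
If you plan to override the default storage location with the --mount and --state-dir flags described in step 4, a minimal sketch of preparing the path before running the installer might look like the following. The /opt/splunk/dsp path is only the example path used above; substitute your own mount point.

    # Create the example storage directories on the node (adjust the path for your environment).
    sudo mkdir -p /opt/splunk/dsp/gravity/pv

    # Confirm that the volume backing this path has enough free space for your data retention needs.
    df -h /opt/splunk/dsp

    # Run the installer with the custom storage locations, as documented above.
    sudo ./install --flavor=ha --mount=data:/opt/splunk/dsp/gravity/pv --state-dir=/opt/splunk/dsp/gravity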

Join nodes to the cluster to finish install

You must now join the nodes together to form the cluster.

  1. After some period of time, the installer prints out a command that you must use to join the other nodes to the first master node. Copy the text after gravity join.
    Wed Oct  2 23:59:56 UTC Please execute the following join commands on target nodes:
    Role    Nodes   Command
    ----    -----   -------
    worker  2       gravity join <ip-address-master> --token=<token> --role=worker
    
  2. From the working directory of the other nodes that you want to join the cluster, enter one of the following commands.
    • Join this node to the cluster.
      ./join <ip-address-of-master> --token=<token> --role=worker
    • Join this node to the cluster and change the location where Gravity stores container and state information. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention, then use this command to override the default path used for storage.
      ./join <ip-address-of-master> --token=<token> --role=worker --mount=data:/<mount-path>/gravity/pv --state-dir=/<mount-path>/gravity
  3. When you have a minimum of three nodes in your cluster, the install continues. The installation process may take up to 45 minutes. Keep this terminal window open.
    Fri May 22 14:28:24 UTC	Connecting to installer
    Fri May 22 14:28:28 UTC	Connected to installer
    Fri May 22 14:28:28 UTC	Successfully added "worker" node on 10.202.6.81
    Fri May 22 14:28:28 UTC	Please execute the following join commands on target nodes:
    Role	Nodes	Command
    ----	-----	-------
    worker	2	./gravity join 10.202.6.81 --token=fbb30e2cb9e015bab9e58b27420fcdf8 --role=worker
    
    Fri May 22 14:28:29 UTC	Operation has been created
    Fri May 22 14:28:57 UTC	Successfully added "worker" node on 10.202.2.195
    Fri May 22 14:28:57 UTC	Please execute the following join commands on target nodes:
    Role	Nodes	Command
    ----	-----	-------
    worker	1	./gravity join 10.202.6.81 --token=fbb30e2cb9e015bab9e58b27420fcdf8 --role=worker
    
    Fri May 22 14:29:01 UTC	Successfully added "worker" node on 10.202.4.222
    Fri May 22 14:29:01 UTC	All agents have connected!
    .....
    

At this point, Gravity continues with the install. Once the install has finished, Gravity will output the login credentials to access the DSP UI as well as information about what services are now available.

Cluster is active

To log into DSP:

Hostname: https://localhost:30000
Username: dsp-admin
Password: bf2be8066757ffc8

NOTE: this is the original password created during cluster bootstrapping,
and will not be updated if dsp-admin's password is changed

To see these login instructions again: please run ./print-login

The following services are installed:

SERVICE         IP:PORT
DSP UI          localhost:30000
Login UI        localhost:30002
S2S Forwarder   localhost:30001
API Gateway     localhost:31000

 * Please make sure your firewall ports are open for these services *

To see these services again: please run ./print-services
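
If your nodes run firewalld, opening the listed service ports might look like the following sketch. This is only an illustration; use your own firewall tooling, and consult Port configuration requirements for the complete list of required ports.

    sudo firewall-cmd --permanent --add-port=30000/tcp   # DSP UI
    sudo firewall-cmd --permanent --add-port=30001/tcp   # S2S Forwarder
    sudo firewall-cmd --permanent --add-port=30002/tcp   # Login UI
    sudo firewall-cmd --permanent --add-port=31000/tcp   # API Gateway
    sudo firewall-cmd --reload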

Configure the Data Stream Processor UI redirect URL

By default, the Data Stream Processor uses the IPv4 address of eth0 to derive several properties required by the UI to function properly. If the eth0 network is not directly accessible (for example, it exists inside a private AWS VPC) or is otherwise incorrect, use the configure-ui script to manually define the IP address or hostname that can be used to access DSP.
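
For example, you can check which IPv4 address DSP derives by default by inspecting eth0 on the master node. This is only an illustration; interface names vary by system.

    # Show the IPv4 address currently assigned to eth0.
    ip -4 addr show eth0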

  1. From the master node, enter the following:
    DSP_HOST=<ip-address-of-master-node> ./configure-ui
  2. Then, enter the following:
    ./deploy 
  3. (Optional) DSP exposes four external network ports: 30000 for the DSP UI, 30002 for Authentication and Login, 31000 for the API Services, and 30001 for the Forwarders Service. By default, DSP uses self-signed certificates to connect to these services. To use your own SSL/TLS certificate to connect to these services, see Secure DSP with SSL/TLS certificates. An example of inspecting the default self-signed certificate from the command line follows these steps.
  4. Navigate to the Data Stream Processor UI to verify your changes.
    https://<DSP_HOST>:30000/
  5. On the login page, enter the following:
    User: dsp-admin
    Password: <the dsp-admin password generated from the installer>

    If you are using the Firefox or MS Edge browsers, you must trust the API certificate separately. Navigate to the host of your DSP instance at port 31000. For example, navigate to "https://1.2.3.4:31000" and trust the self-signed certificate.

    If you are using the Google Chrome browser and encounter a "net::ERR_CERT_INVALID" error with no "Proceed Anyway" option when you click on Advanced, click anywhere on the background then type "thisisunsafe" to trust the certificate.

  6. (Optional) If you need to retrieve the dsp-admin password, enter the following on your master node:
    ./print-login
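
As noted in steps 3 and 5, DSP presents self-signed certificates on its external ports by default. If you want to inspect the certificate that the API Gateway serves before trusting it in a browser, a command-line check like the following can help. Replace <DSP_HOST> with the address or hostname you configured above.

    # Display the subject, issuer, and validity dates of the certificate served on the API Gateway port.
    openssl s_client -connect <DSP_HOST>:31000 -showcerts </dev/null | openssl x509 -noout -subject -issuer -dates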

Change your admin password

Perform the following steps to change the dsp-admin password.

  1. From the master node, run the reset password script.
    sudo ./reset-admin-password
  2. Enter your new password.
  3. Navigate back to the DSP UI and log in with your new password.

The print-login script only returns the original password generated by the installer. If you forget your changed admin password, you'll need to reset your password again.

Check the status of your Data Stream Processor deployment

To check the status of your cluster, type the following.

gravity status

A response showing the current health of your cluster is displayed.

$ sudo gravity status
Cluster name:		sadlumiere3129
Cluster status:		active
Application:		dsp, version 1.2.0-daily.20200518.1043150
Gravity version:	6.1.22 (client) / 6.1.22 (server)
Join token:		fbb30e2cb9e015bab9e58b27420fcdf8
Periodic updates:	Not Configured
Remote support:		Not Configured
Last completed operation:
    * 3-node install
      ID:		36003096-505f-420d-ad1b-efc561e0fca6
      Started:		Fri May 22 14:28 UTC (1 hour ago)
      Completed:	Fri May 22 14:29 UTC (1 hour ago)
Cluster endpoints:
    * Authentication gateway:
        - 10.202.6.81:32009
        - 10.202.2.195:32009
        - 10.202.4.222:32009
    * Cluster management URL:
        - https://10.202.6.81:32009
        - https://10.202.2.195:32009
        - https://10.202.4.222:32009
Cluster nodes:
    Masters:
        * ip-10-202-6-81 / 10.202.6.81 / worker
            Status:	healthy
        * ip-10-202-2-195 / 10.202.2.195 / worker
            Status:	healthy
        * ip-10-202-4-222 / 10.202.4.222 / worker
            Status:	healthy
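
As an additional spot check, you can confirm that the DSP UI endpoint is answering. The -k flag is needed because DSP serves a self-signed certificate by default; replace <DSP_HOST> with your master node address or configured hostname.

    # Request only the response headers from the DSP UI endpoint.
    curl -k -I https://<DSP_HOST>:30000/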

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.1.0

