Splunk® Data Stream Processor

Install and administer the Data Stream Processor

On April 3, 2023, Splunk Data Stream Processor reached its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.

All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator, which has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 in order to continue to receive full product support from Splunk.

Install the Splunk Data Stream Processor on Google Kubernetes Engine

You can install the Splunk Data Stream Processor (DSP) on the Google Kubernetes Engine (GKE).

Prerequisites

See Preparing Google Cloud Platform to install the Splunk Data Stream Processor for instructions on configuring your Google Cloud Platform (GCP) to be compatible with DSP. Before you can install DSP on the Google Kubernetes Engine, you must complete the following prerequisites.

  1. Create a Linux instance
  2. Install Terraform
  3. Install the Google Cloud CLI tool
  4. Install the GKE kubectl plugin
  5. Retrieve GCP service account key
  6. Create and set up a Google Cloud Storage bucket

Create a Linux instance

First, create a Linux instance running Ubuntu 20.04 LTS or 22.04 LTS with 50 GB of memory. This is where you will install your GKE environment. Search for "google cloud sdk installation" in the Google Cloud documentation for the installation packages and instructions.

Install Terraform

Search for "install terraform" in the HashiCorp developer documentation for the installation package and installation commands. DSP supports Terraform version 1.0.5.
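Because only one Terraform version is listed as supported, it can help to check the installed version before you continue. The following is a minimal sketch, not part of the DSP tooling. The function names parse_tf_version and tf_version_ok are hypothetical, and the sketch assumes that the first line of the terraform version output has the form "Terraform v<version>".

```shell
# Supported Terraform version, taken from the text above.
SUPPORTED_TF_VERSION="1.0.5"

# Extract the version number from the first line of `terraform version`
# output, which is assumed to look like "Terraform v1.0.5".
parse_tf_version() {
  printf '%s\n' "$1" | sed -n '1s/^Terraform v\([0-9][0-9.]*\).*/\1/p'
}

# Return 0 when the given `terraform version` output matches the
# supported version, and non-zero otherwise.
tf_version_ok() {
  [ "$(parse_tf_version "$1")" = "$SUPPORTED_TF_VERSION" ]
}

# Usage (requires terraform on PATH):
#   tf_version_ok "$(terraform version)" || echo "Unsupported Terraform version" >&2
```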

Install the Google Cloud CLI tool

Search for "gcloud cli install" in the Google Cloud documentation for information.

Install the GKE kubectl plugin

Search for "kubectl plugin installation" in the Google Cloud documentation for information.
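After installing the tools described above, you can confirm that they are available on your PATH. This is a minimal sketch with a hypothetical helper named missing_tools. The command names in the usage comment, in particular gke-gcloud-auth-plugin, are assumptions about a typical installation and might differ in your environment.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: the tool names are assumptions; adjust them to your installation.
#   missing="$(missing_tools terraform gcloud kubectl gke-gcloud-auth-plugin)"
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```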

Retrieve GCP service account key

Ask your GCP account owner for the service account key, which is provided as a JSON file.

  1. Copy the service account key file to the /opt directory.
    vim /opt/<service_account.json>
    <paste the service account file content>
  2. Export the Google credentials from the service account.
    export GOOGLE_APPLICATION_CREDENTIALS=/opt/<service_account.json>
    export USE_GKE_GCLOUD_AUTH_PLUGIN=True
  3. Activate the service account using the following syntax. You can retrieve the SERVICE_ACCOUNT@DOMAIN.COM and PROJECT_ID details from your /opt/<service_account.json> file.
    gcloud auth activate-service-account <SERVICE_ACCOUNT@DOMAIN.COM> --key-file=/<path>/key.json --project=<PROJECT_ID>
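To avoid copying the service account and project details by hand in step 3, you can read them from the key file. The sketch below assumes a standard Google-generated key file that is pretty-printed with one "key": "value" pair per line and that contains the client_email and project_id fields. The helper name sa_field is hypothetical.

```shell
# Print the string value of a key from a pretty-printed JSON key file.
# Assumes one "key": "value" pair per line, as in Google-generated key files.
sa_field() {
  file="$1"
  key="$2"
  sed -n 's/^[[:space:]]*"'"$key"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' "$file" | head -n 1
}

# Usage (requires gcloud on PATH; replace the placeholder with your file name):
#   KEY_FILE="/opt/<service_account.json>"
#   gcloud auth activate-service-account "$(sa_field "$KEY_FILE" client_email)" \
#     --key-file="$KEY_FILE" --project="$(sa_field "$KEY_FILE" project_id)"
```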

Create and set up a Google Cloud Storage bucket

Create a multi-region Google Cloud Storage bucket with a name in the format: <prefix>-<cluster name>-<suffix>. Search for "Creating storage buckets" in the Google Cloud documentation.

Ensure that your Google Cloud Storage bucket uses fine-grained access control, which allows object-level access control list (ACL) permissions.
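The bucket name is reused later in the checkpoint configuration, so it can be useful to compose and validate it once. The following sketch uses a hypothetical helper named dsp_bucket_name. It checks the name against a deliberately conservative subset of the Cloud Storage naming rules (3 to 63 characters, lowercase letters, digits, and hyphens only, starting and ending with a letter or digit). It does not create the bucket.

```shell
# Compose "<prefix>-<cluster name>-<suffix>" and print it if it passes a
# conservative check: 3-63 characters, lowercase letters, digits, and
# hyphens only, starting and ending with a letter or digit.
dsp_bucket_name() {
  name="$1-$2-$3"
  if printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$'; then
    printf '%s\n' "$name"
  else
    echo "invalid bucket name: $name" >&2
    return 1
  fi
}

# Usage (the argument values are examples only):
#   BUCKET_NAME="$(dsp_bucket_name acme dsp-prod ckpt)" || exit 1
```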

Installation

Installing DSP on GKE is divided into two parts:

  1. Cluster creation in GKE
  2. DSP installation on GKE cluster

GKE cluster creation

  1. Download the Splunk Data Stream Processor installer TAR file under your root directory and extract it.
    tar xf dsp-<version>-linux-amd64.tar
  2. Navigate to the extracted file's bin directory.
    cd <dsp-version>/bin
  3. Prepare the GKE YAML file with the required parameters based on your environment. See GKE Parameter Definitions for more information on each parameter.
  4. Run the following Spawn commands to create a new cluster.
    ./spawn cluster create <cluster_name> -f ../examples/<file name>.yaml 
    ./spawn cluster apply <cluster_name>

    Once cluster generation is complete, Spawn outputs the following:

    [I ]
    [I ] Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
    [I ]
  5. Navigate to your Google Kubernetes Engine service UI and verify that your new cluster is present.

DSP Installation on GKE cluster

  1. Set the following environment variables with your cluster information.
    export GOOGLE_APPLICATION_CREDENTIALS=/opt/<service_account>.json
    export CLUSTER_NAME=<cluster_name>
    export CLUSTER_ZONE_OR_REGION=<region_or_zone>
  2. Update the Google Cloud Storage checkpoint's default yaml file with the following configurations.

    For configurations requiring BUCKET_NAME, use the same bucket name from Create and set up a Google Cloud Storage bucket.

    Configuration Value
    google_credential_file_encoded <encoded credentials>

    This value is the output from running base64 -i -w0 < /opt/service-account.json

    state_checkpoints_dir gs://<BUCKET-NAME>/flink/gcp-checking
    high_availability_storagedir gs://<BUCKET-NAME>/flink
    flink_state_savepoint_base_uri gs://<BUCKET-NAME>/flink/savepoint
    plugin_storage gcs
    plugin_s3bucket <BUCKET-NAME>
    gcs_project_number <project number>
    gcs_project_number_encoded <encoded project number>

    This value is the output from running echo -n '<project number>' | base64

    cloud_provider GKE
  3. Install the Splunk Data Stream Processor. See Install DSP with internal registry to install DSP on your internal registry instead of Google Container Registry. For a list of all available flags, see the install flags sections in Install the Splunk Data Stream Processor and Preparing Google Cloud Platform to install the Splunk Data Stream Processor.
    ./dsp install --cluster-provider=gke --registryURL=gcr.io --accept-license --flavor=<flavor type> [--cluster-type=<cluster type>]

    After these steps, installation continues. Once installation completes, k0s outputs the login credentials to access the DSP UI as well as information about what services are now available as shown here:

    Finished installing DSP
    ...
    To log into DSP:
    Hostname: https://<localhost>
    Username: dsp-admin
    Password: <password>
    
    
    NOTE: this is the original password created during cluster bootstrapping, and will not be updated if dsp-admin's password is changed
    The following endpoints are available on the cluster:
    ENDPOINTS IP:PORT
    DSP UI <localhost>
    S2S Forwarder <localhost>:9997
    * Please make sure your firewall ports are open for these services *
    To see these login instructions again: please run sudo dsp admin print-login
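The two encoded values required in step 2 (google_credential_file_encoded and gcs_project_number_encoded) must each be a single Base64 line. The following is a minimal sketch of a portable way to produce them. The helper name b64_oneline is hypothetical, and the sketch removes line wrapping with tr rather than relying on the -w0 option, which not every base64 implementation supports.

```shell
# Base64-encode standard input and print the result on a single line.
b64_oneline() {
  base64 | tr -d '\n'
}

# Usage:
#   b64_oneline < /opt/service-account.json           # google_credential_file_encoded
#   printf '%s' "<project number>" | b64_oneline      # gcs_project_number_encoded
```

Using printf '%s' instead of echo avoids encoding a trailing newline along with the project number.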
    

Reference

(Optional) Install DSP with internal registry

If you want to install DSP with your internal registry instead of Google Container Registry, complete the following steps. Ensure that you complete all steps in Prerequisites, GKE cluster creation, and steps 1-3 of DSP Installation on GKE cluster.

  1. Run the following commands to load the two required images into your internal registry.

    Ensure that you set the proper permissions on your internal registry so that your GKE cluster can communicate with it and pull these images.

    cd <dsp-folder>/k0s/init/airgap/generic
    docker load -i nginx-1.22.0-alpine-11.tar 
    docker load -i registry-v2.8.1.tar
    docker tag nginx:1.22.0-alpine-11 <internal-registry>/path/nginx:1.22.0-alpine-11
    docker tag registry:v2.8.1 <internal-registry>/path/registry:v2.8.1
  2. Push the two images to your internal registry.
    docker push <internal-registry>/path/nginx:1.22.0-alpine-11
    docker push <internal-registry>/path/registry:v2.8.1
  3. Set the following environment configurations.
    export GOOGLE_APPLICATION_CREDENTIALS=/opt/service-account.json
    export CLUSTER_NAME=<cluster-name>
    export CLUSTER_ZONE_OR_REGION=<region or zone based on cluster type>
    export REGISTRY_IMAGE=<internal-registry>/path/registry:v2.8.1
    export NGINX_IMAGE=<internal-registry>/path/nginx:1.22.0-alpine-11
  4. Install DSP.
    ./dsp install --cluster-provider=gke --registryURL=internal --accept-license --debug
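The load, tag, and push commands in steps 1 and 2 follow the same pattern for both images, so they can be generated rather than typed. The sketch below uses a hypothetical helper named registry_cmds that only prints the commands for review and does not run Docker itself.

```shell
# Print the load, tag, and push commands for one image archive.
# $1: internal registry path prefix, $2: archive file, $3: image reference.
registry_cmds() {
  printf 'docker load -i %s\n' "$2"
  printf 'docker tag %s %s/%s\n' "$3" "$1" "$3"
  printf 'docker push %s/%s\n' "$1" "$3"
}

# Usage: review the printed commands, then pipe them to sh to run them.
#   REG="<internal-registry>/path"
#   registry_cmds "$REG" nginx-1.22.0-alpine-11.tar nginx:1.22.0-alpine-11
#   registry_cmds "$REG" registry-v2.8.1.tar registry:v2.8.1
```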

GKE Parameter Definitions

The following list describes the required and optional parameters available in GKE. These parameters are specific to a DSP installation on GKE.

Parameter Required/Optional for DSP Description
nodes Required The number of nodes to create in your cluster. In regional or multi-zone clusters, this is the number of nodes per zone.
username Required Username for the node, which is either an EC2 or a Compute Engine instance.
keypair Required The name of the account used for provisioning resources. For GCP, it is the service account name.
region Required The region where you want to provision resources.
zone Required (if zonal cluster) The region's zone where you want to provision resources. Defaults to us-central1-a. If changing region, you must include a zone.
gke_private_cluster Required Specifies whether your cluster is public or private.
regional_cluster Required Specifies whether your cluster is regional or zonal.
authorized_ipv4_cidr_block Required Other networks CIDR that can access the Kubernetes cluster controller through HTTPS.
asg_min_node_count Required Minimum number of nodes per zone in the node pool. Must be greater than zero and less than max_node_count. Cannot be used with total limits.
asg_max_node_count Required Maximum number of nodes per zone in the node pool. Must be greater than min_node_count. Cannot be used with total limits.
gke_nodepool_service_account Required Custom service accounts that have the cloud-platform scope and IAM role permissions for the GKE cluster worker nodes.
release_channel Required Configuration options for the release channel feature, which provides more control over automatic upgrades of your GKE clusters. Search for "release channel" in the Google Cloud documentation for more information about this feature.

When updating this field, GKE imposes specific version requirements. The google_container_engine_versions data source can provide the default version for a channel.

Removing the release_channel field from your configuration causes Terraform to stop managing your cluster's release channel, but does not un-enroll the cluster. To un-enroll the cluster, use the default "UNSPECIFIED" channel instead.

auto_repair Required Specifies whether the node auto-repair function is turned on for the node pool. When turned on, it monitors the nodes in the node pool and triggers an automatic repair if the nodes fail health checks repeatedly. Default value is true.
auto_upgrade Required Specifies whether node auto-upgrade is turned on for the node pool. When turned on, it keeps the nodes in your node pool up to date with the latest release version of Kubernetes. Default value is false.
cmek_key Required The Customer Managed Encryption Key used to encrypt the boot disk attached to each node in the node pool. This should look like the following:

projects/[KEY_PROJECT_ID]/locations/[LOCATION]/keyRings/[RING_NAME]/cryptoKeys/[KEY_NAME]

maintenance_start_time Required Time window specified for daily maintenance operations. Specify start time as <HH>:<MM>.
volumeSizeGB Required Volume size of the Compute Engine resource. Default value is 200 GB.
network Required Network to be used. For GCP, it is the VPC network name.
subnet Required Subnet to be used.
instanceType Required Type of instance.
project Required Name of project where the resources in your cluster will be created.
keyPath Optional The path of the ssh key for the nodes.
services_ipv4_cidr_block Optional The IP address range of the services IPs in your cluster. Leave blank to choose a default size range. Set to /<netmask> to have a range chosen with a specific netmask. Set to a CIDR notation from the RFC-1918 private networks to use a specific range.
pods_ipv4_cidr_block Optional The IP address range for the cluster pod IPs. Leave blank to choose a default size range. Set to /<netmask> to have a range chosen with a specific netmask. Set to a CIDR notation from the RFC-1918 private networks to use a specific range.
image_type Optional The image type to use for a node. Search for "node images" in the Google Cloud documentation. Default is Ubuntu with containerd.

Changing the image type will delete and recreate all nodes in the node pool.
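Some of the constraints in the table can be checked before you run Spawn. The following sketch is illustrative only and is not part of the DSP tooling. The helper names are hypothetical, and treating maintenance_start_time as a 24-hour HH:MM value is an assumption.

```shell
# Return 0 when 0 < min < max, matching the asg_min_node_count and
# asg_max_node_count constraints described in the table.
# $1: minimum node count, $2: maximum node count.
valid_node_counts() {
  [ "$1" -gt 0 ] 2>/dev/null && [ "$1" -lt "$2" ] 2>/dev/null
}

# Return 0 when the argument is a 24-hour time in HH:MM form (assumed format).
valid_start_time() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^([01][0-9]|2[0-3]):[0-5][0-9]$'
}

# Usage (the values are examples only):
#   valid_node_counts 1 3 || echo "check asg_min_node_count and asg_max_node_count" >&2
#   valid_start_time "03:00" || echo "check maintenance_start_time" >&2
```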

Last modified on 04 May, 2023

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6

