Install the Splunk Data Stream Processor on Google Kubernetes Engine

You can install the Splunk Data Stream Processor (DSP) on Google Kubernetes Engine (GKE).

Prerequisites

See Preparing Google Cloud Platform to install the Splunk Data Stream Processor for instructions on configuring your Google Cloud Platform (GCP) to be compatible with DSP. Before you can install DSP on the Google Kubernetes Engine, you must complete the following prerequisites.

Create a Linux instance

First, create a Linux instance running Ubuntu 20.04 or 22.04 LTS with 50 GB of memory. This is where you will install your GKE environment. Search for "google cloud sdk installation" in the Google Cloud documentation for the installation packages and instructions.

Install Terraform

Search for "install terraform" in the HashiCorp developer documentation for the installation package and installation commands. DSP supports Terraform version 1.0.5.
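
Because DSP supports a single Terraform version, it can help to confirm the installed version before you continue. The following is a minimal sketch and not part of the DSP tooling: the `is_supported_terraform` function and the `SUPPORTED_TF_VERSION` variable are hypothetical names, and the parsing assumes that the first line of `terraform version` has the form `Terraform v1.0.5`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with DSP: compare a version string
# against the Terraform version that DSP supports.
SUPPORTED_TF_VERSION="1.0.5"

is_supported_terraform() {
  # Accept "v1.0.5" or "1.0.5"; strip a leading "v" before comparing.
  [ "${1#v}" = "$SUPPORTED_TF_VERSION" ]
}

# Only query Terraform when it is actually on PATH.
if command -v terraform >/dev/null 2>&1; then
  # Assumption: the first output line looks like "Terraform v1.0.5".
  installed="$(terraform version | head -n 1 | awk '{print $2}')"
  if is_supported_terraform "$installed"; then
    echo "Terraform $installed is supported"
  else
    echo "Terraform $installed found; DSP supports $SUPPORTED_TF_VERSION" >&2
  fi
fi
```

The check is an exact string comparison, so any other version is reported as unsupported.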

Install the Google Cloud CLI tool

Search for "gcloud cli install" in the Google Cloud documentation for information.

Install the kubectl plugin

Search for "kubectl plugin installation" in the Google Cloud documentation for information.

Retrieve GCP service account key

Check with your GCP account owner to get the service account key from its JSON file.

Copy the service account key file to the /opt directory.

vim /opt/<service_account.json>
<paste the service account file content>

Export the Google credentials from the service account.

export GOOGLE_APPLICATION_CREDENTIALS=/opt/<service_account.json>
export USE_GKE_GCLOUD_AUTH_PLUGIN=True

Activate the service account using the following syntax. You can retrieve the SERVICE_ACCOUNT@DOMAIN.COM and PROJECT_ID details from your /opt/<service_account.json> file.
```
gcloud auth activate-service-account <SERVICE_ACCOUNT@DOMAIN.COM> --key-file=/<path>/key.json --project=<PROJECT_ID>
```
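
If you want to read the service account and project details from the key file rather than copying them by hand, a sketch like the following might help. It assumes the standard key file layout, in which `client_email` and `project_id` are top-level string fields. The `sa_field` function is a hypothetical helper, and its `sed` pattern does not handle values that contain escaped quotation marks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level string field
# from a service account key file without requiring extra tools.
sa_field() {
  # $1 = path to the JSON key file, $2 = field name
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Assumption: the key file path was exported earlier as
# GOOGLE_APPLICATION_CREDENTIALS. Nothing is printed if it is not readable.
KEY_FILE="${GOOGLE_APPLICATION_CREDENTIALS:-}"
if [ -n "$KEY_FILE" ] && [ -r "$KEY_FILE" ]; then
  echo "Service account: $(sa_field "$KEY_FILE" client_email)"
  echo "Project ID:      $(sa_field "$KEY_FILE" project_id)"
fi
```

You can then pass the two printed values to the `gcloud auth activate-service-account` command shown above.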

Create and set up a Google Cloud Storage bucket

Create a multi-region Google Cloud Storage bucket with a name in the format: <prefix>-<cluster name>-<suffix>. Search for "Creating storage buckets" in the Google Cloud documentation.

Ensure that your Google Cloud Storage bucket uses fine-grained, object-level access control list (ACL) permissions.
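
To illustrate the bucket name format, the following sketch assembles a name from its three parts and applies a conservative check based on the general Cloud Storage naming rules. The `bucket_name` function and the sample values are hypothetical, and the script only prints the name; it does not create the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<prefix>-<cluster name>-<suffix>" and check it
# against the general Cloud Storage bucket naming rules.
bucket_name() {
  name="$1-$2-$3"
  # 3-63 characters; lowercase letters, digits, dashes, underscores, dots;
  # must start and end with a letter or digit.
  if printf '%s\n' "$name" | grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'; then
    printf '%s\n' "$name"
  else
    echo "invalid bucket name: $name" >&2
    return 1
  fi
}

# Placeholder values; replace with your own prefix, cluster name, and suffix.
bucket_name "acme" "dsp-cluster" "checkpoints"
```

With the placeholder values, the script prints `acme-dsp-cluster-checkpoints`.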

Installation

Installing DSP on GKE is divided into two parts:

GKE cluster creation

Download the Splunk Data Stream Processor installer TAR file under your root directory and extract it.
```
tar xf dsp-<version>-linux-amd64.tar
```
Navigate to the extracted file's bin directory.
```
cd <dsp-version>/bin
```
Prepare the GKE YAML file with the parameters required for your environment. See GKE Parameter Definitions for more information on each parameter.

Run the following Spawn commands to create a new cluster.

./spawn cluster create <cluster_name> -f ../examples/<file name>.yaml 
./spawn cluster apply <cluster_name>

Once cluster generation is complete, Spawn outputs the following:

[I ]
[I ] Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
[I ]
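
If you save the Spawn output to a file, you can check for the completion line from a script. This is a sketch under assumptions: the `spawn-apply.log` file name and the `apply_succeeded` function are hypothetical, and the check only looks for the `Apply complete!` text shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a saved log contains the
# "Apply complete!" line that Spawn prints when cluster creation finishes.
apply_succeeded() {
  grep -q 'Apply complete!' "$1"
}

# Assumption: the apply output was saved earlier, for example with
#   ./spawn cluster apply <cluster_name> 2>&1 | tee spawn-apply.log
LOG_FILE="spawn-apply.log"   # hypothetical file name
if [ -f "$LOG_FILE" ]; then
  if apply_succeeded "$LOG_FILE"; then
    echo "Cluster creation finished"
  else
    echo "No 'Apply complete!' line found in $LOG_FILE" >&2
  fi
fi
```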

Navigate to your Google Kubernetes Engine service UI and verify that your new cluster is present.

DSP Installation on GKE cluster

Set the following environment variables with your cluster information.

export GOOGLE_APPLICATION_CREDENTIALS=/opt/<service_account>.json
export CLUSTER_NAME=<cluster_name>
export CLUSTER_ZONE_OR_REGION=<region_or_zone>
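
Before you continue, you might confirm that all three variables are set in the current shell. The following sketch uses a hypothetical `require_env` helper that reports each missing variable; it is not part of the DSP installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any of the named environment variables
# that are unset or empty, and fail if at least one is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_env GOOGLE_APPLICATION_CREDENTIALS CLUSTER_NAME CLUSTER_ZONE_OR_REGION; then
  echo "Cluster environment is set"
fi
```

Because the variables are exported per shell session, run the check in the same session that you use for the installation.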

Update the default YAML file for the Google Cloud Storage checkpoint with the following configurations.

For configurations requiring BUCKET_NAME, use the same bucket name from Create and set up a Google Cloud Storage bucket.

Configuration	Value
`google_credential_file_encoded`	`<encoded credentials>` This value is the output from running `base64 -i -w0 < /opt/service-account.json`
`state_checkpoints_dir`	`gs://<BUCKET-NAME>/flink/gcp-checking`
`high_availability_storagedir`	`gs://<BUCKET-NAME>/flink`
`flink_state_savepoint_base_uri`	`gs://<BUCKET-NAME>/flink/savepoint`
`plugin_storage`	`gcs`
`plugin_s3bucket`	`<BUCKET-NAME>`
`gcs_project_number`	`<project number>`
`gcs_project_number_encoded`	`<encoded project number>` This value is the output from running `echo -n '<project number>' | base64`
`cloud_provider`	`GKE`
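
The two encoded values in the table are plain Base64 strings. The following sketch shows one way to produce them; the `encode_b64` function and the sample project number are hypothetical. Removing newlines with `tr` has the same effect as the `-w0` option of GNU `base64`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Base64-encode standard input on a single line.
encode_b64() {
  base64 | tr -d '\n'
}

# Placeholder project number; replace with your own.
PROJECT_NUMBER="123456789012"
printf '%s' "$PROJECT_NUMBER" | encode_b64
echo

# Encode the key file only if it is readable at the path exported earlier.
if [ -r "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
  encode_b64 < "$GOOGLE_APPLICATION_CREDENTIALS"
  echo
fi
```

Using `printf '%s'` avoids encoding a trailing newline along with the project number, which is the same reason the table uses `echo -n`.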

Install the Splunk Data Stream Processor. To install DSP from your internal registry instead of Google Container Registry, see Install DSP with internal registry. For a list of all available flags, see the install flags sections in Install the Splunk Data Stream Processor and Preparing Google Cloud Platform to install the Splunk Data Stream Processor.

./dsp install --cluster-provider=gke --registryURL=gcr.io --accept-license --flavor=<flavor type> [--cluster-type=<cluster type>]

Once installation completes, k0s outputs the login credentials for the DSP UI and lists the services that are now available, as shown here:

Finished installing DSP
...
To log into DSP:
Hostname: https://<localhost>
Username: dsp-admin
Password: <password>


NOTE: this is the original password created during cluster bootstrapping, and will not be updated if dsp-admin's password is changed
The following endpoints are available on the cluster:
ENDPOINTS IP:PORT
DSP UI <localhost>
S2S Forwarder <localhost>:9997
* Please make sure your firewall ports are open for these services *
To see these login instructions again: please run sudo dsp admin print-login

Reference

(Optional) Install DSP with internal registry

If you want to install DSP with your internal registry instead of Google Container Registry, complete the following steps. Ensure that you complete all steps in Prerequisites, GKE cluster creation, and steps 1-3 of DSP Installation on GKE cluster.

Run the following commands to load the two required images into your internal registry.

Ensure that you set proper permissions to your internal registry so that it can communicate with your GKE cluster and pull these images.

cd <dsp-folder>/k0s/init/airgap/generic
docker load -i nginx-1.22.0-alpine-11.tar 
docker load -i registry-v2.8.1.tar
docker tag nginx:1.22.0-alpine-11 <internal-registry>/path/nginx:1.22.0-alpine-11
docker tag registry:v2.8.1 <internal-registry>/path/registry:v2.8.1

Push the two tagged images to your internal registry.

docker push <internal-registry>/path/nginx:1.22.0-alpine-11
docker push <internal-registry>/path/registry:v2.8.1
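
Both images follow the same tag and push pattern, so you can generate the commands in a loop and review them before running anything. This sketch only prints the commands. The `print_retag_commands` function and the `registry.example.com/path` location are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tag and push commands for one image
# so they can be reviewed before running them.
print_retag_commands() {
  # $1 = internal registry host and path prefix, $2 = image:tag
  echo "docker tag $2 $1/$2"
  echo "docker push $1/$2"
}

# Placeholder registry location; replace with your own.
INTERNAL_REGISTRY="registry.example.com/path"
for image in nginx:1.22.0-alpine-11 registry:v2.8.1; do
  print_retag_commands "$INTERNAL_REGISTRY" "$image"
done
```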

Set the following environment variables.

export GOOGLE_APPLICATION_CREDENTIALS=/opt/service-account.json
export CLUSTER_NAME=<cluster-name>
export CLUSTER_ZONE_OR_REGION=<region or zone based on cluster type>
export REGISTRY_IMAGE=<internal-registry>/path/registry:v2.8.1
export NGINX_IMAGE=<internal-registry>/path/nginx:1.22.0-alpine-11

Install DSP.

./dsp install --cluster-provider=gke --registryURL=internal --accept-license --debug

GKE Parameter Definitions

The following list describes the required and optional parameters available in GKE. These parameters are specific to a DSP installation on GKE.

Parameter	Required/Optional for DSP	Description
`nodes`	Required	The number of nodes to create in your cluster. In regional or multi-zone clusters, this is the number of nodes per zone.
`username`	Required	Username for the node, which is either an EC2 or a Compute Engine instance.
`keypair`	Required	The name of the account used for provisioning resources. For GCP, it is the service account name.
`region`	Required	The region where you want to provision resources.
`zone`	Required (if zonal cluster)	The region's zone where you want to provision resources. Defaults to `us-central1-a`. If changing `region`, you must include a `zone`.
`gke_private_cluster`	Required	Specifies whether your cluster is public or private.
`regional_cluster`	Required	Specifies whether your cluster is regional or zonal.
`authorized_ipv4_cidr_block`	Required	Other networks CIDR that can access the Kubernetes cluster controller through HTTPS.
`asg_min_node_count`	Required	Minimum number of nodes per zone in the node pool. Must be greater than zero and less than `max_node_count`. Cannot be used with total limits.
`asg_max_node_count`	Required	Maximum number of nodes per zone in the node pool. Must be greater than `min_node_count`. Cannot be used with total limits.
`gke_nodepool_service_account`	Required	Custom service accounts that have the cloud-platform scope and IAM role permissions for the GKE cluster worker nodes.
`release_channel`	Required	Configuration options for the release channel feature, which provides more control over automatic upgrades of your GKE clusters. Search for "release channel" in the Google Cloud documentation for more information about this feature. When updating this field, GKE imposes specific version requirements. The `google_container_engine_versions` data source can provide the default version for a channel. Removing the `release_channel` field from your configuration will cause Terraform to stop managing your cluster's release channel, but will not un-enroll it. Instead, use the default `"UNSPECIFIED"` channel.
`auto_repair`	Required	Specifies whether the node auto-repair function is turned on for the node pool. When turned on, it monitors the nodes in the node pool and triggers an automatic repair if the nodes fail health checks repeatedly. Default value is `true`.
`auto_upgrade`	Required	Specifies whether node auto-upgrade is turned on for the node pool. When turned on, it keeps the nodes in your node pool up to date with the latest release version of Kubernetes. Default value is `false`.
`cmek_key`	Required	The Customer Managed Encryption Key used to encrypt the boot disk attached to each node in the node pool. This should look like the following: projects/[KEY_PROJECT_ID]/locations/[LOCATION]/keyRings/[RING_NAME]/cryptoKeys/[KEY_NAME]
`maintenance_start_time`	Required	Time window specified for daily maintenance operations. Specify start time as `<HH>:<MM>`.
`volumeSizeGB`	Required	Volume size of the Compute Engine resource. Default value is `200GB`.
`network`	Required	Network to be used. For GCP, it is the VPC network name.
`subnet`	Required	Subnet to be used.
`instanceType`	Required	Type of instance.
`project`	Required	Name of project where the resources in your cluster will be created.
`keyPath`	Optional	The path of the `ssh` key for the nodes.
`services_ipv4_cidr_block`	Optional	The IP address range of the services IPs in your cluster. Leave blank to choose a default size range. Set to `/<netmask>` to have a range chosen with a specific netmask. Set to a CIDR notation from the RFC-1918 private networks to use a specific range.
`pods_ipv4_cidr_block`	Optional	The IP address range for the cluster pod IPs. Leave blank to choose a default size range. Set to `/<netmask>` to have a range chosen with a specific netmask. Set to a CIDR notation from the RFC-1918 private networks to use a specific range.
`image_type`	Optional	The image type to use for a node. Search for "node images" in the Google Cloud documentation. Default is Ubuntu with containerd. Changing the image type will delete and recreate all nodes in the node pool.
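
Several parameters in the table carry format or range constraints that you can check before running Spawn. The following sketch covers three of them. All function names are hypothetical, the start time check assumes a 24-hour clock, and the checks take plain values as arguments rather than reading your YAML file.

```shell
#!/usr/bin/env bash
# Hypothetical checks for constraints described in the parameter table.

# The minimum node count must be greater than zero and less than the maximum.
valid_node_bounds() {
  # $1 = asg_min_node_count, $2 = asg_max_node_count
  [ "$1" -gt 0 ] && [ "$1" -lt "$2" ]
}

# maintenance_start_time must be written as <HH>:<MM> (24-hour clock assumed).
valid_start_time() {
  printf '%s\n' "$1" | grep -Eq '^([01][0-9]|2[0-3]):[0-5][0-9]$'
}

# cmek_key must follow the projects/.../cryptoKeys/... resource path.
valid_cmek_key() {
  printf '%s\n' "$1" |
    grep -Eq '^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$'
}

# Sample values for illustration only.
valid_node_bounds 1 3 && echo "node bounds ok"
valid_start_time "03:00" && echo "start time ok"
```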
