All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator, which has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to DSP 1.4.0 after July 1, 2023. We advise all of our customers to upgrade to DSP 1.4.0 in order to continue to receive full product support from Splunk.
Install the Splunk Data Stream Processor on Google Kubernetes Engine
You can install the Splunk Data Stream Processor (DSP) on the Google Kubernetes Engine (GKE).
Prerequisites
See Preparing Google Cloud Platform to install the Splunk Data Stream Processor for instructions on configuring your Google Cloud Platform (GCP) to be compatible with DSP. Before you can install DSP on the Google Kubernetes Engine, you must complete the following prerequisites.
- Create a Linux instance
- Install Terraform
- Install the Google Cloud CLI tool
- Install the GKE kubectl plugin
- Retrieve GCP service account key
- Create and set up a Google Cloud Storage bucket
Create a Linux instance
First, create a Linux instance running Ubuntu 20.04 LTS or 22.04 LTS with 50GB of memory. This is where you will install your GKE environment. Search for "google cloud sdk installation" in the Google Cloud documentation for the installation packages and instructions.
Install Terraform
Search for "install terraform" in the HashiCorp developer documentation for the installation package and installation commands. DSP supports Terraform version 1.0.5.
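You can check whether the installed Terraform matches the version that DSP supports. The following is a minimal sketch. It assumes that the first line of `terraform version` output has the form `Terraform v<X.Y.Z>`. The function and variable names are hypothetical helpers, not part of DSP.

```shell
#!/bin/sh
# Sketch: compare the installed Terraform version with the version this page
# lists as supported (1.0.5). Assumes the first line of `terraform version`
# output has the form "Terraform v<X.Y.Z>".

SUPPORTED_TERRAFORM_VERSION="1.0.5"

# Read `terraform version` output on stdin and print the bare version number.
parse_terraform_version() {
  head -n 1 | sed -n 's/^Terraform v\([0-9][0-9.]*\).*/\1/p'
}

# Return non-zero, with a message, if Terraform is missing or is another version.
check_terraform_version() {
  if ! command -v terraform >/dev/null 2>&1; then
    echo "terraform not found in PATH" >&2
    return 1
  fi
  installed="$(terraform version | parse_terraform_version)"
  if [ "$installed" != "$SUPPORTED_TERRAFORM_VERSION" ]; then
    echo "found Terraform $installed; DSP supports $SUPPORTED_TERRAFORM_VERSION" >&2
    return 1
  fi
}
```

For example, run `check_terraform_version && echo "Terraform version OK"` after you install Terraform.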
Install the Google Cloud CLI tool
Search for "gcloud cli install" in the Google Cloud documentation for information.
Install the kubectl plugin
Search for "kubectl plugin installation" in the Google Cloud documentation for information.
Retrieve GCP service account key
Ask your GCP account owner for the service account key JSON file.
- Copy the service account key file to the /opt directory. For example, create the file with vim and paste the service account file content into it.
vim /opt/<service_account.json>
- Export the Google credentials from the service account, and turn on the GKE gcloud authentication plugin.
export GOOGLE_APPLICATION_CREDENTIALS=/opt/<service_account.json>
export USE_GKE_GCLOUD_AUTH_PLUGIN=True
- Activate the service account using the following syntax. You can retrieve the SERVICE_ACCOUNT@DOMAIN.COM and PROJECT_ID details from your /opt/<service_account.json> file.
gcloud auth activate-service-account <SERVICE_ACCOUNT@DOMAIN.COM> --key-file=/opt/<service_account.json> --project=<PROJECT_ID>
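Instead of copying the account email and project ID by hand, you can read them from the key file. The following is a minimal sketch. It assumes that python3 is installed to parse the JSON. It relies on the standard `client_email` and `project_id` fields of a service account key. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: read the service account email and project ID from a GCP service
# account key file and pass them to `gcloud auth activate-service-account`.
# Assumes python3 is installed to parse the JSON (jq would work as well).

# Print a top-level field (for example client_email or project_id) from a key file.
sa_field() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))[sys.argv[2]])' \
    "$1" "$2"
}

# Activate the service account described by the key file given as $1.
activate_service_account() {
  key_file="$1"
  email="$(sa_field "$key_file" client_email)" || return 1
  project="$(sa_field "$key_file" project_id)" || return 1
  gcloud auth activate-service-account "$email" \
    --key-file="$key_file" --project="$project"
}
```

For example: `activate_service_account /opt/<service_account.json>`.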
Create and set up a Google Cloud Storage bucket
Create a multi-region Google Cloud Storage bucket with a name in the format <prefix>-<cluster name>-<suffix>. Search for "Creating storage buckets" in the Google Cloud documentation.
Ensure that your Google Cloud Storage bucket uses fine-grained access control, so that object-level access control list (ACL) permissions apply.
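You can build the bucket name in the required format and catch common naming mistakes before you create the bucket. The following is a minimal sketch. It checks only a simplified subset of the Cloud Storage naming rules. The function names and the example prefix, cluster name, and suffix are hypothetical.

```shell
#!/bin/sh
# Sketch: build a bucket name in the <prefix>-<cluster name>-<suffix> format
# and run a simplified check against Cloud Storage naming rules: 3 to 63
# characters; lowercase letters, digits, dashes, underscores, and dots; first
# and last character a letter or digit. Other rules (for example, longer
# dotted names or reserved words) are not modeled.

dsp_bucket_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

valid_bucket_name() {
  name="$1"
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
  # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]*[a-z0-9]$'
}
```

For example, set `name="$(dsp_bucket_name acme dsp-prod 01)"`. If `valid_bucket_name "$name"` succeeds, one way to create the bucket in the US multi-region with fine-grained access control is `gsutil mb -l US -b off "gs://$name"`. Confirm the flags against the gsutil documentation for your version.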
Installation
Installing DSP on GKE is divided into two parts: creating the GKE cluster and installing DSP on that cluster.
GKE cluster creation
- Download the Splunk Data Stream Processor installer TAR file to your root directory and extract it.
tar xf dsp-<version>-linux-amd64.tar
- Navigate to the bin directory of the extracted files.
cd <dsp-version>/bin
- Prepare the GKE yaml file with the required parameters based on your environment. See GKE Parameter Definitions for more information on each parameter.
- Run the following Spawn commands to create a new cluster.
./spawn cluster create <cluster_name> -f ../examples/<file name>.yaml
./spawn cluster apply <cluster_name>
Once cluster generation is complete, Spawn outputs the following:
[I ]
[I ] Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
[I ]
- Navigate to your Google Kubernetes Engine service UI and verify that your new cluster is present.
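You can also check the new cluster from the command line instead of the console. The following is a minimal sketch built on `gcloud container clusters describe`. It assumes that GCP zone names end in `-<letter>` (for example us-central1-a) and that region names end in a digit (for example us-central1). The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: check the new cluster from the command line as an alternative to
# the console. Assumes zone names end in "-<letter>" and region names do not.

# Print the gcloud location flag that matches a zone or a region name.
location_flag() {
  case "$1" in
    *-[a-z]) printf '%s\n' "--zone=$1" ;;
    *)       printf '%s\n' "--region=$1" ;;
  esac
}

# Print the cluster status reported by GKE (RUNNING once it is ready).
cluster_status() {
  gcloud container clusters describe "$1" "$(location_flag "$2")" \
    --format='value(status)'
}
```

For example: `cluster_status <cluster_name> us-central1-a`.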
DSP Installation on GKE cluster
- Set the following environment variables with your cluster information.
export GOOGLE_APPLICATION_CREDENTIALS=/opt/<service_account>.json
export CLUSTER_NAME=<cluster_name>
export CLUSTER_ZONE_OR_REGION=<region_or_zone>
- Update the Google Cloud Storage checkpoint's default yaml file with the following configurations. For configurations requiring BUCKET_NAME, use the same bucket name from Create and set up a Google Cloud Storage bucket.
Configuration | Value |
---|---|
google_credential_file_encoded | <encoded credentials> (see the first command below) |
state_checkpoints_dir | gs://<BUCKET-NAME>/flink/gcp-checking |
high_availability_storagedir | gs://<BUCKET-NAME>/flink |
flink_state_savepoint_base_uri | gs://<BUCKET-NAME>/flink/savepoint |
plugin_storage | gcs |
plugin_s3bucket | <BUCKET-NAME> |
gcs_project_number | <project number> |
gcs_project_number_encoded | <encoded project number> (see the second command below) |
cloud_provider | GKE |
The google_credential_file_encoded value is the output from running:
base64 -i -w0 < /opt/service-account.json
The gcs_project_number_encoded value is the output from running:
echo -n '<project number>' | base64
- Install the Splunk Data Stream Processor. See Install DSP with internal registry to install DSP on your internal registry instead of Google Container Registry. For a list of all available flags, see the install flags sections in Install the Splunk Data Stream Processor and Preparing Google Cloud Platform to install the Splunk Data Stream Processor.
./dsp install --cluster-provider=gke --registryURL=gcr.io --accept-license --flavor=<flavor type> [--cluster-type=<cluster type>]
Installation then continues. When it completes, k0s outputs the login credentials for the DSP UI and information about the services that are now available, as shown here:
Finished installing DSP
...
To log into DSP:

Hostname: https://<localhost>
Username: dsp-admin
Password: <password>

NOTE: this is the original password created during cluster bootstrapping, and will not be updated if dsp-admin's password is changed

The following endpoints are available on the cluster:
ENDPOINTS           IP:PORT
DSP UI              <localhost>
S2S Forwarder       <localhost>:9997

* Please make sure your firewall ports are open for these services

* To see these login instructions again: please run sudo dsp admin print-login
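If you repeat these steps, for example when you recreate a cluster, you can wrap steps 1 and 2 in small helpers and check the environment before you run `./dsp install`. The following is a minimal sketch. The encoding commands mirror the ones on this page and use GNU coreutils `base64`. `checkpoint_values` prints the table as flat `key: value` lines, which is an assumption that might not match the layout of your default yaml file. All function names are hypothetical, and the checks don't replace the installer's own validation.

```shell
#!/bin/sh
# Sketch: helpers for the GKE checkpoint values and a pre-install check.
# checkpoint_values assumes flat "key: value" lines; your yaml file may differ.

# Base64-encode a service account key file on a single line.
encode_credentials() {
  base64 -w0 < "$1"
}

# Base64-encode a project number without a trailing newline in the input.
encode_project_number() {
  printf '%s' "$1" | base64
}

# Print the checkpoint values for a bucket, key file, and project number.
checkpoint_values() {
  bucket="$1"; key_file="$2"; project_number="$3"
  cat <<EOF
google_credential_file_encoded: $(encode_credentials "$key_file")
state_checkpoints_dir: gs://${bucket}/flink/gcp-checking
high_availability_storagedir: gs://${bucket}/flink
flink_state_savepoint_base_uri: gs://${bucket}/flink/savepoint
plugin_storage: gcs
plugin_s3bucket: ${bucket}
gcs_project_number: ${project_number}
gcs_project_number_encoded: $(encode_project_number "$project_number")
cloud_provider: GKE
EOF
}

# Fail if a variable from step 1 is empty or the key file is unreadable.
check_dsp_env() {
  status=0
  for var in GOOGLE_APPLICATION_CREDENTIALS CLUSTER_NAME CLUSTER_ZONE_OR_REGION; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "not set: $var" >&2
      status=1
    fi
  done
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] && [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "cannot read: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    status=1
  fi
  return "$status"
}
```

For example: `check_dsp_env && ./dsp install --cluster-provider=gke --registryURL=gcr.io --accept-license --flavor=<flavor type>`.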
Reference
(Optional) Install DSP with internal registry
If you want to install DSP with your internal registry instead of Google Container Registry, complete the following steps. Ensure that you complete all steps in Prerequisites, GKE cluster creation, and steps 1-3 of DSP Installation on GKE cluster.
- Run the following commands to load the two required images into your internal registry.
Ensure that you set proper permissions to your internal registry so that it can communicate with your GKE cluster and pull these images.
cd <dsp-folder>/k0s/init/airgap/generic
docker load -i nginx-1.22.0-alpine-11.tar
docker load -i registry-v2.8.1.tar
docker tag nginx:1.22.0-alpine-11 <internal-registry>/path/nginx:1.22.0-alpine-11
docker tag registry:v2.8.1 <internal-registry>/path/registry:v2.8.1
- Push the two images to your internal registry.
docker push <internal-registry>/path/nginx:1.22.0-alpine-11
docker push <internal-registry>/path/registry:v2.8.1
- Set the following environment variables.
export GOOGLE_APPLICATION_CREDENTIALS=/opt/service-account.json
export CLUSTER_NAME=<cluster-name>
export CLUSTER_ZONE_OR_REGION=<region or zone based on cluster type>
export REGISTRY_IMAGE=<internal-registry>/path/registry:v2.8.1
export NGINX_IMAGE=<internal-registry>/path/nginx:1.22.0-alpine-11
- Install DSP.
./dsp install --cluster-provider=gke --registryURL=internal --accept-license --debug
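You can script the tag, push, and export steps so that the image references stay consistent. The following is a minimal sketch to run after the `docker load` commands. The `prefix` argument stands in for `<internal-registry>/path`. `DRY_RUN` is a hypothetical switch that prints the docker commands instead of running them. The function names are also hypothetical.

```shell
#!/bin/sh
# Sketch: tag and push the two loaded images under an internal registry
# prefix, and print matching REGISTRY_IMAGE and NGINX_IMAGE exports.
# Set DRY_RUN=1 to print the docker commands instead of running them.

DSP_IMAGES="nginx:1.22.0-alpine-11 registry:v2.8.1"

# Run a command, or only print it when DRY_RUN=1.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Tag each loaded image under the internal registry prefix and push it.
push_dsp_images() {
  prefix="$1"
  for image in $DSP_IMAGES; do
    maybe_run docker tag "$image" "$prefix/$image" || return 1
    maybe_run docker push "$prefix/$image" || return 1
  done
}

# Print the exports used by the internal registry install.
print_image_exports() {
  echo "export REGISTRY_IMAGE=$1/registry:v2.8.1"
  echo "export NGINX_IMAGE=$1/nginx:1.22.0-alpine-11"
}
```

For example, run `DRY_RUN=1 push_dsp_images <internal-registry>/path` to review the commands. Then run the function without `DRY_RUN`, and apply the exports with `eval "$(print_image_exports <internal-registry>/path)"`.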
GKE Parameter Definitions
The following table describes the required and optional parameters available in GKE. These parameters are specific to a DSP installation on GKE.
Parameter | Required/Optional for DSP | Description |
---|---|---|
nodes | Required | The number of nodes to create in your cluster. In regional or multi-zone clusters, this is the number of nodes per zone. |
username | Required | Username for the node, either an EC2 or a Compute Engine instance. |
keypair | Required | The name of the account used for provisioning resources. For GCP, it is the service account name. |
region | Required | The region where you want to provision resources. |
zone | Required (if zonal cluster) | The zone within the region where you want to provision resources. Defaults to us-central1-a. If you change region, you must include a zone. |
gke_private_cluster | Required | Specifies whether your cluster is public or private. |
regional_cluster | Required | Specifies whether your cluster is regional or zonal. |
authorized_ipv4_cidr_block | Required | The CIDR block of other networks that can access the Kubernetes cluster controller through HTTPS. |
asg_min_node_count | Required | Minimum number of nodes per zone in the node pool. Must be greater than zero and less than max_node_count. Cannot be used with total limits. |
asg_max_node_count | Required | Maximum number of nodes per zone in the node pool. Must be greater than min_node_count. Cannot be used with total limits. |
gke_nodepool_service_account | Required | Custom service account that has the cloud-platform scope and IAM role permissions for the GKE cluster worker nodes. |
release_channel | Required | Configuration options for the release channel feature, which provides more control over automatic upgrades of your GKE clusters. Search for "release channel" in the Google Cloud documentation for more information about this feature. When updating this field, GKE imposes specific version requirements. The Removing the |
auto_repair | Required | Specifies whether node auto-repair is turned on for the node pool. When turned on, it monitors the nodes in the node pool and triggers an automatic repair if a node fails health checks repeatedly. The default value is true. |
auto_upgrade | Required | Specifies whether node auto-upgrade is turned on for the node pool. When turned on, it keeps the nodes in your node pool up to date with the latest release version of Kubernetes. The default value is false. |
cmek_key | Required | The Customer Managed Encryption Key used to encrypt the boot disk attached to each node in the node pool, in the format projects/[KEY_PROJECT_ID]/locations/[LOCATION]/keyRings/[RING_NAME]/cryptoKeys/[KEY_NAME]. |
maintenance_start_time | Required | Start time of the window for daily maintenance operations. Specify the start time as <HH>:<MM>. |
volumeSizeGB | Required | Volume size, in GB, of the Compute Engine resource. The default value is 200 GB. |
network | Required | Network to use. For GCP, it is the VPC network name. |
subnet | Required | Subnet to use. |
instanceType | Required | Type of instance. |
project | Required | Name of the project where the resources in your cluster will be created. |
keyPath | Optional | The path of the SSH key for the nodes. |
services_ipv4_cidr_block | Optional | The IP address range of the services IPs in your cluster. Leave blank to have a range of the default size chosen. Set to /<netmask> to have a range chosen with a specific netmask. Set to a CIDR notation from the RFC 1918 private networks to use a specific range. |
pods_ipv4_cidr_block | Optional | The IP address range for the cluster pod IPs. Leave blank to have a range of the default size chosen. Set to /<netmask> to have a range chosen with a specific netmask. Set to a CIDR notation from the RFC 1918 private networks to use a specific range. |
image_type | Optional | The image type to use for a node. Search for "node images" in the Google Cloud documentation. The default is Ubuntu with containerd. Changing the image type deletes and recreates all nodes in the node pool. |
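Before you run the Spawn commands, you can check that your GKE yaml file mentions every parameter marked Required in the table above. The following is a minimal sketch. It only looks for `<name>:` at the start of a line, after optional indentation or a `- ` list marker. It doesn't validate values or nesting, which follow the examples shipped with the installer. It leaves out `zone` because that parameter is required only for zonal clusters. The function and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: list Required parameters from the table that do not appear as keys
# in a GKE yaml file. Checks key names only, not values or nesting.

REQUIRED_GKE_PARAMS="nodes username keypair region gke_private_cluster
regional_cluster authorized_ipv4_cidr_block asg_min_node_count
asg_max_node_count gke_nodepool_service_account release_channel auto_repair
auto_upgrade cmek_key maintenance_start_time volumeSizeGB network subnet
instanceType project"

# Print each required parameter missing from the yaml file given as $1.
missing_gke_params() {
  for param in $REQUIRED_GKE_PARAMS; do
    grep -Eq "^[[:space:]]*(-[[:space:]]+)?${param}:" "$1" || echo "$param"
  done
}
```

For example, `missing_gke_params ../examples/<file name>.yaml` prints nothing when every required key is present.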
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5