All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator that has been announced end-of-life. We have replaced Gravity with an alternative component in DSP 1.4.0. Therefore, we will no longer provide support for versions of DSP prior to 1.4.0 after July 1, 2023. We advise all customers to upgrade to DSP 1.4.0 to continue receiving full product support from Splunk.
Preparing Google Cloud Platform to install the Splunk Data Stream Processor
You can install the Splunk Data Stream Processor (DSP) on the Google Cloud Platform (GCP). If your use case demands high availability of services, install DSP on GCP. In addition, if you install DSP on GCP, you can also leverage a blue-green upgrade model for future upgrades after version 1.3.0. To provide high availability and increased robustness, DSP can be configured to use the following Google Cloud services instead of the default open-source components that are prepackaged with DSP:
- Google Cloud Secret Manager instead of HashiCorp Vault to store application secrets.
- GCP Cloud SQL for PostgreSQL instead of the prepackaged PostgreSQL to store application and services data, such as pipeline metadata.
- Google Cloud Storage instead of SeaweedFS to store checkpoints and other important application files and artifacts.
Note that the high availability and data resiliency guarantees apply only to sources that are compatible with Splunk DSP Firehose. See Splunk DSP Firehose for more information. For sources that are not compatible with Splunk DSP Firehose and therefore rely on other infrastructure, for example a Kafka messaging bus not managed by DSP, DSP has no way of shielding against failures of those components. Ensuring the availability and robustness of such outside components is left to the user. See Data retention policies for more information.
These resources must be created and configured before the Splunk Data Stream Processor is installed. Perform the following steps to create and configure the resources needed to install DSP on GCP:
- Create a Google Cloud auto mode or custom mode VPC network.
- Create a dedicated Google Cloud service account.
- Create GCP Cloud SQL for PostgreSQL instances.
- Create and set up a Google Cloud Storage bucket.
- Enable the Google Cloud Platform Secret Manager.
- Create an installation file to install the Splunk Data Stream Processor on Google Cloud.
- Install the Splunk Data Stream Processor.
Prerequisites
Before you can install the Splunk Data Stream Processor on the Google Cloud Platform, you must create the Google Cloud resources that are required by the cluster. Before you start, decide on a prefix, a cluster name, and a suffix to use. These values are used to identify all DSP resources in your Google Cloud environment.
Create a Google Cloud auto mode or custom mode VPC network
First, create a Google Cloud VPC network. You can choose to create an auto mode or custom mode VPC network. Then, add firewall rules and create a private services connection to allow Google Cloud clients to connect to the Cloud SQL instances that you will set up in a later task. A gcloud command-line sketch of these steps follows the table below.
- Create a Google Cloud VPC network. Search for "Using VPC networks" in the Google Cloud documentation for information.
- Add firewall rules. Search for "VPC firewall rules overview" in the Google Cloud documentation for information.
- You must add a firewall rule that matches the subnet CIDR for the region that you are using.
- If you are using a custom pod CIDR, make sure that it is added or allowed in the ingress rule.
- If you are not using a custom pod CIDR, the Splunk Data Stream Processor uses 100.96.0.0/16 by default. Make sure that this CIDR is added to the firewall rule.
- Create a private services connection to the created network. Search for "Configuring private services access" in the Google Cloud documentation for information. Use the following information when creating the private services connection:
Field in IP Allocation menu | Value |
---|---|
Name | The name of your allocated IP range must be formatted as <prefix>-<cluster name>. For example, dsp-test-cns. |
Description | A description of the IP range. |
IP range | Select Automatic. |
Prefix length | Set this to 16. |
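If you prefer the command line, the following is a minimal, hypothetical gcloud sketch of these three steps. The network name, region, subnet range, and allocated range name are illustrative assumptions; substitute the values for your environment.

```bash
# Create a custom mode VPC network and a subnet (names and ranges are examples).
gcloud compute networks create dsp-network --subnet-mode=custom
gcloud compute networks subnets create dsp-subnet \
    --network=dsp-network --region=us-central1 --range=10.10.0.0/20

# Allow ingress that matches the subnet CIDR and the default DSP pod CIDR.
gcloud compute firewall-rules create dsp-allow-internal \
    --network=dsp-network --direction=INGRESS --action=ALLOW \
    --rules=all --source-ranges=10.10.0.0/20,100.96.0.0/16

# Allocate an automatic /16 range named <prefix>-<cluster name>, then create
# the private services connection used by the Cloud SQL instances.
gcloud compute addresses create dsp-test-cns \
    --global --purpose=VPC_PEERING --prefix-length=16 --network=dsp-network
gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=dsp-test-cns --network=dsp-network
```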
Create a dedicated Google Cloud service account
The Splunk Data Stream Processor requires a Google Cloud service account that provides authentication and authorization to access data in the Google APIs. DSP also needs a service account key for installation. A gcloud sketch follows the steps below.
- Create a dedicated Google Cloud service account. Make sure that your Google Cloud service account name has the following format: <prefix>-<cluster name>-<suffix>. For example, dsp-test-cns-12345. Search for "Creating and managing service accounts" in the Google Cloud documentation for information.
- Assign IAM roles to the Google Cloud service account. Your Google Cloud service account must have the following IAM roles and conditions in order to interact with and allow a trust chain with the Splunk Data Stream Processor. Search for "Understanding roles" and "Overview of IAM Conditions" in the Google Cloud documentation for information on IAM roles and conditions.

IAM role name | Condition type | Condition operator | Condition value |
---|---|---|---|
Secret Manager Admin | Name | Starts with | projects/<project-number>/secrets/<var.prefix>-<var.cluster> |
Storage Object Admin | Name | Starts with | projects/_/buckets/<var.prefix>-<var.cluster> |
Storage Object Viewer | Name | Starts with | projects/_/buckets/<var.prefix>- |
Compute Instance Admin (v1) | Name | Starts with | projects/<var.project>/zones/<var.zone>/instances/<var.prefix>-<var.cluster> |
Compute Network Viewer | Name | Starts with | projects/<var.project>/global/networks/<var.network> |
Compute Network Admin | Name | Starts with | projects/<var.project>/regions/<var.region>/subnetworks/<var.subnetwork> |

If you are deploying Google Cloud in multiple zones, add the Compute Instance Admin (v1) condition for each zone. If you have multiple subnetworks, add the Compute Network Admin condition for each subnetwork.
- If you are planning to use the --cloud-provider installation flag, you need the following additional IAM roles. Skip this step if you are not planning to enable Cloud Provider Integration.

IAM role name | Condition type | Condition operator | Condition value |
---|---|---|---|
Compute Instance Admin (v1) | Name | Starts with | projects/<var.project>/zones/<var.zone>/disks |
Compute Network Viewer | Name | Starts with | projects/<var.project>/regions/<var.region> |
Compute Viewer | Name | Starts with | projects/<var.project> |

- Create a service account key. Make sure to select JSON as the key type. When you create a service account key, a public/private key pair is created; the public portion is stored on Google Cloud and the private portion is available only to you. Search for "Creating and managing service account keys" in the Google Cloud documentation for information.
- Base64-encode the downloaded service account key file by running the following command:

```bash
cat ~/<path>/<service_account_key.json> | base64
```
- Copy the output somewhere. You will need this for the "Create an installation file to install the Splunk Data Stream Processor on Google Cloud" step.
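The following hypothetical gcloud sketch shows one way to script these steps. The project ID, project number, and condition values are placeholder assumptions; repeat the role binding for each row in the tables above.

```bash
# Create the service account (name format: <prefix>-<cluster name>-<suffix>).
gcloud iam service-accounts create dsp-test-cns-12345 \
    --display-name="DSP service account"

# Bind one IAM role with a "starts with" condition; repeat per table row.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:dsp-test-cns-12345@my-project.iam.gserviceaccount.com" \
    --role="roles/secretmanager.admin" \
    --condition='title=dsp-secrets,expression=resource.name.startsWith("projects/123456789/secrets/dsp-test")'

# Create a JSON key, then base64-encode it for the config file.
gcloud iam service-accounts keys create service_account_key.json \
    --iam-account=dsp-test-cns-12345@my-project.iam.gserviceaccount.com
cat service_account_key.json | base64
```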
Create GCP Cloud SQL for PostgreSQL instances
The Google Cloud Platform provides PostgreSQL databases that the Splunk Data Stream Processor can use to store application and services data, such as metadata about pipelines. Create five GCP Cloud SQL for PostgreSQL instances and a database in each instance using the values provided in the following table. Search for "Creating instances" in the Cloud SQL documentation for information, and make sure you are viewing the PostgreSQL tab. A gcloud sketch of one instance follows the table.
Instance name | Database name | Database version | Username | Password | Region and zonal availability | Customize your instance |
---|---|---|---|---|---|---|
<prefix>-<cluster name>-hec-<suffix> | hec | Postgres 9.6 | hec | Give your Cloud SQL instance a password. | Multiple zones (highly available) | In Connectivity, select Private IP and enter the IP address associated with the network that you created in "Create a Google Cloud auto mode or custom mode VPC network". |
<prefix>-<cluster name>-iac-<suffix> | identity | Postgres 9.6 | splunk | Give your Cloud SQL instance a password. | Multiple zones (highly available) | In Connectivity, select Private IP and enter the IP address associated with the network that you created in "Create a Google Cloud auto mode or custom mode VPC network". |
<prefix>-<cluster name>-s2s-<suffix> | s2s | Postgres 9.6 | s2s | Give your Cloud SQL instance a password. | Multiple zones (highly available) | In Connectivity, select Private IP and enter the IP address associated with the network that you created in "Create a Google Cloud auto mode or custom mode VPC network". |
<prefix>-<cluster name>-streams-<suffix> | splunk_streaming_rest | Postgres 9.6 | streams | Give your Cloud SQL instance a password. | Multiple zones (highly available) | In Connectivity, select Private IP and enter the IP address associated with the network that you created in "Create a Google Cloud auto mode or custom mode VPC network". |
<prefix>-<cluster name>-uaa-<suffix> | uaa | Postgres 9.6 | uaa | Give your Cloud SQL instance a password. | Multiple zones (highly available) | In Connectivity, select Private IP and enter the IP address associated with the network that you created in "Create a Google Cloud auto mode or custom mode VPC network". |
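As an illustration, a single instance can be created from the command line as follows; the instance name, region, network, and machine tier are assumptions, and the same pattern repeats for the other four instances.

```bash
# Create a highly available Cloud SQL for PostgreSQL instance with a private IP only.
gcloud sql instances create dsp-test-cns-hec-12345 \
    --database-version=POSTGRES_9_6 \
    --availability-type=REGIONAL \
    --region=us-central1 \
    --network=projects/my-project/global/networks/dsp-network \
    --no-assign-ip \
    --tier=db-custom-2-7680

# Create the database and its user inside the instance.
gcloud sql databases create hec --instance=dsp-test-cns-hec-12345
gcloud sql users create hec --instance=dsp-test-cns-hec-12345 --password='<password>'
```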
Create and set up a Google Cloud Storage bucket
The Google Cloud Platform provides object storage that the Splunk Data Stream Processor can use to store checkpoints and other important application files and artifacts. Follow these steps to set up a Google Cloud Storage bucket to use for storage. A command-line sketch follows the steps below.
- Create a multi-region Google Cloud Storage bucket with a name in the format <prefix>-<cluster name>-<suffix>. Search for "Creating storage buckets" in the Google Cloud documentation for information.
- Once the bucket is created, create a folder in the bucket with the name <cluster name>.
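As a hypothetical command-line equivalent (the bucket name and location are examples):

```bash
# Create a multi-region bucket named <prefix>-<cluster name>-<suffix>.
gsutil mb -l US gs://dsp-test-cns-12345

# Cloud Storage has no true folders; copying an empty placeholder object
# makes the <cluster name> folder visible in the console.
touch .keep && gsutil cp .keep gs://dsp-test-cns-12345/test-cns/.keep
```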
Enable the Google Cloud Platform Secret Manager
The Google Cloud Platform provides a secrets manager that the Splunk Data Stream Processor can use to store application secrets. Search for "Configuring Secret Manager" in the Google Cloud documentation for instructions on how to enable the Google Cloud Platform Secret Manager.
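If you use the command line, enabling the API is a single call (the project ID is an example):

```bash
# Enable the Secret Manager API for the project that hosts DSP.
gcloud services enable secretmanager.googleapis.com --project=my-project
```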
Create an installation file to install the Splunk Data Stream Processor on Google Cloud
After all the Google Cloud resources are created, you must create a config.yaml file to install the Splunk Data Stream Processor and deploy your cluster.
Prerequisites
You need the following information to complete this task:
- The prefix, cluster name, and suffix that you've been using. In the following steps, replace the <PREFIX>, <CLUSTER NAME>, and <SUFFIX> placeholders with these values.
- The base64-encoded JSON string associated with the service account. In the following steps, replace the <ENCODED_SERVICE_ACCOUNT_JSON> placeholder with this string.
- The hostname and password for each of the five Cloud SQL instances that you created. In the following steps, replace all <HOSTNAME_FOR_*> and <PASSWORD_FOR_*> placeholders with the associated hostname and password for each instance.
- Download the DSP TAR file, create the nodes, and make sure they are ready to join the cluster. When creating and preparing the nodes, follow the processing cluster instructions provided by your Splunk Data Stream Processor representative.
Steps
- Create the installation configuration file by expanding, copying, and saving the config.yaml file below. This is a template that you will modify in the next step to define the resources that make up your DSP environment. To see an example config.yaml file with comments and sample values, see Sample customized config.yaml file for Google Cloud.
- Replace all of the values contained in the <> symbols with the values associated with your own environment. These are the values you collected as part of the prerequisites.
- After you have added the service account JSON key to the config.yaml file, delete the JSON file, because an attacker could use it to gain administrative privileges in your Google Cloud environment. Use a secure deletion tool such as Cipher or SRM to delete the service account JSON key that you downloaded in "Create a dedicated Google Cloud service account". See the Cipher or SRM documentation for more information. To perform a secure deletion, do the following actions with a secure deletion tool (see the sketch after this list):
- Overwrite the file with zeros.
- Overwrite the file with ones.
- Overwrite the file with random characters.
- Delete the file.
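For example, on Linux the shred utility approximates this sequence; the file name is a placeholder, and shred's overwrite passes use random data rather than distinct zero and one passes, so consult your tool's documentation:

```bash
# Three random-data passes, a final zero pass, then remove the file.
shred --iterations=3 --zero --remove service_account_key.json
```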
Expand this section to see the config.yaml file.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: deployer-config
  namespace: kube-system
data:
  K8S_CLOUD_RESOURCE_PREFIX: <PREFIX>-<CLUSTER-NAME>-
  K8S_DATABASE_SERVICES_STATUS_TARGETS_OVERRIDE: <CLOUDSQL_HOSTNAME_1>:5432,<CLOUDSQL_HOSTNAME_2>:5432,<CLOUDSQL_HOSTNAME_3>:5432,<CLOUDSQL_HOSTNAME_4>:5432,<CLOUDSQL_HOSTNAME_5>:5432
  K8S_FLINK_HIGH_AVAILABILITY_STORAGEDIR: gs://<PREFIX>-<CLUSTER-NAME>-<SUFFIX>/<CLUSTER-NAME>/flink/jobgraphs
  K8S_FLINK_STATE_BASE_URI: <PREFIX>-<CLUSTER-NAME>-<SUFFIX>
  K8S_FLINK_STATE_CHECKPOINT_BASE_URI: gs://<PREFIX>-<CLUSTER-NAME>-<SUFFIX>/<CLUSTER-NAME>/flink/checkpoints
  K8S_FLINK_STATE_SAVEPOINT_BASE_URI: gs://<PREFIX>-<CLUSTER-NAME>-<SUFFIX>/<CLUSTER-NAME>/flink/savepoints
  K8S_IAC_POSTGRES_DB: identity
  K8S_IAC_POSTGRES_HOSTNAME: <HOSTNAME_FOR_IDENTITY_DATABASE>
  K8S_IAC_POSTGRES_REPLICAS: "0"
  K8S_IAC_POSTGRES_USER: splunk
  K8S_NILE_HEC_POSTGRES_DB: hec
  K8S_NILE_HEC_POSTGRES_HOSTNAME: <HOSTNAME_FOR_HEC_DATABASE>
  K8S_NILE_HEC_POSTGRES_REPLICAS: "0"
  K8S_NILE_HEC_POSTGRES_USER: hec
  K8S_NILE_S2S_POSTGRES_DB: s2s
  K8S_NILE_S2S_POSTGRES_DB_NAME: s2s
  K8S_NILE_S2S_POSTGRES_HOSTNAME: <HOSTNAME_FOR_S2S_DATABASE>
  K8S_NILE_S2S_POSTGRES_REPLICAS: "0"
  K8S_NILE_S2S_POSTGRES_USER: s2s
  K8S_POSTGRES_DB: splunk_streaming_rest
  K8S_POSTGRES_HOSTNAME: <HOSTNAME_FOR_STREAMS_DATABASE>
  K8S_POSTGRES_USER: streams
  K8S_STREAMS_POSTGRES_REPLICAS: "0"
  K8S_SECRETS_MANAGER_MANAGER_TYPE: gcp
  K8S_SS_REST_FILE_UPLOAD_STORAGE: gcs
  K8S_SS_REST_PLUGIN_BUCKET_PATH_PREFIX: <CLUSTER-NAME>
  K8S_SS_REST_PLUGIN_S3BUCKET: <PREFIX>-<CLUSTER-NAME>-<SUFFIX>
  K8S_SS_REST_PLUGIN_STORAGE: gcs
  K8S_UAA_POSTGRES_DB: uaa
  K8S_UAA_POSTGRES_HOSTNAME: <HOSTNAME_FOR_UAA_DATABASE>
  K8S_UAA_POSTGRES_REPLICAS: "0"
  K8S_UAA_POSTGRES_USER: uaa
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: deployer-secrets
  namespace: kube-system
stringData:
  K8S_GOOGLE_CREDENTIAL_FILE_ENCODED: <ENCODED_SERVICE_ACCOUNT_JSON>
  K8S_IAC_POSTGRES_PASSWORD: <PASSWORD_FOR_IAC_DATABASE>
  K8S_NILE_HEC_POSTGRES_PASSWORD: <PASSWORD_FOR_HEC_DATABASE>
  K8S_NILE_S2S_POSTGRES_PASSWORD: <PASSWORD_FOR_S2S_DATABASE>
  K8S_POSTGRES_PASSWORD: <PASSWORD_FOR_STREAMS_DATABASE>
  K8S_UAA_POSTGRES_PASSWORD: <PASSWORD_FOR_UAA_DATABASE>
type: Opaque
```
Install the Splunk Data Stream Processor
Install the Splunk Data Stream Processor the same way that you install any processing cluster. For a list of all available optional flags, see the Additional installation flags section.
Run this command from the location where you downloaded and extracted the DSP installation TAR file:

```bash
./install --config=config.yaml [--optional-flags]
```
Reference
Sample customized config.yaml file for Google Cloud
This sample YAML file is provided for reference only.
Expand this section to see an example of the config.yaml template with sample values.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: deployer-config
  namespace: kube-system
data:
  # The resource prefix, formatted as {prefix}-{cluster}-.
  K8S_CLOUD_RESOURCE_PREFIX: dsp-test-cns-
  # The hostnames and ports for all of the Cloud SQL instances.
  K8S_DATABASE_SERVICES_STATUS_TARGETS_OVERRIDE: 10.199.3.234:5432,10.199.3.231:5432,10.199.3.232:5432,10.199.3.235:5432,10.199.3.233:5432
  # The paths for the Flink job graphs, checkpoints, and savepoints.
  K8S_FLINK_HIGH_AVAILABILITY_STORAGEDIR: gs://dsp-test-cns-12345/test-cns/flink/jobgraphs
  K8S_FLINK_STATE_BASE_URI: dsp-test-cns-12345
  K8S_FLINK_STATE_CHECKPOINT_BASE_URI: gs://dsp-test-cns-12345/test-cns/flink/checkpoints
  K8S_FLINK_STATE_SAVEPOINT_BASE_URI: gs://dsp-test-cns-12345/test-cns/flink/savepoints
  # The following lines are for the Cloud SQL instance for the identity database.
  K8S_IAC_POSTGRES_DB: identity
  # Cloud SQL hostname for the identity database.
  K8S_IAC_POSTGRES_HOSTNAME: 10.199.3.231
  # We are using Cloud SQL, so the replica count here should be 0.
  K8S_IAC_POSTGRES_REPLICAS: "0"
  # Cloud SQL username for the identity database.
  K8S_IAC_POSTGRES_USER: splunk
  # The following lines are for the Cloud SQL instance for the hec database.
  K8S_NILE_HEC_POSTGRES_DB: hec
  # Cloud SQL hostname for the hec database.
  K8S_NILE_HEC_POSTGRES_HOSTNAME: 10.199.3.235
  # Tells the DSP installer to use GCP Cloud SQL for PostgreSQL instead of the prepackaged PostgreSQL.
  K8S_NILE_HEC_POSTGRES_REPLICAS: "0"
  # Cloud SQL username for the hec database.
  K8S_NILE_HEC_POSTGRES_USER: hec
  # The following lines are for the Cloud SQL instance for the s2s database.
  K8S_NILE_S2S_POSTGRES_DB: s2s
  K8S_NILE_S2S_POSTGRES_DB_NAME: s2s
  # Cloud SQL hostname for the s2s database.
  K8S_NILE_S2S_POSTGRES_HOSTNAME: 10.199.3.232
  # Tells the DSP installer to use GCP Cloud SQL for PostgreSQL instead of the prepackaged PostgreSQL.
  K8S_NILE_S2S_POSTGRES_REPLICAS: "0"
  # Cloud SQL username for the s2s database.
  K8S_NILE_S2S_POSTGRES_USER: s2s
  # The following lines are for the Cloud SQL instance for the splunk_streaming_rest database.
  K8S_POSTGRES_DB: splunk_streaming_rest
  # Cloud SQL hostname for the splunk_streaming_rest database.
  K8S_POSTGRES_HOSTNAME: 10.199.3.234
  # Cloud SQL username for the streams database.
  K8S_POSTGRES_USER: streams
  # Tells the DSP installer to use GCP Cloud SQL for PostgreSQL instead of the prepackaged PostgreSQL.
  K8S_STREAMS_POSTGRES_REPLICAS: "0"
  # The following lines are for the Cloud SQL instance for the uaa database.
  K8S_UAA_POSTGRES_DB: uaa
  # Cloud SQL hostname for the uaa database.
  K8S_UAA_POSTGRES_HOSTNAME: 10.199.3.233
  # We are using Cloud SQL, so the replica count here should be 0.
  K8S_UAA_POSTGRES_REPLICAS: "0"
  # Cloud SQL username for the uaa database.
  K8S_UAA_POSTGRES_USER: uaa
  # Tells the DSP installer to use Google Cloud Secret Manager instead of HashiCorp Vault.
  K8S_SECRETS_MANAGER_MANAGER_TYPE: gcp
  # Tells the DSP installer to use Google Cloud Storage for file storage instead of SeaweedFS.
  K8S_SS_REST_FILE_UPLOAD_STORAGE: gcs
  # The cluster name.
  K8S_SS_REST_PLUGIN_BUCKET_PATH_PREFIX: test-cns
  # The Google Cloud Storage bucket name.
  K8S_SS_REST_PLUGIN_S3BUCKET: dsp-test-cns-12345
  # Tells the DSP installer to use Google Cloud Storage for plugin storage instead of SeaweedFS.
  K8S_SS_REST_PLUGIN_STORAGE: gcs
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: deployer-secrets
  namespace: kube-system
stringData:
  # The base64-encoded string associated with the private key of the service account.
  K8S_GOOGLE_CREDENTIAL_FILE_ENCODED: <value stored>
  # Cloud SQL password for the IAC database.
  K8S_IAC_POSTGRES_PASSWORD: aBcRee1
  # Cloud SQL password for the HEC database.
  K8S_NILE_HEC_POSTGRES_PASSWORD: xta3AYW
  # Cloud SQL password for the S2S database.
  K8S_NILE_S2S_POSTGRES_PASSWORD: asd8SOW
  # Cloud SQL password for the STREAMS database.
  K8S_POSTGRES_PASSWORD: LwwOPq2
  # Cloud SQL password for the UAA database.
  K8S_UAA_POSTGRES_PASSWORD: GrT332q
type: Opaque
```
Additional installation flags
You can use these installation flags when installing the Splunk Data Stream Processor. An example invocation follows the table.
Flag | Description |
---|---|
--accept-license | Automatically accepts the license agreement printed upon completion. |
--location <path> | Changes the location where k0s stores containers and state information. If you do not have enough disk space in /var to support 24 hours of data retention, or if you want to change the default location for other reasons, use this flag to override the default path used for storage. |
--config <config.yaml> | Specifies the configuration file that defines the resources to create in the DSP cluster during installation. |
--cloud-provider <gce, aws, or generic> | Enables cloud provider integration. If not specified, defaults to generic. Enabling this integration sets up disks, node labels, and networking on Google Cloud. If you want to enable Google Compute Engine (GCE) integration, search for "Google Compute Engine" in the Gravity documentation for prerequisites. In addition, you need to add additional IAM roles if you have this setting enabled. See Create a dedicated Google Cloud service account. |
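For example, a non-interactive installation on GCP with GCE integration enabled might look like the following; the storage path is an illustrative assumption:

```bash
./install --config=config.yaml --accept-license --cloud-provider gce --location /opt/dsp
```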
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6