Splunk® Data Stream Processor

Install and administer the Data Stream Processor

Preparing Google Cloud Platform to install the Splunk Data Stream Processor

You can install the Splunk Data Stream Processor (DSP) on the Google Cloud Platform (GCP). If your use case demands high availability of services, install DSP on GCP. Installing DSP on GCP also lets you use a blue-green upgrade model for upgrades after version 1.3.0. To provide high availability and increased robustness, DSP can be configured to use the following Google Cloud services instead of the default open-source components that are bundled with DSP:

  • Google Cloud Secret Manager instead of the HashiCorp Vault to store application secrets.
  • GCP Cloud SQL for PostgreSQL instead of the prepackaged PostgreSQL to store application and services data, such as pipeline metadata.
  • Google Cloud Storage instead of MinIO to store checkpoints and other important application files and artifacts.

Note that the high availability and data resiliency guarantees apply only to sources that are compatible with Splunk DSP Firehose. See Splunk DSP Firehose for more information. Sources that are not compatible with Splunk DSP Firehose rely on other infrastructure, for example a Kafka messaging bus that is not managed by DSP, and DSP has no way of shielding against failures of those components. Ensuring the availability and robustness of such outside components is left to the user. See Data retention policies for more information.

These resources must be created and configured before DSP is installed. Perform the following steps to create and configure the resources needed to install DSP on GCP:

  1. Create a Google Cloud auto mode or custom mode VPC network.
  2. Create a dedicated Google Cloud service account.
  3. Create GCP Cloud SQL for PostgreSQL instances.
  4. Create and set up a Google Cloud Storage bucket.
  5. Enable the Google Cloud Platform Secret Manager.
  6. Create an installation file to install the Splunk Data Stream Processor on Google Cloud.
  7. Install the Splunk Data Stream Processor.

Prerequisites

Before you can install DSP on the Google Cloud Platform, you must create the Google Cloud resources that are required by the cluster. Before you start, decide on a prefix, a cluster name, and a suffix to use. These values are used to identify all DSP resources in your Google Cloud environment.
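Because the same three values recur in every resource name in this topic, it can help to derive the names once and reuse them. The following is a minimal sketch, not part of the DSP tooling: dsp_resource_names is a hypothetical helper, and the example values dsp, test-cns, and 12345 are taken from the examples used in this topic.

```shell
# Hypothetical helper: print the resource names that this topic derives
# from a prefix, cluster name, and suffix.
dsp_resource_names() {
  local prefix="$1" cluster="$2" suffix="$3"
  # Allocated IP range for the private services connection.
  echo "ip_range=${prefix}-${cluster}"
  # Dedicated service account and storage bucket share one format.
  echo "service_account=${prefix}-${cluster}-${suffix}"
  echo "bucket=${prefix}-${cluster}-${suffix}"
  # Folder created inside the bucket.
  echo "bucket_folder=${cluster}"
}

dsp_resource_names dsp test-cns 12345
```

With the example values, the IP range name is dsp-test-cns and the service account name is dsp-test-cns-12345, matching the examples given in the sections that follow.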

Create a Google Cloud auto mode or custom mode VPC network

First, create a Google Cloud VPC network. You can choose to create an auto mode or custom mode VPC network. Then, add firewall rules and create a private services connection to allow Google Cloud clients to connect to the Cloud SQL instances which you will set up in a later task.

  1. Create a Google Cloud VPC network. Search for "Using VPC networks" in the Google Cloud documentation for information.
  2. Add firewall rules. Search for "VPC firewall rules overview" in the Google Cloud documentation for information.
    • You must add a firewall rule that matches the subnet CIDR for the region that you are using.
    • If you are using a custom pod CIDR, make sure that it is added or allowed in the ingress rule.
    • If you are not using a custom pod CIDR, the Splunk Data Stream Processor uses 100.96.0.0/16 by default. Make sure that this CIDR is added to the firewall rule.
  3. Create a private services connection to the created network. Search for "Configuring private services access" in the Google Cloud documentation for information. Use the following information when creating the private services connection:
    Field in IP Allocation menu | Value
    Name | The name of your allocated IP range must be formatted as <prefix>-<cluster name>. For example, dsp-test-cns.
    Description | A description of the IP range.
    IP range | Select Automatic.
    Prefix length | Set this to 16.
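For the firewall rule in step 2, the ranges to allow are the subnet CIDR for your region plus either your custom pod CIDR or the DSP default. The following sketch assembles that list as a comma-separated string. firewall_source_ranges is a hypothetical helper, and 10.128.0.0/20 is only an example subnet CIDR; substitute the CIDR of your own subnet.

```shell
# Hypothetical helper: list the CIDR ranges that the firewall rule must allow.
# $1 = subnet CIDR for the region, $2 = custom pod CIDR (optional).
firewall_source_ranges() {
  local subnet_cidr="$1"
  # Fall back to the DSP default pod CIDR when no custom value is given.
  local pod_cidr="${2:-100.96.0.0/16}"
  printf '%s,%s\n' "$subnet_cidr" "$pod_cidr"
}

firewall_source_ranges 10.128.0.0/20
```

If you use a custom pod CIDR, pass it as the second argument so that it replaces the default in the list.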

Create a dedicated Google Cloud service account

DSP requires a Google Cloud service account that provides authentication and authorization to access data in the Google APIs. DSP also needs a service account key for installation.

  1. Create a dedicated Google Cloud service account. Make sure that your Google Cloud service account name has the following format: <prefix>-<cluster name>-<suffix>. For example, dsp-test-cns-12345.
    Search for "Creating and managing service accounts" in the Google Cloud documentation for information.
  2. Assign IAM roles to the Google Cloud service account. Your Google Cloud service account must have the following IAM roles and conditions in order to interact with and allow a trust chain with the Splunk Data Stream Processor.
    Search for "Understanding roles" and "Overview of IAM Conditions" in the Google Cloud documentation for information on IAM roles and conditions.
    IAM role name | Condition type | Condition operator | Condition value
    Secret Manager Admin | Name | Starts with | projects/<project-number>/secrets/<var.prefix>-<var.cluster>
    Storage Object Admin | Name | Starts with | projects/_/buckets/<var.prefix>-<var.cluster>
    Storage Object Viewer | Name | Starts with | projects/_/buckets/<var.prefix>-
    Compute Instance Admin (v1) | Name | Starts with | projects/<var.project>/zones/<var.zone>/instances/<var.prefix>-<var.cluster> (If you are deploying Google Cloud in multiple zones, add this condition for each zone.)
    Compute Network Viewer | Name | Starts with | projects/<var.project>/global/networks/<var.network>
    Compute Network Admin | Name | Starts with | projects/<var.project>/regions/<var.region>/subnetworks/<var.subnetwork> (If you have multiple subnetworks, add this condition for each subnetwork.)

  3. If you are planning to use the --cloud-provider installation flag, then you need the following additional IAM roles. Skip this step if you are not planning to enable Cloud Provider Integration.
    IAM role name | Condition type | Condition operator | Condition value
    Compute Instance Admin (v1) | Name | Starts with | projects/<var.project>/zones/<var.zone>/disks
    Compute Network Viewer | Name | Starts with | projects/<var.project>/regions/<var.region>
    Compute Viewer | Name | Starts with | projects/<var.project>
  4. Create a service account key. Make sure to select JSON as the key type. When you create a service account key, a public/private key pair is created where the public portion is stored on Google Cloud and the private portion is available only to you. Search for "Creating and managing service account keys" in the Google Cloud documentation for information.
  5. Base64-encode the downloaded service account key file by running the following command:
    cat ~/<path>/<service_account_key.json> | base64
  6. Copy the output and store it somewhere secure. You need it for the "Create an installation file to install the Splunk Data Stream Processor on Google Cloud" step.
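Some base64 implementations, such as the one in GNU coreutils, wrap their output at 76 characters by default. Because the encoded key is later pasted into a single field of the configuration file, a single-line value is easier to work with. The following sketch shows one way to produce it. encode_key is a hypothetical helper name, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: base64-encode a key file as one unwrapped line.
# tr removes any line breaks that base64 inserts into its output.
encode_key() {
  base64 < "$1" | tr -d '\n'
}

# Usage (placeholder path):
#   encode_key ~/<path>/<service_account_key.json>
```

The encoded text is the same as the output of the command in step 5, with any line breaks removed.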

Create GCP Cloud SQL for PostgreSQL instances

The Google Cloud Platform provides PostgreSQL databases that the Splunk Data Stream Processor can use to store application and services data, such as metadata about pipelines. Create five GCP Cloud SQL for PostgreSQL instances, and create a database in each instance, using the values provided in the following table. Search for "Creating instances" in the Google Cloud SQL documentation for information. Make sure that you are viewing the PostgreSQL tab in the Google Cloud SQL documentation.

Instance name | Database name | Username
<prefix>-<cluster name>-hec-<suffix> | hec | hec
<prefix>-<cluster name>-iac-<suffix> | identity | splunk
<prefix>-<cluster name>-s2s-<suffix> | s2s | s2s
<prefix>-<cluster name>-streams-<suffix> | splunk_streaming_rest | streams
<prefix>-<cluster name>-uaa-<suffix> | uaa | uaa

Use the following settings for all five instances:

  • Database version: Postgres 9.6
  • Password: Give your Cloud SQL instance a password.
  • Region and zonal availability: Multiple zones (highly available)
  • Customize your instance: In Connectivity, select Private IP and enter the IP address associated with the network that you created in "Create a Google Cloud auto mode or custom mode VPC network".
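The instance names, database names, and usernames above follow a fixed pattern, so they can be generated rather than typed by hand. The following sketch prints one line per instance. cloudsql_plan is a hypothetical helper, and it only prints names; it does not create any Cloud SQL resources.

```shell
# Hypothetical helper: print "instance database user" for the five
# Cloud SQL instances listed in this section.
cloudsql_plan() {
  local prefix="$1" cluster="$2" suffix="$3"
  local row id rest db user
  for row in hec:hec:hec iac:identity:splunk s2s:s2s:s2s \
             streams:splunk_streaming_rest:streams uaa:uaa:uaa; do
    id="${row%%:*}"      # instance identifier, for example "iac"
    rest="${row#*:}"
    db="${rest%%:*}"     # database name, for example "identity"
    user="${rest#*:}"    # database username, for example "splunk"
    printf '%s %s %s\n' "${prefix}-${cluster}-${id}-${suffix}" "$db" "$user"
  done
}

cloudsql_plan dsp test-cns 12345
```

Note that the iac instance is the only one whose database name (identity) and username (splunk) differ from its instance identifier.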

Create and set up a Google Cloud Storage bucket

The Google Cloud Platform provides cloud object storage that DSP can use to store container images. Follow these steps to set up a Google Cloud Storage bucket to use for storage.

  1. Create a multi-region Google Cloud Storage bucket with a name in the format: <prefix>-<cluster name>-<suffix>. Search for "Creating storage buckets" in the Google Cloud documentation.
  2. Once the bucket is created, create a folder in the bucket with the name: <cluster name>.

Enable the Google Cloud Platform Secret Manager

The Google Cloud Platform provides a secrets manager that DSP can use to store application secrets. Search for "Configuring Secret Manager" in the Google Cloud documentation for instructions on how to enable the Google Cloud Platform Secret Manager.

Create an installation file to install the Splunk Data Stream Processor on Google Cloud

After all the Google Cloud resources are created, you must create a config.yml file to install the Splunk Data Stream Processor and deploy your cluster.

Prerequisites

You need the following information to complete this task:

  • The prefix, cluster name, and suffix that you've been using. In the following steps, replace the <PREFIX>, <CLUSTER-NAME>, and <SUFFIX> placeholders with these values.
  • The base64-encoded JSON string associated with the service account. In the following steps, replace the <ENCODED_SERVICE_ACCOUNT_JSON> placeholder with this string.
  • The hostnames and passwords for all five of the Cloud SQL instances that you created. In the following steps, replace all <CLOUDSQL_HOSTNAME_*>, <HOSTNAME_FOR_*>, and <PASSWORD_FOR_*> placeholders with the associated hostname and password for each instance.
  • Download the DSP TAR file, create the nodes, and make sure they are ready to join the cluster. When creating and preparing the nodes, follow the processing cluster instructions provided by your Splunk Data Stream Processor representative.

Steps

  1. Create a file named config.yml to use as the installation configuration file. You modify this file in a later step to define the resources that make up your DSP environment. For an example config.yml file with comments and sample values, see Sample customized config.yaml file for Google Cloud.
  2. Copy the following template into the config.yml file.

    apiVersion: v1
    kind: ConfigMap
    metadata:
        name: deployer-config
        namespace: kube-system
    data:
        K8S_CLOUD_RESOURCE_PREFIX: <PREFIX>-<CLUSTER-NAME>-
        K8S_DATABASE_SERVICES_STATUS_TARGETS_OVERRIDE: <CLOUDSQL_HOSTNAME_1>:5432,<CLOUDSQL_HOSTNAME_2>:5432,<CLOUDSQL_HOSTNAME_3>:5432,<CLOUDSQL_HOSTNAME_4>:5432,<CLOUDSQL_HOSTNAME_5>:5432
        K8S_FLINK_HIGH_AVAILABILITY_STORAGEDIR: gs://<PREFIX>-<CLUSTER-NAME>-<SUFFIX>/<CLUSTER-NAME>/flink/jobgraphs
        K8S_FLINK_STATE_BASE_URI: <PREFIX>-<CLUSTER-NAME>-<SUFFIX>
        K8S_FLINK_STATE_CHECKPOINT_BASE_URI: gs://<PREFIX>-<CLUSTER-NAME>-<SUFFIX>/<CLUSTER-NAME>/flink/checkpoints
        K8S_FLINK_STATE_SAVEPOINT_BASE_URI: gs://<PREFIX>-<CLUSTER-NAME>-<SUFFIX>/<CLUSTER-NAME>/flink/savepoints
    
    
        K8S_IAC_POSTGRES_DB: identity
        K8S_IAC_POSTGRES_HOSTNAME: <HOSTNAME_FOR_IDENTITY_DATABASE>
        K8S_IAC_POSTGRES_REPLICAS: "0"
        K8S_IAC_POSTGRES_USER: splunk
         
        K8S_NILE_HEC_POSTGRES_DB: hec
        K8S_NILE_HEC_POSTGRES_HOSTNAME: <HOSTNAME_FOR_HEC_DATABASE>
        K8S_NILE_HEC_POSTGRES_REPLICAS: "0"
        K8S_NILE_HEC_POSTGRES_USER: hec
     
        K8S_NILE_S2S_POSTGRES_DB: s2s
        K8S_NILE_S2S_POSTGRES_DB_NAME: s2s
        K8S_NILE_S2S_POSTGRES_HOSTNAME: <HOSTNAME_FOR_S2S_DATABASE>
        K8S_NILE_S2S_POSTGRES_REPLICAS: "0"
        K8S_NILE_S2S_POSTGRES_USER: s2s
     
        K8S_POSTGRES_DB: splunk_streaming_rest
        K8S_POSTGRES_HOSTNAME: <HOSTNAME_FOR_STREAMS_DATABASE>
        K8S_POSTGRES_USER: streams
        K8S_STREAMS_POSTGRES_REPLICAS: "0"
     
        K8S_SECRETS_MANAGER_MANAGER_TYPE: gcp
     
        K8S_SS_REST_FILE_UPLOAD_STORAGE: gcs
        K8S_SS_REST_PLUGIN_BUCKET_PATH_PREFIX: <CLUSTER-NAME>
        K8S_SS_REST_PLUGIN_S3BUCKET: <PREFIX>-<CLUSTER-NAME>-<SUFFIX>
        K8S_SS_REST_PLUGIN_STORAGE: gcs
         
        K8S_UAA_POSTGRES_DB: uaa
        K8S_UAA_POSTGRES_HOSTNAME: <HOSTNAME_FOR_UAA_DATABASE>
        K8S_UAA_POSTGRES_REPLICAS: "0"
        K8S_UAA_POSTGRES_USER: uaa
    ---
    apiVersion: v1
    data: {}
    kind: Secret
    metadata:
        name: deployer-secrets
        namespace: kube-system
    stringData:
        K8S_GOOGLE_CREDENTIAL_FILE_ENCODED: <ENCODED_SERVICE_ACCOUNT_JSON>
        K8S_IAC_POSTGRES_PASSWORD: <PASSWORD_FOR_IAC_DATABASE>
        K8S_NILE_HEC_POSTGRES_PASSWORD: <PASSWORD_FOR_HEC_DATABASE>
        K8S_NILE_S2S_POSTGRES_PASSWORD: <PASSWORD_FOR_S2S_DATABASE>
    
        K8S_POSTGRES_PASSWORD: <PASSWORD_FOR_STREAMS_DATABASE>
        K8S_UAA_POSTGRES_PASSWORD: <PASSWORD_FOR_UAA_DATABASE>
    type: Opaque
    
    
  3. Replace all of the values contained in the <> symbols with the values associated with your own environment. These are the values you collected as part of the prerequisites.
  4. After you have added the service account JSON key to the config.yml file, delete the JSON file because an attacker could use it to gain administrative privileges in the Google Cloud environment. Use a secure deletion tool such as Cipher or SRM to delete the service account JSON key that you downloaded in Create a dedicated Google Cloud service account. See Cipher or SRM for more information. In order to perform a secure deletion, you must do the following actions with a secure deletion tool:
    1. Overwrite the file with zeros.
    2. Overwrite the file with ones.
    3. Overwrite the file with random characters.
    4. Delete the file.
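Before you install, it can be worth confirming that step 3 left no placeholders behind in the configuration file. The following is a minimal sketch of such a check. check_config is a hypothetical helper, and the pattern assumes that placeholders look like the ones in the template above: uppercase names inside <> symbols.

```shell
# Hypothetical helper: fail if the file still contains template
# placeholders such as <PREFIX> or <HOSTNAME_FOR_UAA_DATABASE>.
check_config() {
  # grep prints each matching line with its line number.
  if grep -n -E '<[A-Z][A-Z0-9_-]*>' "$1"; then
    echo "Unreplaced placeholders remain in $1" >&2
    return 1
  fi
  echo "No placeholders left in $1"
}

# Usage:
#   check_config config.yml
```

The check only looks for leftover placeholder text. It does not validate that the hostnames, passwords, or encoded key are correct.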

Install the Splunk Data Stream Processor

Install the Splunk Data Stream Processor the same way that you install any processing cluster. For a list of all available optional flags, see the Additional installation flags section. If you specified a custom pod CIDR in Create a Google Cloud auto mode or custom mode VPC network, then you must also include the --pod-network-cidr <custom-cidr> flag.

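Run this command from the directory where you downloaded the DSP installation TAR file. The file name that you pass to the --config flag must match the name of the configuration file that you created in the previous section.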

 ./install --config=config.yaml [--optional-flags]
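The Additional installation flags section states that --accept-license must come first, followed by --location, followed by any other flags. The following sketch assembles a command line under one reading of that rule, in which --config and --pod-network-cidr count as "other flags". build_install_cmd and the ACCEPT_LICENSE variable are hypothetical names, and the function only prints the command; it does not run the installer.

```shell
# Hypothetical helper: print an ./install command line.
# Usage: build_install_cmd <config file> [location path] [custom pod CIDR]
# Set ACCEPT_LICENSE=1 to include --accept-license.
build_install_cmd() {
  local config="$1" location="${2:-}" pod_cidr="${3:-}"
  local cmd="./install"
  # --accept-license, when used, goes first.
  if [ "${ACCEPT_LICENSE:-0}" = "1" ]; then
    cmd="$cmd --accept-license"
  fi
  # --location, when used, goes next.
  if [ -n "$location" ]; then
    cmd="$cmd --location $location"
  fi
  # Assumption: all remaining flags follow in any order.
  cmd="$cmd --config=$config"
  if [ -n "$pod_cidr" ]; then
    cmd="$cmd --pod-network-cidr $pod_cidr"
  fi
  echo "$cmd"
}

build_install_cmd config.yaml
```

Called with only a configuration file, the helper prints the same command shown above without optional flags.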

Reference

Sample customized config.yaml file for Google Cloud

This sample YAML file is provided for reference only.

The following example shows the config.yml template with sample values.

apiVersion: v1
kind: ConfigMap
metadata:
    name: deployer-config
    namespace: kube-system
data:
    # The {prefix}-{cluster}-
    K8S_CLOUD_RESOURCE_PREFIX: dsp-test-cns-
    # The hostname and ports for all of the Cloud SQL instances.
    K8S_DATABASE_SERVICES_STATUS_TARGETS_OVERRIDE: 10.199.3.234:5432,10.199.3.231:5432,10.199.3.232:5432,10.199.3.235:5432,10.199.3.233:5432
    # The path for the Flink job graphs:
    K8S_FLINK_HIGH_AVAILABILITY_STORAGEDIR: gs://dsp-test-cns-12345/test-cns/flink/jobgraphs
    K8S_FLINK_STATE_BASE_URI: dsp-test-cns-12345
    K8S_FLINK_STATE_CHECKPOINT_BASE_URI: gs://dsp-test-cns-12345/test-cns/flink/checkpoints
    K8S_FLINK_STATE_SAVEPOINT_BASE_URI: gs://dsp-test-cns-12345/test-cns/flink/savepoints
 

    # The following lines are for the Cloud SQL instance for the Identity database.
    K8S_IAC_POSTGRES_DB: identity 	
    # Cloud SQL hostname for the Identity database. 
    K8S_IAC_POSTGRES_HOSTNAME: 10.199.3.231   	 
    # We are using Cloud SQL, so the replica here should be 0.
    K8S_IAC_POSTGRES_REPLICAS: "0"
    # Cloud SQL username for the Identity database. 
    K8S_IAC_POSTGRES_USER: splunk             	
     
    # The following lines are for the Cloud SQL instance for the hec database.
    K8S_NILE_HEC_POSTGRES_DB: hec
    # Cloud SQL hostname for the hec database. 
    K8S_NILE_HEC_POSTGRES_HOSTNAME: 10.199.3.235
    # Tells the DSP installer to use GCP Cloud SQL for PostgreSQL instead of PostgreSQL.
    K8S_NILE_HEC_POSTGRES_REPLICAS: "0"
    # Cloud SQL username for the hec database. 
    K8S_NILE_HEC_POSTGRES_USER: hec       
 
    # The following lines are for the Cloud SQL instance for the s2s database.
    K8S_NILE_S2S_POSTGRES_DB: s2s
    K8S_NILE_S2S_POSTGRES_DB_NAME: s2s
    # Cloud SQL hostname for the s2s database. 
    K8S_NILE_S2S_POSTGRES_HOSTNAME: 10.199.3.232
    # Tells the DSP installer to use GCP Cloud SQL for PostgreSQL instead of PostgreSQL.
    K8S_NILE_S2S_POSTGRES_REPLICAS: "0"
    # Cloud SQL username for the s2s database. 
    K8S_NILE_S2S_POSTGRES_USER: s2s     
 
    # The following lines are for the Cloud SQL instance for the splunk_streaming_rest database.
    K8S_POSTGRES_DB: splunk_streaming_rest
    # Cloud SQL hostname for the splunk_streaming_rest database.
    K8S_POSTGRES_HOSTNAME: 10.199.3.234
    # Cloud SQL username for the splunk_streaming_rest database.
    K8S_POSTGRES_USER: streams
    # Tells the DSP installer to use GCP Cloud SQL for PostgreSQL instead of PostgreSQL.
    K8S_STREAMS_POSTGRES_REPLICAS: "0"
 
    # The following lines are for the Cloud SQL instance for the uaa database.
    K8S_UAA_POSTGRES_DB: uaa
    # Cloud SQL hostname for the uaa database. 
    K8S_UAA_POSTGRES_HOSTNAME: 10.199.3.233
    # We are using Cloud SQL, so the replica here should be 0.
    K8S_UAA_POSTGRES_REPLICAS: "0"
    # Cloud SQL username for the uaa database.
    K8S_UAA_POSTGRES_USER: uaa

    # Tells the DSP installer to use the Google Cloud Secrets Manager instead of HashiCorp Vault.
    K8S_SECRETS_MANAGER_MANAGER_TYPE: gcp
 
    # Tells the DSP installer to use Google Cloud storage for file storage instead of MinIO.
    K8S_SS_REST_FILE_UPLOAD_STORAGE: gcs
    # The cluster name. 
    K8S_SS_REST_PLUGIN_BUCKET_PATH_PREFIX: test-cns
    # The Google Cloud Storage bucket name. 
    K8S_SS_REST_PLUGIN_S3BUCKET: dsp-test-cns-12345
    # Tells the DSP installer to use Google Cloud Storage for plugin storage instead of MinIO.
    K8S_SS_REST_PLUGIN_STORAGE: gcs

---
apiVersion: v1
data: {}
kind: Secret
metadata:
    name: deployer-secrets
    namespace: kube-system
stringData:
    # The base64-encoded string associated with the private key of the service account.
    K8S_GOOGLE_CREDENTIAL_FILE_ENCODED: <value stored> 
    # Cloud SQL password for the IAC database.
    K8S_IAC_POSTGRES_PASSWORD: aBcRee1

    # Cloud SQL password for the HEC database.
    K8S_NILE_HEC_POSTGRES_PASSWORD: xta3AYW

    # Cloud SQL password for the S2S database.
    K8S_NILE_S2S_POSTGRES_PASSWORD: asd8SOW

    # Cloud SQL password for the STREAMS database.
    K8S_POSTGRES_PASSWORD: LwwOPq2

    # Cloud SQL password for the UAA database.
    K8S_UAA_POSTGRES_PASSWORD: GrT332q
type: Opaque



Additional installation flags

You can use these installation flags when installing the Splunk Data Stream Processor:

Flag Description
--accept-license Automatically accepts the license agreement printed upon completion.


The order of optional flags matters. The --accept-license flag must be specified first, followed by the --location flag, followed by any other desired flags. For example: ./install --accept-license --location <path> --other-optional-flags.

--location <path> Changes the location where Gravity stores containers and state information. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention or you want to change the default location for other reasons, use this command to override the default path used for storage.


If you want to store state and mount persistent volumes for containers in two separate directories, use the --mount and --state-dir flags instead.

--config <config.yaml> Specifies the configuration file that defines the resources to create in the DSP cluster during installation.
--cloud-provider <gce, aws, or generic> Enables cloud provider integration. If not specified, defaults to generic. On Google Cloud, enabling this integration sets up disks, node labels, and networking. To enable Google Compute Engine (GCE) integration, search for "Google Compute Engine" in the Gravity documentation for prerequisites. You must also add additional IAM roles if you enable this setting. See Create a dedicated Google Cloud service account.
--token <token> A secure token that prevents rogue nodes from joining the cluster. Your token must be at least six characters long.
--service-uid <numeric> Specifies the Service User ID. For information about how this is used, search for "Service User" in the Gravity documentation. If not specified, a user named planet is created with user id 1000.
Note: The ./join command does not support the --service-uid or --service-gid flags. Instead, the worker nodes use whatever value is set on the master node with ./install.
--service-gid <numeric> Specifies the Service Group ID. For information about how this is used, search for "Service User" in the Gravity documentation. If not specified, a group named planet is created.
Note: The ./join command does not support the --service-uid or --service-gid flags. Instead, the worker nodes use whatever value is set on the master node with ./install.
--pod-network-cidr <100.96.0.0/16> The CIDR range Kubernetes allocates node subnets and pod IPs from. Must be a minimum of /16 so Kubernetes is able to allocate /24 to each node. If not specified, defaults to 100.96.0.0/16.
--service-cidr <100.100.0.0/16> The CIDR range Kubernetes allocates service IPs from. If not specified, defaults to 100.100.0.0/16.
--mount=data:/<path> --state-dir=/<path> Change the location where Gravity stores containers and state information. By default, Gravity uses /var/lib/gravity to store state information and mounts persistent volumes for containers to /var/data. If you do not have enough disk space in /var to support 24 hours of data retention or you want to change the default location for other reasons, use this command to override the default path used for storage. Use these flags if you want to store state information and mount persistent volumes for containers in two separate directories.


Replace <path> with the path you want Gravity to use for storage. For example, if you want to install everything in /opt/splunk/dsp then run: ./install --mount=data:/opt/splunk/dsp --state-dir=/opt/splunk/dsp.
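The --pod-network-cidr description above says that the range must be at least a /16 so that Kubernetes can allocate a /24 to each node. The arithmetic behind that can be sketched as follows. node_subnet_count is a hypothetical helper that counts how many /24 node subnets fit in a given range.

```shell
# Hypothetical helper: count the /24 subnets that fit in a CIDR range.
node_subnet_count() {
  local cidr="$1" prefix_len
  prefix_len="${cidr##*/}"   # text after the slash, for example 16
  # A range smaller than a /24 cannot hold even one /24 subnet.
  if [ "$prefix_len" -gt 24 ]; then
    echo 0
    return
  fi
  # Each extra prefix bit between the range and /24 doubles the count.
  echo $(( 1 << (24 - prefix_len) ))
}

node_subnet_count 100.96.0.0/16   # prints 256
```

The default 100.96.0.0/16 range therefore holds 256 /24 subnets.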

Last modified on 25 March, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.3.0, 1.3.1

