Splunk® Data Stream Processor

Connect to Data Sources and Destinations with DSP

On April 3, 2023, Splunk Data Stream Processor reached its end of sale, and it will reach its end of life on February 28, 2025. If you are an existing DSP customer, reach out to your account team for more information.

All DSP releases prior to DSP 1.4.0 use Gravity, a Kubernetes orchestrator that has reached end-of-life. In DSP 1.4.0, Gravity has been replaced with an alternative component. As a result, versions of DSP prior to 1.4.0 are no longer supported after July 1, 2023. Upgrade to DSP 1.4.0 to continue receiving full product support from Splunk.

Create a DSP connection to send data to Amazon S3

To send data from a data pipeline in Splunk Data Stream Processor to an Amazon S3 bucket, you must first create a connection using the Write Connector for Amazon S3. You can then use the connection in the Send to Amazon S3 sink function to send data from your DSP pipeline to your Amazon S3 bucket.

Prerequisites

Before you can create the Amazon S3 connection, you must have the following:

  • An Identity and Access Management (IAM) user with at least read and write permissions for the destination bucket. Permissions for decrypting KMS-encrypted files might also be required. See the IAM user permissions section on this page for more information.
  • The access key ID and secret access key for that IAM user.

If you don't have an IAM user with the necessary permissions, ask your Amazon Web Services (AWS) administrator for assistance.

IAM user permissions

Make sure your IAM user has at least read and write permissions for the destination bucket. The following IAM policy grants the required permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "*"
    }
  ]
}

If you plan to encrypt your files using the SSE-KMS algorithm with a custom Customer Master Key (CMK), then your IAM user must also have the following permissions:

  • kms:Decrypt
  • kms:GenerateDataKey, if the IAM user is not in the same AWS account as the AWS KMS key.

Additionally, the key policy must include the kms:Decrypt permission.
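If you need to grant these KMS permissions, one way is to add a statement like the following sketch to your IAM user's policy. The key ARN shown here is a placeholder; replace it with the ARN of your CMK:

```json
{
  "Effect": "Allow",
  "Action": [
    "kms:Decrypt",
    "kms:GenerateDataKey"
  ],
  "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
}
```

Scoping the Resource to a specific key ARN, rather than "*", limits these permissions to the CMK that you actually use for SSE-KMS encryption.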

As a best practice, specify the names of your destination bucket and folder in the Resource element so that these permissions are not applied unnecessarily to other Amazon S3 buckets or to other folders inside your bucket. For example, the following Resource definition applies your specified permissions only to the bucket named MyBucket and the folder inside it named MyFolder:

      "Resource": [
        "arn:aws:s3:::MyBucket",
        "arn:aws:s3:::MyBucket/MyFolder/*"
      ]

To route data to specific subfolders in MyFolder, set the prefix parameter in the Send to Amazon S3 sink function to your desired subfolder path starting from MyFolder. For example, to route data to subfolders that are named with the year and month, set prefix to MyFolder/#{datetime:yyyy-MM}. See Send data to Amazon S3 in the Function Reference manual for more information about configuring the sink function.
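For illustration, assuming the hypothetical MyBucket and MyFolder names from the example above and a prefix of MyFolder/#{datetime:yyyy-MM}, events ingested in September and October 2022 would be written under keys such as:

```text
s3://MyBucket/MyFolder/2022-09/<generated-file-name>
s3://MyBucket/MyFolder/2022-10/<generated-file-name>
```

Because each month's data lands under its own prefix, the Resource definition shown above (which covers MyFolder/*) still applies to all of these objects.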

Steps

  1. In DSP, select the Connections page.
  2. On the Connections page, click Create Connection.
  3. On the Sink tab, select Write Connector for Amazon S3 and then click Next.
  4. Complete the following fields:
    • Connection Name: A unique name for your connection.
    • Description: (Optional) A description of your connection.
    • AWS Access Key ID: The access key ID for your IAM user.
    • AWS Secret Access Key: The secret access key for your IAM user.

    Any credentials that you upload are transmitted securely by HTTPS, encrypted, and securely stored in a secrets manager.

  5. Click Save.

    If you're editing a connection that's being used by an active pipeline, you must reactivate that pipeline after making your changes. When you reactivate a pipeline, you must select where you want to resume data ingestion. See Using activation checkpoints to activate your pipeline in the Use the Data Stream Processor manual for more information.

You can now use your connection in a Send to Amazon S3 sink function at the end of your data pipeline to send data to Amazon S3. For instructions on how to build a data pipeline, see the Building a pipeline chapter in the Use the Data Stream Processor manual. For information about the sink function, see Send data to Amazon S3 in the Function Reference manual.

If you're planning to send data to Amazon S3 in Parquet format, make sure to include pipeline functions that extract relevant data from union-typed fields into explicitly typed top-level fields. See Formatting DSP data for Parquet files in Amazon S3.
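As a hedged sketch of what such an extraction might look like, assuming DSP's SPL2 ucast scalar function for casting union-typed fields (verify the exact function name and signature in the Function Reference manual), an Eval function in your pipeline could promote the union-typed body field to an explicitly typed top-level field:

```text
body_str = ucast(body, "string", null)
```

The resulting body_str field has a single explicit type, which allows it to map cleanly to a Parquet column.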

Last modified on 20 September, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5

