Splunk® Data Stream Processor

Connect to Data Sources and Destinations with DSP

DSP 1.2.0 is impacted by the CVE-2021-44228 and CVE-2021-45046 security vulnerabilities from Apache Log4j. To fix these vulnerabilities, you must upgrade to DSP 1.2.4. See Upgrade the Splunk Data Stream Processor to 1.2.4 for upgrade instructions.

On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
This documentation does not apply to the most recent version of Splunk® Data Stream Processor. For documentation on the most recent version, go to the latest release.

Create a DSP connection to send data to Amazon S3

To send data from a data pipeline in Splunk Data Stream Processor (DSP) to an Amazon S3 bucket, you must first create a connection using the Write Connector for Amazon S3. You can then use the connection in the Send to Amazon S3 sink function to send data from your DSP pipeline to your Amazon S3 bucket.

The Write Connector for Amazon S3 can't be used to get data from Amazon S3 into a pipeline. If you want to collect data from Amazon S3, you must use the Amazon S3 connector. See Create a DSP connection to get data from Amazon S3 for more information.

Prerequisites

Before you can create the Amazon S3 connection, you must have the following:

  • An Identity and Access Management (IAM) user with at least read and write permissions for the destination bucket. Permissions for decrypting KMS-encrypted files might also be required. See the IAM user permissions section on this page for more information.
  • The access key ID and secret access key for that IAM user.

If you don't have an IAM user with the necessary permissions, ask your Amazon Web Services (AWS) administrator for assistance.

IAM user permissions

Make sure your IAM user has at least read and write permissions for the destination bucket. See the following list of permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "*"
    }
  ]
}
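Before attaching a policy to your IAM user, it can help to confirm that none of the actions listed above were left out. The following Python sketch is an illustration only: the function name missing_actions is hypothetical, and the check is a plain set comparison. It does not expand wildcards such as s3:*, and it ignores Deny statements and Resource scoping, so it is not a substitute for how AWS actually evaluates policies.

```python
import json

# The actions listed in the policy above.
REQUIRED_ACTIONS = {
    "s3:PutObject",
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucket",
    "s3:GetBucketLocation",
}


def missing_actions(policy_json: str) -> set:
    """Return the required actions that no Allow statement names explicitly.

    Hypothetical helper: compares literal action names only.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    # IAM permits a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]

    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # IAM permits a single action string instead of a list.
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)

    return REQUIRED_ACTIONS - granted
```

An empty result means every listed action appears by name in at least one Allow statement.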

If you plan to encrypt your files using the SSE-KMS algorithm with a custom Customer Master Key (CMK), then your IAM user must also have the following permissions:

  • kms:Decrypt
  • kms:GenerateDataKey, if the IAM user is not in the same AWS account as the AWS KMS key.

Additionally, the key policy must also include the kms:Decrypt permission.
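The KMS permissions described above could be expressed as an additional policy statement. The following Python sketch builds one under stated assumptions: the function name kms_statement and the cross_account flag are hypothetical, and the key ARN in the usage note is a placeholder. Confirm the exact statement with your AWS administrator before using it.

```python
def kms_statement(key_arn: str, cross_account: bool = False) -> dict:
    """Build an Allow statement for the KMS permissions described above.

    Hypothetical helper: cross_account should be True when the IAM user
    is not in the same AWS account as the AWS KMS key.
    """
    actions = ["kms:Decrypt"]
    if cross_account:
        # Needed only when the user and the key are in different accounts.
        actions.append("kms:GenerateDataKey")
    return {
        "Effect": "Allow",
        "Action": actions,
        "Resource": key_arn,
    }
```

For example, kms_statement("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID", cross_account=True) returns a statement that allows both kms:Decrypt and kms:GenerateDataKey on that placeholder key. The statement covers the IAM user side only; the key policy must still include kms:Decrypt.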

As a best practice for making sure that these permissions are not applied unnecessarily to other Amazon S3 buckets or other folders inside your bucket, in the Resource element, specify the names of your destination bucket and folder. For example, the following Resource definition ensures that your specified permissions are applied only to the bucket named MyBucket and the folder inside it named MyFolder:

      "Resource": [
        "arn:aws:s3:::MyBucket",
        "arn:aws:s3:::MyBucket/MyFolder/*"
      ]
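If you manage policies for several buckets, you might generate the scoped policy rather than edit it by hand. The following Python sketch assembles the policy from this page with the Resource element limited to one bucket and one folder. The function name build_policy is hypothetical, and the sketch only produces the JSON document; it does not create or attach anything in AWS.

```python
import json

# The actions listed in the IAM user permissions section of this page.
S3_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucket",
    "s3:GetBucketLocation",
]


def build_policy(bucket: str, folder: str) -> dict:
    """Return a policy document scoped to one bucket and one folder in it.

    Hypothetical helper for illustration.
    """
    folder = folder.strip("/")  # tolerate "MyFolder/" or "/MyFolder"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/{folder}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy for the example bucket and folder used on this page.
    print(json.dumps(build_policy("MyBucket", "MyFolder"), indent=2))
```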

To route data to specific subfolders in MyFolder, set the prefix parameter in the Send to Amazon S3 sink function to your desired subfolder path starting from MyFolder. For example, to route data to subfolders that are named with the year and month, set prefix to MyFolder/#{datetime:yyyy-MM}. See Send data to Amazon S3 in the Function Reference manual for more information about configuring the sink function.
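To get a rough idea of which folder a given prefix value resolves to, you can approximate the placeholder expansion locally. The following Python sketch is not how DSP implements the prefix parameter. It is a hypothetical preview helper that assumes the placeholder has the form #{datetime:<pattern>} and that supports only the yyyy, MM, and dd pattern letters. Which timestamp and time zone the sink function actually uses is not covered here; see the Function Reference manual.

```python
import re
from datetime import datetime

# Assumed mapping from a small subset of pattern letters to strftime directives.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}
_PLACEHOLDER = re.compile(r"#\{datetime:([^}]*)\}")


def preview_prefix(prefix: str, when: datetime) -> str:
    """Approximate how a prefix with a datetime placeholder might expand.

    Hypothetical helper: handles only yyyy, MM, and dd.
    """
    def expand(match):
        pattern = match.group(1)
        for token, directive in _TOKENS.items():
            pattern = pattern.replace(token, directive)
        return when.strftime(pattern)

    return _PLACEHOLDER.sub(expand, prefix)
```

Under these assumptions, preview_prefix("MyFolder/#{datetime:yyyy-MM}", datetime(2022, 2, 26)) returns "MyFolder/2022-02", which falls inside the MyFolder/* resource from the example policy.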

Steps

  1. From the Data Stream Processor home page, click Data Management and then select the Connections tab.
  2. Click Create new connection.
  3. Select Write Connector for Amazon S3 and then click Next.
  4. Complete the following fields:
    Field                  Description
    Connection Name        A unique name for your connection.
    Description            (Optional) A description of your connection.
    AWS Access Key ID      The access key ID for your IAM user.
    AWS Secret Access Key  The secret access key for your IAM user.

    Any credentials that you upload are transmitted securely by HTTPS, encrypted, and securely stored in a secrets manager.

  5. Click Save.

    If you're editing a connection that's being used by an active pipeline, you must reactivate that pipeline after making your changes. When you reactivate a pipeline, you must select where you want to resume data ingestion. See Using activation checkpoints to activate your pipeline in the Use the Data Stream Processor manual for more information.

You can now use your connection in a Send to Amazon S3 sink function at the end of your data pipeline to send data to Amazon S3. For instructions on how to build a data pipeline, see the Building a pipeline chapter in the Use the Data Stream Processor manual. For information about the sink function, see Send data to Amazon S3 in the Function Reference manual.

If you're planning to send data to Amazon S3 in Parquet format, make sure to include pipeline functions that extract relevant data from union-typed fields into explicitly typed top-level fields. See Formatting DSP data for Parquet files in Amazon S3.

Last modified on 26 February, 2022

This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5

