On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
Create a DSP connection to send data to Amazon S3
To send data from a data pipeline in Splunk Data Stream Processor (DSP) to an Amazon S3 bucket, you must first create a connection using the Write Connector for Amazon S3. You can then use the connection in the Send to Amazon S3 sink function to send data from your DSP pipeline to your Amazon S3 bucket.
The Write Connector for Amazon S3 can't be used to get data from Amazon S3 into a pipeline. If you want to collect data from Amazon S3, you must use the Amazon S3 connector. See Create a DSP connection to get data from Amazon S3 for more information.
Prerequisites
Before you can create the Amazon S3 connection, you must have the following:
- An Identity and Access Management (IAM) user with at least read and write permissions for the destination bucket. Permissions for decrypting KMS-encrypted files might also be required. See the IAM user permissions section on this page for more information.
- The access key ID and secret access key for that IAM user.
If you don't have an IAM user with the necessary permissions, ask your Amazon Web Services (AWS) administrator for assistance.
IAM user permissions
Make sure your IAM user has at least read and write permissions for the destination bucket. The following policy lists the required permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "*"
    }
  ]
}
If you plan to encrypt your files using the SSE-KMS algorithm with a custom Customer Master Key (CMK), then your IAM user must also have the following permissions:
- kms:Decrypt
- kms:GenerateDataKey
If the IAM user is not in the same AWS account as the AWS KMS key, the key policy must also include the kms:Decrypt permission.
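As a sketch, an IAM policy statement that grants these two KMS permissions might look like the following. The key ARN is a placeholder: REGION, ACCOUNT_ID, and KEY_ID stand in for the values of your own CMK. Scoping Resource to that specific key rather than "*" follows the same least-privilege reasoning as the S3 Resource guidance on this page.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID"
    }
  ]
}
```

You can add this statement to the Statement array of the S3 policy above or attach it to the IAM user as a separate policy.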
As a best practice for making sure that these permissions are not applied unnecessarily to other Amazon S3 buckets or other folders inside your bucket, specify the names of your destination bucket and folder in the Resource element. For example, the following Resource definition ensures that your specified permissions are applied only to the bucket named MyBucket and the folder inside it named MyFolder:
"Resource": [ "arn:aws:s3:::MyBucket", "arn:aws:s3:::MyBucket/MyFolder/*" ]
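For reference, combining the actions from the policy earlier on this page with this Resource element gives a complete policy scoped to MyBucket and MyFolder. This is a sketch that reuses the example names MyBucket and MyFolder; substitute the names of your own destination bucket and folder.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::MyBucket",
        "arn:aws:s3:::MyBucket/MyFolder/*"
      ]
    }
  ]
}
```

Both ARNs are needed: the bucket ARN covers bucket-level actions such as s3:ListBucket and s3:GetBucketLocation, while the MyFolder/* ARN covers object-level actions such as s3:PutObject and s3:GetObject.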
To route data to specific subfolders in MyFolder, set the prefix parameter in the Send to Amazon S3 sink function to your desired subfolder path starting from MyFolder. For example, to route data to subfolders that are named with the year and month, set prefix to MyFolder/#{datetime:yyyy-MM}. See Send data to Amazon S3 in the Function Reference manual for more information about configuring the sink function.
Steps
- From the Data Stream Processor home page, click Data Management and then select the Connections tab.
- Click Create new connection.
- Select Write Connector for Amazon S3 and then click Next.
- Complete the following fields:
  - Connection Name: A unique name for your connection.
  - Description: (Optional) A description of your connection.
  - AWS Access Key ID: The access key ID for your IAM user.
  - AWS Secret Access Key: The secret access key for your IAM user.
  Any credentials that you upload are transmitted securely by HTTPS, encrypted, and securely stored in a secrets manager.
- Click Save.
If you're editing a connection that's being used by an active pipeline, you must reactivate that pipeline after making your changes. When you reactivate a pipeline, you must select where you want to resume data ingestion. See Using activation checkpoints to activate your pipeline in the Use the Data Stream Processor manual for more information.
You can now use your connection in a Send to Amazon S3 sink function at the end of your data pipeline to send data to Amazon S3. For instructions on how to build a data pipeline, see the Building a pipeline chapter in the Use the Data Stream Processor manual. For information about the sink function, see Send data to Amazon S3 in the Function Reference manual.
If you're planning to send data to Amazon S3 in Parquet format, make sure to include pipeline functions that extract relevant data from union-typed fields into explicitly typed top-level fields. See Formatting DSP data for Parquet files in Amazon S3.
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5