On October 30, 2022, all 1.2.x versions of the Splunk Data Stream Processor will reach their end of support date. See the Splunk Software Support Policy for details.
Get data from Amazon S3
Use the Amazon S3 source function to get data from Amazon S3 buckets.
The source function collects data from S3 according to the job schedule defined in the connection, and identifies the new data to collect based on S3 event notifications conveyed through Amazon Simple Queue Service (SQS). See How Amazon S3 data is collected in the Connect to Data Sources and Destinations with DSP manual for more information.
Prerequisites
Before you can use this function, you must create a connection. See Create a DSP connection to get data from Amazon S3 in the Connect to Data Sources and Destinations with DSP manual. When configuring this source function, set the connection_id argument to the ID of that connection.
Function output schema
This function outputs data pipeline events using the event schema.
In the attributes field, the function includes the following attributes in addition to the ones that are part of the original payload:
- accountID: The ID of the AWS account associated with the event. This attribute is returned as an empty string ("") if the account ID cannot be retrieved.
- lastModified: The date and time when the Amazon S3 file was last modified, given in epoch time format in seconds.
- etag: The entity tag (ETag) associated with the Amazon S3 file.
The following is an example of a typical record from the read_from_aws_s3 function:
{ "timestamp": 1562975395000, "nanos": 0, "id": "2823738566644596", "host": "test-host-1", "source": "https://s3.us-east-1.amazonaws.com/bucket/test/log.gz", "source_type": "aws:s3:plaintext", "kind": "event", "body": "helloworld", "attributes": { "accountID": "123412341234", "lastModified": 1562717968, "etag": "brvlvj6883e6pa6u47fr0vvmaky891vr" } }
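To make the units in this record concrete, the following Python sketch parses the example above and converts its two time fields to UTC. It relies on the documented fact that lastModified is given in epoch seconds, and it assumes that the top-level timestamp field is in epoch milliseconds, as its 13-digit value suggests. The function name summarize_record is hypothetical and is not part of DSP.

```python
import json
from datetime import datetime, timezone

# The example record from this page, as a JSON string.
RECORD_JSON = """
{ "timestamp": 1562975395000, "nanos": 0, "id": "2823738566644596",
  "host": "test-host-1",
  "source": "https://s3.us-east-1.amazonaws.com/bucket/test/log.gz",
  "source_type": "aws:s3:plaintext", "kind": "event", "body": "helloworld",
  "attributes": { "accountID": "123412341234", "lastModified": 1562717968,
                  "etag": "brvlvj6883e6pa6u47fr0vvmaky891vr" } }
"""


def summarize_record(record: dict) -> dict:
    """Return a few human-readable fields from a read_from_aws_s3 record."""
    attrs = record.get("attributes", {})
    # Assumption: the top-level timestamp is in epoch milliseconds.
    event_time = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
    # Documented on this page: lastModified is in epoch seconds.
    last_modified = datetime.fromtimestamp(attrs["lastModified"], tz=timezone.utc)
    return {
        "source": record["source"],
        "event_time": event_time.isoformat(),
        "last_modified": last_modified.isoformat(),
        # accountID is "" when the account ID could not be retrieved;
        # map that case to None so callers can test for it explicitly.
        "account_id": attrs.get("accountID") or None,
        "etag": attrs.get("etag"),
    }


summary = summarize_record(json.loads(RECORD_JSON))
print(summary)
```

Mapping the empty accountID string to None is a design choice of this sketch, not DSP behavior; it keeps "account ID unavailable" distinct from a real ID when the record is processed downstream.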
Required arguments
- connection_id
  - Syntax: string
  - Description: The ID of your Amazon S3 connection.
  - Example: "576205b3-f6f5-4ab7-8ffc-a4089a95d0c4"
Optional arguments
- initial_position
  - Syntax: LATEST | TRIM_HORIZON
  - Description: The position in the data stream where you want to start reading data. Defaults to LATEST.
    - LATEST: Start reading data from the latest position on the data stream.
    - TRIM_HORIZON: Start reading data from the very beginning of the data stream.
  - Example: LATEST
SPL2 example
When working in the SPL View, you can write the function by listing arguments in this exact order.
| from read_from_aws_s3("my-connection-id", "TRIM_HORIZON") |... ;
Alternatively, you can use named arguments in any order, and omit the optional argument if you just want to use the default value. The following SPL2 example omits the initial_position argument.
| from read_from_aws_s3(connection_id: "my-connection-id") |... ;
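If pipeline definitions are generated by a script rather than typed in the SPL View, a small helper can render the named-argument form and reject unsupported initial_position values before the statement is submitted. The following Python sketch is illustrative only: render_s3_source is a hypothetical helper, not part of DSP, and passing initial_position as a quoted named argument is an assumption based on the quoted positional form shown above.

```python
from typing import Optional

# The two values this page documents for initial_position.
VALID_POSITIONS = ("LATEST", "TRIM_HORIZON")


def render_s3_source(connection_id: str, initial_position: Optional[str] = None) -> str:
    """Render the source clause of an SPL2 statement using named arguments."""
    if not connection_id:
        raise ValueError("connection_id is required")
    args = [f'connection_id: "{connection_id}"']
    if initial_position is not None:
        if initial_position not in VALID_POSITIONS:
            raise ValueError(
                f"initial_position must be one of {VALID_POSITIONS}, got {initial_position!r}"
            )
        # Assumption: the named form takes the same quoted value as the positional form.
        args.append(f'initial_position: "{initial_position}"')
    joined = ", ".join(args)
    return f"| from read_from_aws_s3({joined})"


# Omitting initial_position leaves the documented default (LATEST) in effect.
print(render_s3_source("my-connection-id"))
print(render_s3_source("my-connection-id", "TRIM_HORIZON"))
```

The helper returns only the source clause; the rest of the pipeline, elided as |... ; in the examples above, still has to be appended by the caller.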
Limitations of the Amazon S3 source function
The Amazon S3 source function uses scheduled data collection jobs to ingest data. See Limitations of scheduled data collection jobs for information about limitations that apply to all scheduled data collection jobs.
This documentation applies to the following versions of Splunk® Data Stream Processor: 1.2.0, 1.2.1-patch02, 1.2.1, 1.2.2-patch02, 1.2.4, 1.2.5, 1.3.0, 1.3.1