Add an S3 input for the Splunk App for AWS

Create an S3 input to gather generic log data from any S3 bucket in your environment. Because the data you can collect varies widely, this input is not tagged for CIM compliance, and its data does not appear in any of the app dashboards. You can configure custom dashboards to fit the data that you collect.

Before you begin configuring your S3 inputs, note the following expected behaviors.

1. Immediately after you create an S3 data input, the app creates a checkpoint marker to track indexing progress for the input. As a result, you cannot edit an input after you have created it, so be sure to set all arguments correctly when you create the data input. If you need to change the settings for an input, delete it and configure a new input with your desired settings. Be aware that the new input indexes again any data that the original input already indexed.

2. The S3 data input is not intended to read frequently modified files. If a file is modified after it has been indexed, the Splunk platform indexes the file again, resulting in duplicated data. Use the whitelist field in the Advanced Settings to instruct the app to index only those files that you know will not be modified later.

3. The S3 data input processes compressed files according to their suffixes. Use these suffixes only if the file is in the corresponding format, or data processing errors will occur. The data input supports the following compression types:

single file in gzip format with suffix .gz
multiple files without folders in tar format with suffix .tar, .tar.gz, .tgz, or .tar.bz2

4. If your S3 bucket contains files that the Splunk platform cannot index, such as binary files, the app downloads the files from S3, but then rejects them.

5. You can configure multiple S3 inputs for a single S3 bucket to improve performance. The Splunk platform dedicates one process for each data input, so provided that your system has sufficient processing power, performance will improve with multiple inputs.

Note: Be sure that multiple inputs do not collect the same S3 folder and file data, to prevent indexing duplicate data.

6. If your S3 bucket names contain periods, you must use a region-specific S3 Host Name to communicate with AWS over SSL. You can configure the S3 Host Name field only through the add-on, so S3 inputs that collect data from buckets with periods in their names can only be configured through the add-on. See "Add an S3 input for Splunk Add-on for AWS" for instructions. Refer to http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region and, in the S3 Host Name field of the add-on's input configuration, enter the region-specific endpoint that matches your bucket location. For example, s3-ap-southeast-2.amazonaws.com.
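
The suffix rules in item 3 above can be checked before you upload files to a bucket. The following Python sketch is illustrative only and is not the app's own code: the function name compression_kind is hypothetical, and it assumes that suffix matching is case-sensitive and based only on the end of the file name. It lets you confirm how a given key name would be classified under the suffix list in item 3.

```python
from typing import Optional

# Suffixes copied from the list of supported compression types.
# Tar variants are checked first so that ".tar.gz" is not treated
# as a single gzip file.
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2")
GZIP_SUFFIX = ".gz"


def compression_kind(key: str) -> Optional[str]:
    """Classify an S3 key name as "tar", "gzip", or None by suffix only.

    Hypothetical helper: assumes case-sensitive suffix matching.
    """
    if key.endswith(TAR_SUFFIXES):
        return "tar"
    if key.endswith(GZIP_SUFFIX):
        return "gzip"
    return None


if __name__ == "__main__":
    for key in ("logs/app.log.gz", "archive/batch.tar.gz",
                "archive/batch.tgz", "logs/app.log"):
        print(key, "->", compression_kind(key))
```

A file whose suffix does not match its real format, such as a plain text file named with .gz, would still be classified by suffix here, which mirrors the warning in item 3 that mismatched suffixes cause data processing errors.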

Prerequisites

Before you can successfully configure an S3 input, you need to:

1. Create S3 buckets, folders, and files containing data that you want to collect with the Splunk App for AWS. If you have not already done this, see "Configure your AWS services for the Splunk App for AWS" in this manual.

2. Make sure that the account friendly name you use to configure this input corresponds to an AWS Account Access Key ID that has the necessary permissions to gather this data. If you have not already done this, see "Configure your AWS permissions for the Splunk App for AWS" in this manual.

Add a new S3 input

1. In the app, click Configure in the app navigation bar.

2. Under Data Sources, in the S3 box, click Set up.

3. Select the friendly name of the AWS Account that you want to use to collect S3 data. If you have not yet configured the account you need, click Add New Account to configure one now.

4. Under S3 Bucket, select an S3 bucket from which you want to collect data.

5. Under Folder/File name, select either /, which collects all folders and files in the bucket, or a specific folder or file to index.

6. (Optional) Open the Advanced Settings section to configure additional parameters.

7. (Optional) Enter a custom Source type for the input. One common use case is to configure the source type aws:cloudtrail to collect your CloudTrail logs directly from an S3 bucket, rather than through the CloudTrail input.

8. (Optional) Configure a custom Index for this data.

9. (Optional) Configure the specific Character set used in the S3 folder or file you are collecting. Selecting auto causes the Splunk platform to perform auto-detection among these options: UTF-8 with/without BOM, UTF-16LE/BE with BOM, UTF-32BE/LE with BOM. You can specify another encoding, but doing so will prevent auto-detection. See Configure character set encoding in the Splunk Enterprise > Getting Data In manual for more details about what character sets the Splunk platform supports.

Note: The Splunk platform does not support mixing character sets in a single input stream. If you have a mix of character sets in a single S3 bucket, define a separate S3 input for each character set, using the whitelist filtering to select the S3 folder and file names with consistent character sets.

10. (Optional) Configure a regular expression to Whitelist the specific folder and file names that the input should recursively scan. For example: folder_name/.*

11. Click Add to save and enable this data input.

After you save the input, it begins collecting all historical data immediately and checks for updates every 30 minutes.
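
Before you commit to a whitelist expression in step 10, it can help to test it against a sample of your key names, because an input cannot be edited after it is created. The following Python sketch is illustrative only: the function name filter_keys and the sample key names are hypothetical, and it assumes that the whitelist behaves like a Python regular expression matched from the start of the key name, which may differ from how the app evaluates it.

```python
import re
from typing import Iterable, List


def filter_keys(keys: Iterable[str], whitelist: str) -> List[str]:
    """Return the key names that the whitelist expression would select.

    Assumption: the expression is matched from the start of the key
    name (re.match); the app's actual matching rules may differ.
    """
    pattern = re.compile(whitelist)
    return [key for key in keys if pattern.match(key)]


if __name__ == "__main__":
    # Hypothetical key names for illustration.
    sample_keys = [
        "folder_name/a.log",
        "folder_name/sub/b.log",
        "other/c.log",
    ]
    for key in filter_keys(sample_keys, r"folder_name/.*"):
        print(key)
```

The same check is useful when you split one bucket across multiple inputs or character sets: run each whitelist against the same sample and confirm that no key name is selected by more than one input.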

Edit or delete an S3 input

You can view, edit, or delete your existing S3 inputs from the S3 Inputs screen.

1. In the app, click Configure in the app navigation bar.

2. Under Data Sources, in the S3 box, click the link that tells you how many inputs you currently have configured for S3.

3. The S3 Inputs screen displays a list of S3 inputs, organized by the account friendly name used to create the input.

4. From here, click an account name to open its individual inputs for editing, or click the trash can icon to delete an input.
