Configure Generic S3 inputs for the Splunk Add-on for AWS
The Generic S3 input pulls uncollected data from an S3 bucket. Each time it runs, it lists all the objects in the bucket and examines each file's modified date. When the number of objects in a bucket is large, this can be a very time-consuming process with low throughput.
Before you begin configuring your Generic S3 inputs, note the following expected behaviors:
- You cannot edit the initial scan time parameter of an S3 input after you create it. If you need to adjust the start time of an S3 input, delete it and recreate it.
- The S3 data input is not intended to read frequently modified files. If a file is modified after it has been indexed, the Splunk platform indexes the file again, resulting in duplicated data. Use key/blocklist/allowlist options to instruct the add-on to index only those files that you know will not be modified later.
- The S3 data input processes compressed files according to their suffixes. Use these suffixes only if the file is in the corresponding format, or data processing errors will occur. The data input supports the following compression types:
- single file in ZIP, GZIP, TAR, or TAR.GZ formats
- multiple files with or without folders in ZIP, TAR, or TAR.GZ format
Expanding compressed files requires significant operating system resources.
- If the files in your S3 bucket use a character set that cannot be detected automatically, set the character_set parameter and separate out this collection job into its own input. Mixing non-autodetected character sets in a single input causes errors.
To prevent indexing duplicate data, verify that multiple inputs do not collect the same S3 folder and file data.
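The allowlist and blocklist settings described above are regular expressions matched against S3 key names. As a quick sanity check before deploying a pattern, you can test it locally; the key names below are hypothetical, and Python's `re` dialect may differ slightly from the add-on's regex engine:

```python
import re

# Hypothetical S3 key names; keys matching the blocklist regex are skipped.
keys = [
    "logs/app-2023-01-01.log",
    "logs/archive.bin",
    "logs/app-2023-01-02.log",
]
blocklist = re.compile(r"\.bin$")

# Keep only the keys that do NOT match the blocklist.
collected = [k for k in keys if not blocklist.search(k)]
print(collected)  # the .bin key is skipped
```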
Configure a Generic S3 input on the data collection node in one of the following ways:
- Configure a Generic S3 input using Splunk Web (recommended)
- Configure a Generic S3 input using configuration file
Configure a Generic S3 input using Splunk Web
To configure inputs in Splunk Web, click Splunk Add-on for AWS in the left navigation bar on Splunk Web home, then choose one of the following menu paths depending on which data type you want to collect:
- Create New Input > CloudTrail > Generic S3
- Create New Input > CloudFront Access Log > Generic S3
- Create New Input > ELB Access Logs > Generic S3
- Create New Input > S3 Access Logs > Generic S3
- Create New Input > Others > Generic S3
Make sure you choose the right menu path corresponding to the data type you want to collect. The system automatically sets the appropriate sourcetype and may display slightly different field settings in the subsequent configuration page based on the menu path.
| Argument in configuration file | Field in Splunk Web | Description |
| --- | --- | --- |
| aws_account | AWS Account | The AWS account or EC2 IAM role the Splunk platform uses to access the keys in your S3 buckets. In Splunk Web, select an account from the drop-down list. |
| aws_iam_role | Assume Role | The IAM role to assume. See Manage IAM roles. |
| bucket_name | S3 Bucket | The AWS bucket name. |
| key_name | Log File Prefix/S3 Key Prefix | The prefix of the log file. The add-on searches for log files under this prefix. This argument is titled Log File Prefix in incremental S3 inputs, and S3 Key Prefix in generic S3 inputs. |
| log_partitions | N/A | The partitions of a log file to be ingested. The add-on searches the log files for \<Region ID\> and \<Account ID\>. For example, AWSLogs/\<Account ID\>/CloudTrail/\<Region\>. |
| initial_scan_datetime | Start Date/Time | The start date of the log. |
| terminal_scan_datetime | End Date/Time | The end date of the log. |
| sourcetype | Source Type | A source type for the events. Specify only if you want to override the default of aws:s3. |
| index | Index | The index where the Splunk platform should put the S3 data. The default is main. |
| blacklist | Blacklist | Keys to ignore when using a folder key, expressed as a regular expression (for example, \.bin$). |
| ct_blacklist | CloudTrail Event Blacklist | The blocklist used to exclude CloudTrail events. Only valid if the source type is set to aws:cloudtrail. The default is ^(?:Describe\|List\|Get). |
| ct_excluded_events_index | Excluded Events Index | The name of the index to put excluded events into. The default is empty, which discards the events. |
| polling_interval | Polling Interval | The number of seconds to wait before the Splunk platform runs the command again. The default is 1800 seconds. |
Note: If the region of the AWS account you select is GovCloud, you may encounter errors such as "Failed to load options for S3 Bucket." You need to manually add the AWS GovCloud endpoint in the S3 Host Name field. See http://docs.aws.amazon.com/govcloud-us/latest/UserGuide/using-govcloud-endpoints.html for more information.
Configure a Generic S3 input using configuration file
When you configure inputs manually in inputs.conf, create a stanza using the following template and add it to $SPLUNK_HOME/etc/apps/Splunk_TA_aws/local/inputs.conf. If the file or path does not exist, create it.
```
[aws_s3://<name>]
is_secure = <whether to use a secure connection to AWS>
host_name = <the host name of the S3 service>
aws_account = <AWS account used to connect to AWS>
bucket_name = <S3 bucket name>
polling_interval = <polling interval for statistics>
key_name = <S3 key prefix>
recursion_depth = <for folder keys, -1 == unconstrained>
initial_scan_datetime = <Splunk relative time>
terminal_scan_datetime = <only S3 keys which have been modified before this datetime will be considered. Uses the datetime format %Y-%m-%dT%H:%M:%S%z (for example, 2011-07-06T21:54:23-0700)>
log_partitions = AWSLogs/<Account ID>/CloudTrail/<Region>
max_items = <max trackable items>
max_retries = <max number of retry attempts to stream incomplete items>
whitelist = <override regex for the blacklist when using a folder key>
blacklist = <keys to ignore when using a folder key>
character_set = <the encoding used in your S3 files. Defaults to 'auto', meaning that the file encoding is detected automatically among UTF-8, UTF-8 without BOM, UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE. Once a specific encoding is set, the data input only handles that encoding>
ct_blacklist = <the blocklist to exclude CloudTrail events. Only valid when the source type is manually set to aws:cloudtrail>
ct_excluded_events_index = <name of the index to put excluded events into. The default is empty, which discards the events>
aws_iam_role = <AWS IAM role to be assumed>
```
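As a concrete illustration, a minimal stanza might look like the following. The input name, account name, bucket, prefix, and index are placeholder values, not settings from any real environment:

```
[aws_s3://my_generic_s3_input]
aws_account = my_aws_account
bucket_name = my-example-bucket
key_name = AWSLogs/
initial_scan_datetime = -7d
polling_interval = 1800
sourcetype = aws:s3
index = main
is_secure = True
host_name = s3.amazonaws.com
```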
Note: Under one AWS account, to ingest logs in different prefixed locations in the bucket, you need to configure multiple AWS data inputs, one for each prefix name. Alternatively, you can configure one data input but use different AWS accounts to ingest logs in different prefixed locations in the bucket.
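The terminal_scan_datetime setting expects the %Y-%m-%dT%H:%M:%S%z format shown in the template. As a sketch, this is how to produce a conforming value from a timezone-aware timestamp in Python:

```python
from datetime import datetime, timezone, timedelta

# Build a timezone-aware timestamp (UTC-7 in this example) and render it
# in the %Y-%m-%dT%H:%M:%S%z form expected by terminal_scan_datetime.
pdt = timezone(timedelta(hours=-7))
ts = datetime(2011, 7, 6, 21, 54, 23, tzinfo=pdt)
formatted = ts.strftime("%Y-%m-%dT%H:%M:%S%z")
print(formatted)  # 2011-07-06T21:54:23-0700
```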
Some of these settings have default values, which can be found in $SPLUNK_HOME/etc/apps/Splunk_TA_aws/default/inputs.conf:
```
[aws_s3]
aws_account =
sourcetype = aws:s3
initial_scan_datetime = default
log_partitions = AWSLogs/<Account ID>/CloudTrail/<Region>
max_items = 100000
max_retries = 3
polling_interval =
interval = 30
recursion_depth = -1
character_set = auto
is_secure = True
host_name = s3.amazonaws.com
ct_blacklist = ^(?:Describe|List|Get)
ct_excluded_events_index =
```