Configure Incremental S3 inputs for the Splunk Add-on for AWS
From version 4.3.0 onwards, the Splunk Add-on for AWS provides the SQS-based S3 input, a more scalable and higher-performing alternative to the generic S3 and incremental S3 input types for collecting various types of log files from S3 buckets. For new inputs that collect a variety of predefined and custom data types, consider using the SQS-based S3 input instead.
The incremental S3 input lists and retrieves only those objects in a bucket that have not yet been ingested, by comparing datetime information embedded in the filenames against a checkpoint record. This significantly improves ingestion performance.
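As a sketch of this filtering idea (not the add-on's actual implementation; the filename pattern and checkpoint format here are assumptions for illustration), comparing a datetime embedded in each S3 key against a checkpoint might look like the following:

```python
import re
from datetime import datetime

# Hypothetical key layout: object names that embed a UTC timestamp,
# e.g. "..._20230115T0930Z_...". The add-on's real parsing rules and
# checkpoint format may differ.
TS_PATTERN = re.compile(r"_(\d{8}T\d{4}Z)_")

def parse_key_datetime(key):
    """Extract the embedded datetime from an S3 object key, or None."""
    m = TS_PATTERN.search(key)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y%m%dT%H%MZ")

def keys_to_ingest(keys, checkpoint):
    """Return only keys whose embedded datetime is newer than the
    checkpoint, so already-ingested objects are skipped entirely."""
    selected = []
    for key in keys:
        ts = parse_key_datetime(key)
        if ts is not None and ts > checkpoint:
            selected.append(key)
    return selected
```

With a checkpoint of, say, 2023-01-15 09:00 UTC, only objects stamped after that instant are retrieved; older objects are skipped without ever being fetched, which is what makes the incremental input cheaper than re-listing the whole bucket.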
Configure an Incremental S3 input on the data collection node in one of the following ways:
- Configure an Incremental S3 input using Splunk Web (recommended)
- Configure an Incremental S3 input using configuration file
Configure an Incremental S3 input using Splunk Web
To configure inputs in Splunk Web, click Splunk Add-on for AWS in the left navigation bar on Splunk Web home, then choose one of the following menu paths depending on which data type you want to collect:
- Create New Input > CloudTrail > Incremental S3
- Create New Input > CloudFront Access Log > Incremental S3
- Create New Input > ELB Access Logs > Incremental S3
- Create New Input > S3 Access Logs > Incremental S3
Make sure you choose the menu path corresponding to the data type you want to collect. The add-on automatically sets the appropriate source type and may display slightly different fields on the subsequent configuration page, depending on the menu path you choose.
|Argument in configuration file|Field in Splunk Web|Description|
|---|---|---|
|aws_account|AWS Account|The AWS account or EC2 IAM role the Splunk platform uses to access the keys in your S3 buckets. In Splunk Web, select an account from the drop-down list.|
|aws_iam_role|Assume Role|The IAM role to assume. See Manage IAM roles.|
|bucket_name|S3 Bucket|The AWS bucket name.|
|log_file_prefix|Log File Prefix|The prefix of the log file, which, along with other path elements, forms the URL under which the Splunk Add-on for AWS searches for log files. The locations of the log files differ for each incremental S3 log type.|
|log_type|Log Type|The type of logs to ingest. Available log types are CloudTrail, CloudFront access logs, ELB access logs, and S3 access logs.|
|log_start_date|Log Start Date|The start date of the log.|
|distribution_id|Distribution ID|The CloudFront distribution ID. This field is displayed only when you access the input configuration page through the Create New Input > CloudFront Access Log > Incremental S3 menu path.|
|sourcetype|Source Type|The source type for the events. This value is set automatically based on the menu path you chose to access this configuration page.|
|index|Index|The index where the Splunk platform should put the S3 data. The default is main.|
|interval|Interval|The number of seconds to wait before splunkd checks the health of the modular input so that it can trigger a restart if the input crashed. The default is 30 seconds.|

Note: If the region of the AWS account you select is GovCloud, you may encounter errors such as "Failed to load options for S3 Bucket." In that case, manually add the AWS GovCloud endpoint in the S3 Host Name field. See http://docs.aws.amazon.com/govcloud-us/latest/UserGuide/using-govcloud-endpoints.html for more information.

Note: Under one AWS account, to ingest logs from different prefixed locations in the bucket, configure multiple AWS data inputs, one for each prefix. Alternatively, you can configure one data input but use different AWS accounts to ingest logs from the different prefixed locations in the bucket.
Configure an Incremental S3 input using configuration file
When you configure inputs manually in inputs.conf, create a stanza using the following template and add it to $SPLUNK_HOME/etc/apps/Splunk_TA_aws/local/inputs.conf. If the file or path does not exist, create it.
[splunk_ta_aws_logs://<name>]
log_type =
aws_account =
host_name =
bucket_name =
bucket_region =
log_file_prefix =
log_start_date =
log_name_format =
aws_iam_role = <the AWS IAM role to be assumed>
max_retries = <integer in [-1, 1000]. The default is -1, which means retry until success.>
max_fails = <integer in [0, 10000]. The default is 10000. Stop discovering new keys if the number of failed files exceeds max_fails.>
max_number_of_process = <integer in [1, 64]. The default is 2.>
max_number_of_thread = <integer in [1, 64]. The default is 4.>
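As an illustration, a filled-in stanza for CloudTrail logs might look like the following. The bucket name, prefix, account name, and date are placeholders, not values from your environment, and the exact log_type token accepted may vary by add-on version, so verify against your installed version before using it:

```ini
[splunk_ta_aws_logs://my_cloudtrail_input]
log_type = cloudtrail
aws_account = my_aws_account
bucket_name = example-logging-bucket
bucket_region = us-east-1
log_file_prefix = AWSLogs/
log_start_date = 2023-01-15
interval = 30
```

After saving the stanza, restart the Splunk platform or reload the input so that the new configuration takes effect.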