Configure inputs for the Splunk Add-on for AWS
Input configuration overview
You can use the Splunk Add-on for AWS to collect data from AWS. For each supported data type, one or more input types are provided for data collection.
Follow these steps to plan and perform your AWS input configuration:
Users adding new inputs must have the admin_all_objects role enabled.
- Click the input type to go to the input configuration details.
- Follow the steps described in the input configuration details to complete the configuration.
Supported data types and corresponding AWS input types
The following matrix lists all the data types that can be collected using the Splunk Add-on for AWS and the corresponding input types that you can configure to collect this data.
For some data types, the Splunk Add-on for AWS lets you choose from multiple input types based on your requirements, for example, whether you need to collect historical logs or only newly created logs. SQS-based S3 is the best practice input type for all of the data types it can collect.
|Data Type||Source type||Supported Input Types||Best practice Input Type|
|CloudFront Access Logs||
||Config Rules||Config Rules|
|ELB Access Logs||
|S3 Access Logs||
|VPC Flow Logs||
|Others||Custom sourcetypes||SQS-based S3|
AWS input types
The Splunk Add-on for AWS provides two categories of input types to gather useful data from your AWS environment:
- Dedicated, or single-purpose, input types, each designed to ingest one specific data type
- Multi-purpose input types, which collect multiple data types from S3 buckets
Some data types can be ingested using either a dedicated input type or a multi-purpose input type. For example, CloudTrail logs can be collected using any of the following input types: CloudTrail, S3, or SQS-based S3. The SQS-based S3 input type is the recommended option because it is more scalable and provides higher ingestion performance.
Dedicated input types
To ingest a specific type of log, configure the corresponding dedicated input designed to collect the log type. Click the input type name in the following table for instructions on how to configure it.
|AWS Config||Configuration snapshots, historical configuration data, and change notifications from the AWS Config service.|
|Config Rules||Compliance details, compliance summary, and evaluation status of your AWS Config Rules.|
|Inspector||Assessment Runs and Findings data from the Amazon Inspector service.|
|CloudTrail||AWS API call history from the AWS CloudTrail service.|
|CloudWatch Logs||Logs from the CloudWatch Logs service, including VPC Flow Logs. VPC Flow Logs allow you to capture IP traffic flow data for the network interfaces in your resources.|
|CloudWatch||Performance and billing metrics from the AWS CloudWatch service.|
|Description||Metadata about your AWS environment.|
|Billing||Billing data from the billing reports that you collect in the Billing & Cost Management console.|
|Kinesis||Data from your Kinesis streams. |
Note: It is a best practice to collect VPC flow logs and CloudWatch logs through Kinesis streams. However, the AWS Kinesis input has the following limitations:
You can also collect data from Kinesis streams using the Splunk Add-on for Amazon Kinesis Firehose. The Splunk Add-on for Amazon Kinesis Firehose simplifies some of the configuration steps, but the same limitations about collecting data from streams apply. For more information, see About the Splunk Add-on for Amazon Kinesis Firehose.
|SQS||Data from your AWS SQS.|
Multi-purpose input types
Configure multi-purpose inputs to ingest supported log types.
Use the SQS-based input type to collect its supported log types. If you are already collecting logs using generic S3 inputs, you can still create SQS-based inputs and migrate your existing generic S3 inputs to the new inputs. For detailed migration steps, see Migrate from the S3 input to the SQS-based input in this manual.
If the log types you want to collect are not supported by the SQS-based input type, use the generic S3 input type instead.
Read the multi-purpose input types comparison table to view the differences between the multi-purpose S3 collection input types.
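The guidance above can be sketched as a small decision helper. This is only an illustration of the selection logic described in this topic, not part of the add-on: the function name, the log type labels, and the choice to prefer the incremental S3 input over the generic S3 input for historical logs are all assumptions made for the example.

```python
# Hypothetical labels for the log types named in the comparison table.
INCREMENTAL_S3_TYPES = {"cloudtrail", "s3_access", "cloudfront_access", "elb_access"}
# SQS-based S3 additionally covers Config logs and non-AWS custom logs.
SQS_BASED_S3_TYPES = INCREMENTAL_S3_TYPES | {"config", "custom"}


def choose_s3_input(log_type: str, need_historical: bool = False) -> str:
    """Suggest a multi-purpose input type for one log type (illustrative only)."""
    if need_historical:
        # The SQS-based S3 input cannot ingest logs generated in the past.
        if log_type in INCREMENTAL_S3_TYPES:
            # Assumption: prefer incremental S3 for its better ingestion performance.
            return "Incremental S3"
        return "Generic S3"
    if log_type in SQS_BASED_S3_TYPES:
        # Best practice input type for every log type it supports.
        return "SQS-based S3"
    # Fall back to the general-purpose input for anything else.
    return "Generic S3"


print(choose_s3_input("cloudtrail"))
print(choose_s3_input("config", need_historical=True))
```

The helper only encodes the rules stated in this topic. Real deployments may weigh other factors, such as existing generic S3 inputs that you plan to migrate.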
Click the input type name in the table below for instructions on how to configure it.
|SQS-based S3 (best practice)||A more scalable and higher-performing alternative to the generic and incremental S3 inputs. The SQS-based S3 input polls messages from an SQS queue that subscribes to SNS notification events from AWS services, then collects the corresponding log files (generic log data, CloudTrail API call history, Config logs, and access logs) from your S3 buckets in real time. |
Unlike the other S3 input types, the SQS-based S3 input type takes advantage of the SQS visibility timeout setting and enables you to configure multiple inputs to scale out data collection from the same folder in an S3 bucket without ingesting duplicate data. Also, the SQS-based S3 input automatically switches to multipart, in-parallel transfers when a file is over a specific size threshold, thus preventing timeout errors caused by large file size.
|Generic S3||General-purpose input type that can collect any log type from S3 buckets: CloudTrail API call history, access logs, and even custom non-AWS logs. |
The generic S3 input lists all the objects in the bucket and examines the modified date of each file every time it runs to pull uncollected data from an S3 bucket. When the number of objects in a bucket is large, this can be a very time-consuming process with low throughput.
|Incremental S3||The incremental S3 input type collects four AWS service log types. |
You can collect four types of logs using the incremental S3 input: CloudTrail logs, S3 access logs, CloudFront access logs, and ELB access logs.
The incremental S3 input only lists and retrieves objects that have not been ingested from a bucket by comparing datetime information included in filenames against the checkpoint record, which significantly improves ingestion performance.
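To make the difference between the generic and incremental approaches concrete, here is a minimal sketch of the two comparison steps described above. It is not the add-on's implementation: the function names, the in-memory object list, and the filename timestamp pattern (`20240102T0300Z`) are assumptions for illustration, and the sketch does not show how the real inputs list objects in S3.

```python
import re
from datetime import datetime, timezone

# Hypothetical filename pattern: a UTC timestamp such as 20240102T0300Z in the key.
TIMESTAMP_IN_KEY = re.compile(r"(\d{8}T\d{4})Z")


def generic_select(objects, checkpoint):
    """Generic S3 style: examine the modified date of every object."""
    return [o["key"] for o in objects if o["last_modified"] > checkpoint]


def incremental_select(keys, checkpoint):
    """Incremental S3 style: compare the datetime embedded in each filename."""
    selected = []
    for key in keys:
        match = TIMESTAMP_IN_KEY.search(key)
        if match is None:
            continue  # not a recognized log file name
        stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M")
        if stamp.replace(tzinfo=timezone.utc) > checkpoint:
            selected.append(key)
    return selected


checkpoint = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
objects = [
    {"key": "logs/app_20240102T0255Z.gz",
     "last_modified": datetime(2024, 1, 2, 2, 56, tzinfo=timezone.utc)},
    {"key": "logs/app_20240102T0305Z.gz",
     "last_modified": datetime(2024, 1, 2, 3, 6, tzinfo=timezone.utc)},
]
new_by_modified = generic_select(objects, checkpoint)
new_by_name = incremental_select([o["key"] for o in objects], checkpoint)
print(new_by_modified, new_by_name)
```

Both functions select the same file here. The practical difference is that the filename-based comparison needs only the key, so it does not depend on examining per-object metadata for everything in the bucket.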
Multi-purpose input types comparison table
|Generic S3||Incremental S3||SQS-based S3 (best practice)|
|Supported log types||Any log type, including non-AWS custom logs.||4 AWS services log types: CloudTrail logs, S3 access logs, CloudFront access logs, ELB access logs.||5 AWS services log types (Config logs, CloudTrail logs, S3 access logs, CloudFront access logs, ELB access logs), as well as non-AWS custom logs.|
|Data collection method||Lists all objects in the bucket and compares modified date against the checkpoint.||Directly retrieves AWS log files whose filenames are distinguished by datetime.||Decodes SQS messages and ingests corresponding logs from the S3 bucket.|
|Can ingest historical logs (logs generated in the past)?||Yes||Yes||No|
|Scalability||No||No||Yes. You can scale out data collection by configuring multiple inputs to ingest logs from the same S3 bucket without creating duplicate data.|
|Fault tolerance||Each generic S3 input is a single point of failure.||Each incremental S3 input is a single point of failure.||Takes advantage of the SQS visibility timeout setting. Any SQS message not successfully processed in time by the SQS-based S3 input reappears in the queue and is retrieved and processed again. In addition, data collection can be horizontally scaled out so that if one SQS-based S3 input fails, other inputs can continue to pick up messages from the SQS queue and ingest the corresponding data from the S3 bucket.|
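The visibility timeout behavior described above can be illustrated with a toy in-memory queue. This is a simulation under stated assumptions, not the AWS SQS API or the add-on's code: the `ToyQueue` class, its method names, the simulated clock passed as `now`, and the example object reference are all hypothetical.

```python
class ToyQueue:
    """In-memory stand-in for an SQS queue (illustrative, not the AWS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time when visible again]
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = [body, 0.0]
        self._next_id += 1

    def receive(self, now):
        """Return one visible message and hide it for the visibility timeout."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Acknowledge a message after its log file has been ingested."""
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("s3://bucket/logs/a.gz")  # hypothetical object reference

# Input A receives the message at t=0 but fails before deleting it.
first = queue.receive(now=0)
# Input B polls at t=10: the message is still hidden, so nothing is duplicated.
hidden = queue.receive(now=10)
# After the timeout expires, input B receives the same message and completes it.
retry = queue.receive(now=31)
queue.delete(retry[0])
remaining = queue.receive(now=100)
print(first, hidden, retry, remaining)
```

Because a message is hidden rather than removed when it is received, a second input never sees it while the first is working, yet the message comes back if the first input fails to delete it in time.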
This documentation applies to the following versions of Splunk® Supported Add-ons: released