Configure inputs for the Splunk Add-on for AWS

Input configuration overview

You can use the Splunk Add-on for AWS to collect data from AWS. For each supported data type, one or more input types are provided for data collection.

Users adding new inputs must have the admin_all_objects capability enabled.

Follow these steps to plan and perform your AWS input configuration:

  1. In the tables in this topic, click the input type to go to its configuration details.
  2. Follow the steps described in the input configuration details to complete the configuration.


Supported data types and corresponding AWS input types

The following matrix lists all the data types that can be collected using the Splunk Add-on for AWS and the corresponding input types that you can configure to collect this data.

For some data types, the Splunk Add-on for AWS lets you choose from multiple input types based on your requirements. For example, you can choose an input type that collects historical logs rather than one that collects only newly created logs. SQS-based S3 is the best practice input type for every data type that it can collect.

Data type | Source type | Supported input types | Best practice input type
--- | --- | --- | ---
Billing | aws:billing | Billing | Billing
CloudWatch | aws:cloudwatch | CloudWatch | CloudWatch
CloudFront Access Logs | aws:cloudfront:accesslogs | Generic S3, Incremental S3, SQS-based S3 | SQS-based S3
Config | aws:config, aws:config:notification | SQS-based S3, AWS Config | SQS-based S3
Config Rules | aws:config:rule | Config Rules | Config Rules
Description | aws:description | Description | Description
ELB Access Logs | aws:elb:accesslogs | SQS-based S3, Generic S3, Incremental S3 | SQS-based S3
Inspector | aws:inspector | Inspector | Inspector
CloudTrail | aws:cloudtrail | SQS-based S3, Generic S3, Incremental S3 | SQS-based S3
S3 Access Logs | aws:s3:accesslogs | SQS-based S3, Generic S3, Incremental S3 | SQS-based S3
VPC Flow Logs | aws:cloudwatchlogs:vpcflow | CloudWatch Logs, Kinesis | Kinesis
SQS | aws:sqs | SQS | SQS
Others | Custom sourcetypes | SQS-based S3, Generic S3, CloudWatch Logs, Kinesis, SQS | SQS-based S3
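
As a rough illustration of how the matrix can be used, the following Python sketch transcribes a few of its rows into a lookup table and picks an input type from it. The names INPUT_MATRIX, HISTORICAL_FALLBACKS, choose_input, and need_historical, as well as the fallback rule itself, are assumptions made for this example and are not part of the add-on. The fallback relies on the comparison table later in this topic, which states that the SQS-based S3 input cannot ingest historical logs.

```python
# A few rows transcribed from the matrix above:
# data type -> (supported input types, best practice input type).
INPUT_MATRIX = {
    "CloudTrail": (["SQS-based S3", "Generic S3", "Incremental S3"], "SQS-based S3"),
    "CloudFront Access Logs": (["Generic S3", "Incremental S3", "SQS-based S3"], "SQS-based S3"),
    "Config": (["SQS-based S3", "AWS Config"], "SQS-based S3"),
    "VPC Flow Logs": (["CloudWatch Logs", "Kinesis"], "Kinesis"),
    "Billing": (["Billing"], "Billing"),
}

# Assumption for this sketch: when historical logs are required, prefer
# Incremental S3 (high ingestion performance) over Generic S3 (low).
HISTORICAL_FALLBACKS = ("Incremental S3", "Generic S3")


def choose_input(data_type, need_historical=False):
    """Return an input type for data_type, or None if the sketch cannot decide."""
    supported, best_practice = INPUT_MATRIX[data_type]
    if need_historical and best_practice == "SQS-based S3":
        # SQS-based S3 does not ingest historical logs, so look for an
        # S3 input type that does and that supports this data type.
        for candidate in HISTORICAL_FALLBACKS:
            if candidate in supported:
                return candidate
        return None
    return best_practice


print(choose_input("CloudTrail"))
print(choose_input("CloudTrail", need_historical=True))
```

In this sketch, a data type whose only S3 option is the SQS-based S3 input (Config, for example) yields None when historical logs are requested, because this topic does not say whether the dedicated AWS Config input can collect them.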

AWS input types

The Splunk Add-on for AWS provides two categories of input types to gather useful data from your AWS environment:

  • Dedicated, or single-purpose, input types, each designed to ingest one specific data type.
  • Multi-purpose input types, which collect multiple data types from S3 buckets.

Some data types can be ingested using either a dedicated input type or a multi-purpose input type. For example, CloudTrail logs can be collected using any of the following input types: CloudTrail, S3, or SQS-based S3. The SQS-based S3 input type is the recommended option because it is more scalable and provides higher ingestion performance.

Dedicated input types

To ingest a specific type of log, configure the corresponding dedicated input designed to collect the log type. Click the input type name in the following table for instructions on how to configure it.

Input | Description
--- | ---
AWS Config | Configuration snapshots, historical configuration data, and change notifications from the AWS Config service.
Config Rules | Compliance details, compliance summary, and evaluation status of your AWS Config Rules.
Inspector | Assessment Runs and Findings data from the Amazon Inspector service.
CloudTrail | AWS API call history from the AWS CloudTrail service.
CloudWatch Logs | Logs from the CloudWatch Logs service, including VPC Flow Logs. VPC Flow Logs allow you to capture IP traffic flow data for the network interfaces in your resources.
CloudWatch | Performance and billing metrics from the AWS CloudWatch service.
Description | Metadata about your AWS environment.
Billing | Billing data from the billing reports that you collect in the Billing & Cost Management console.
Kinesis | Data from your Kinesis streams. See the note that follows this table.
SQS | Data from your AWS SQS.

Note: It is a best practice to collect VPC flow logs and CloudWatch logs through Kinesis streams. However, the AWS Kinesis input has the following limitations:
  • Multiple inputs collecting data from a single stream cause duplicate events in the Splunk platform.
  • The input does not monitor dynamic shard repartitioning. When a shard is split or merged, the add-on cannot automatically discover and collect data from the new shards until it is restarted. After you repartition shards, you must restart your data collection node to collect data from the new shards.

You can also collect data from Kinesis streams using the Splunk Add-on for Amazon Kinesis Firehose. The Splunk Add-on for Amazon Kinesis Firehose simplifies some of the configuration steps, but the same limitations about collecting data from streams apply. For more information, see About the Splunk Add-on for Amazon Kinesis Firehose.
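
The two Kinesis input limitations described above can be pictured with a small, self-contained Python model. This is only a toy: the class ToyKinesisInput, the dictionary standing in for a stream, and the shard and event names are all hypothetical, and the model does no checkpointing. It is not the add-on's implementation. It only mirrors the behavior this topic describes, which is that shards are discovered at start-up and that independent inputs each read the whole stream.

```python
class ToyKinesisInput:
    """Toy stand-in for an input that discovers shards only when it starts."""

    def __init__(self, stream):
        self.stream = stream      # dict: shard ID -> list of records
        self.known_shards = []

    def start(self):
        # Shard discovery happens once, at (re)start.
        self.known_shards = list(self.stream)

    def collect(self):
        # Read every record from the shards that were known at start-up.
        return [r for shard in self.known_shards for r in self.stream[shard]]


stream = {"shard-0": ["event-1", "event-2"]}
first, second = ToyKinesisInput(stream), ToyKinesisInput(stream)
first.start()
second.start()

# Limitation 1: two inputs on one stream each return every record.
duplicates = first.collect() + second.collect()

# Limitation 2: after a shard split, new shards stay invisible until restart.
stream["shard-1"] = ["event-3"]
stream["shard-2"] = ["event-4"]
before_restart = first.collect()
first.start()
after_restart = first.collect()

print(duplicates)
print(before_restart)
print(after_restart)
```

In the model, the records in the two new shards appear only in after_restart, which corresponds to restarting the data collection node after you repartition shards.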

Multi-purpose input types

Configure multi-purpose inputs to ingest supported log types.

Use the SQS-based S3 input type to collect its supported log types. If you already collect logs using generic S3 inputs, you can create SQS-based S3 inputs and migrate your existing generic S3 inputs to them. For detailed migration steps, see Migrate from the S3 input to the SQS-based input in this manual.

If the log types you want to collect are not supported by the SQS-based input type, use the generic S3 input type instead.

Read the multi-purpose input types comparison table to view the differences between the multi-purpose S3 collection input types.

Click the input type name in the table below for instructions on how to configure it.

Input | Description
--- | ---
SQS-based S3 (best practice) | A more scalable and higher-performing alternative to the generic and incremental S3 inputs. The SQS-based S3 input polls messages from an SQS queue that subscribes to SNS notification events from AWS services, then collects the corresponding log files (generic log data, CloudTrail API call history, Config logs, and access logs) from your S3 buckets in real time. Unlike the other S3 input types, it takes advantage of the SQS visibility timeout setting, so you can configure multiple inputs to scale out data collection from the same folder in an S3 bucket without ingesting duplicate data. It also switches automatically to multipart, in-parallel transfers when a file exceeds a specific size threshold, which prevents timeout errors caused by large files.
Generic S3 | General-purpose input type that can collect any log type from S3 buckets: CloudTrail API call history, access logs, and even custom non-AWS logs. Every time it runs, the generic S3 input lists all the objects in the bucket and examines the modified date of each file to pull uncollected data. When the bucket contains a large number of objects, this process can be very time-consuming and yields low throughput.
Incremental S3 | Collects four AWS service log types: CloudTrail logs, ELB access logs, S3 access logs, and CloudFront access logs. The incremental S3 input lists and retrieves only objects that have not yet been ingested from a bucket by comparing the datetime information included in filenames against the checkpoint record, which significantly improves ingestion performance.

The Incremental S3 input searches for each log type under the following paths:
  • CloudTrail logs: <bucket_name>/<log_file_prefix>/AWSLogs/<Account ID>/CloudTrail/<Region ID>/<YYYY/MM/DD>/<file_name>.json.gz
  • ELB access logs: <bucket_name>/<log_file_prefix>/AWSLogs/<Account ID>/elasticloadbalancing/<Region ID>/<YYYY/MM/DD>/<file_name>.log.gz
  • S3 access logs: <bucket_name>/<log_file_prefix><YYYY-mm-DD-HH-MM-SS><UniqueString>
  • CloudFront access logs: <bucket_name>/<log_file_prefix><distributionID><YYYY/MM/DD>.<UniqueID>.gz
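
To make the incremental approach more concrete, the following Python sketch builds the date-based key prefixes that would need to be listed for CloudTrail logs, following the CloudTrail path pattern given above. It is an illustration under assumptions, not the add-on's code: the function name cloudtrail_prefixes, the representation of the checkpoint as a calendar date, and the sample prefix, account ID, and region are all hypothetical.

```python
from datetime import date, timedelta


def cloudtrail_prefixes(log_file_prefix, account_id, region_id, checkpoint, today):
    """Yield one key prefix per day, from the checkpoint date through today."""
    day = checkpoint
    while day <= today:
        # Mirrors <log_file_prefix>/AWSLogs/<Account ID>/CloudTrail/<Region ID>/<YYYY/MM/DD>/
        yield (
            f"{log_file_prefix}/AWSLogs/{account_id}/CloudTrail/"
            f"{region_id}/{day:%Y/%m/%d}/"
        )
        day += timedelta(days=1)


# Hypothetical values, used only to exercise the function.
prefixes = list(
    cloudtrail_prefixes(
        "logs", "123456789012", "us-east-1",
        checkpoint=date(2020, 8, 30), today=date(2020, 9, 1),
    )
)
for prefix in prefixes:
    print(prefix)
```

Because the date is part of the key, only the prefixes from the checkpoint onward need to be listed. The generic S3 input, by contrast, lists every object in the bucket and compares modified dates.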

Multi-purpose input types comparison table

  | Generic S3 | Incremental S3 | SQS-based S3 (best practice)
--- | --- | --- | ---
Supported log types | Any log type, including non-AWS custom logs. | 4 AWS services log types: CloudTrail logs, S3 access logs, CloudFront access logs, ELB access logs. | 5 AWS services log types (Config logs, CloudTrail logs, S3 access logs, CloudFront access logs, ELB access logs), as well as non-AWS custom logs.
Data collection method | Lists all objects in the bucket and compares modified date against the checkpoint. | Directly retrieves AWS log files whose filenames are distinguished by datetime. | Decodes SQS messages and ingests corresponding logs from the S3 bucket.
Ingestion performance | Low | High | High
Can ingest historical logs (logs generated in the past)? | Yes | Yes | No
Scalable? | No | No | Yes. You can scale out data collection by configuring multiple inputs to ingest logs from the same S3 bucket without creating duplicate data.
Fault-tolerant? | No. Each generic S3 input is a single point of failure. | No. Each incremental S3 input is a single point of failure. | Yes. Takes advantage of the SQS visibility timeout setting. Any SQS message not successfully processed in time by the SQS-based S3 input will reappear in the queue and will be retrieved and processed again. In addition, data collection can be horizontally scaled out so that if one SQS-based S3 input fails, other inputs can still continue to pick up messages from the SQS queue and ingest corresponding data from the S3 bucket.
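
The role of the visibility timeout in the scalability and fault-tolerance rows can be sketched with a small in-memory model. The class ToyQueue, its methods, the 30-second timeout, and the message body are assumptions made for this illustration. The model imitates only the general SQS behavior that a received message is hidden from other consumers until its visibility timeout expires. It does not call AWS and is not the add-on's implementation.

```python
class ToyQueue:
    """In-memory model of a queue with a visibility timeout (in seconds)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message ID -> (body, time it becomes visible)

    def send(self, msg_id, body):
        self._messages[msg_id] = (body, 0.0)

    def receive(self, now):
        # Hand out the first visible message and hide it for the timeout.
        for msg_id, (body, visible_at) in list(self._messages.items()):
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        # Called by a consumer after it has processed the message.
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("m1", "s3://example-bucket/example-key")  # hypothetical object

got_a = queue.receive(now=0)       # input A takes the message
got_b = queue.receive(now=1)       # input B sees nothing: no duplicate
# Input A fails here and never deletes the message.
retry_b = queue.receive(now=31)    # the message reappears; input B takes it
queue.delete(retry_b[0])           # input B finishes processing
leftover = queue.receive(now=100)  # nothing left to collect

print(got_a, got_b, retry_b, leftover)
```

While input A holds the message, input B receives nothing, so the same log file is not ingested twice. When input A fails without deleting the message, the message becomes visible again after the timeout and input B picks it up.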
Last modified on 25 August, 2020
 

This documentation applies to the following versions of Splunk® Supported Add-ons: released

