Data Manager

User Manual

This documentation does not apply to the most recent version of Data Manager. For documentation on the most recent version, go to the latest release.

Prerequisites for onboarding AWS data sources

An AWS admin completes prerequisites ahead of time so that a Splunk admin can use Data Manager for onboarding. Alternatively, an AWS admin can complete the entire process. Data Manager contains optional steps to guide you through this choice.

Splunk platform requirements

HTTP Event Collector requirements

If an AWS input has one or more data sources that require Amazon Kinesis Data Firehose to send data to your Splunk Cloud deployment, make sure before you deploy the CloudFormation template that your Splunk Cloud deployment has a load balancer with HTTP Event Collector (HEC) acknowledgement enabled. If you are not sure, check with your Splunk administrator or contact Splunk Support.

For more information on which data sources require Amazon Kinesis Data Firehose, see the Data ingestion mechanisms and intervals in Data Manager topic in this manual.

Add-on compatibility requirements

Do not configure the Splunk Add-on for Amazon Web Services or the Splunk Add-on for Amazon Kinesis Firehose for the same AWS account and the same data sources as Data Manager.

Common Information Model prerequisites

Data Manager supports Common Information Model (CIM) normalization for Amazon Web Services inputs when the Splunk Add-on for Amazon Web Services (AWS) is installed on the part of your Splunk Cloud deployment that performs the parsing or search-time functionality for your data. This add-on must be installed, but does not need to be configured.

Download the Splunk Add-on for Amazon Web Services (AWS) from Splunkbase.

For more information, see the Splunk Add-on for Amazon Web Services documentation.

For information on the CIM, see the Overview of the Splunk Common Information Model topic in the Common Information Model Add-on manual.

AWS Kinesis data source prerequisites

Some AWS Kinesis data sources only need to be selected during onboarding, but others need to be configured ahead of time.

Configure CloudTrail

If you use CloudTrail as a data source, make sure that your AWS CloudTrail is configured to send its data to a CloudWatch log group for the accounts and regions that you select. CloudTrail log data will not be ingested when a trail is an organization trail coming from a management account. For more information, see Sending Events to CloudWatch Logs.
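The two conditions above can be checked against the trail descriptions in an account. The following is a minimal sketch, assuming you have already loaded the trailList array returned by aws cloudtrail describe-trails into Python. The function name and the sample trails are hypothetical and are not part of Data Manager.

```python
def ingestible_trails(trail_list: list) -> list:
    """Return names of trails that send events to a CloudWatch log group
    and are not organization trails."""
    names = []
    for trail in trail_list:
        has_log_group = bool(trail.get("CloudWatchLogsLogGroupArn"))
        is_org_trail = bool(trail.get("IsOrganizationTrail", False))
        if has_log_group and not is_org_trail:
            names.append(trail["Name"])
    return names


# Hypothetical sample shaped like the "trailList" array from describe-trails.
sample_trails = [
    {
        "Name": "app-trail",
        "CloudWatchLogsLogGroupArn": "arn:aws:logs:us-east-1:123456789012:log-group:example:*",
        "IsOrganizationTrail": False,
    },
    {
        "Name": "org-trail",
        "CloudWatchLogsLogGroupArn": "arn:aws:logs:us-east-1:123456789012:log-group:org:*",
        "IsOrganizationTrail": True,
    },
    {"Name": "s3-only-trail", "IsOrganizationTrail": False},
]

print(ingestible_trails(sample_trails))
```

In this sample, only app-trail meets both conditions. The sketch does not check whether the log group is in the accounts and regions that you select.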

Configure IAM Access Analyzer

If you select IAM Access Analyzer, it needs to be enabled in every region where you want to monitor access to your resources. See Enabling Access Analyzer.

Configure Security Hub or GuardDuty

If you select Security Hub or GuardDuty, you need to make sure that your AWS Security Hub or GuardDuty is enabled for the accounts and regions that you select. See Enabling Security Hub and Enable Amazon GuardDuty.

AWS CloudWatch data source prerequisites

Some AWS CloudWatch data sources only need to be selected during onboarding, but others need to be configured ahead of time.

Configure Amazon API Gateway

If you use the Amazon API Gateway as a data source, use the API Gateway console to send Amazon API Gateway logs to your CloudWatch log group for the accounts and regions that you select. See Setting up CloudWatch logging for a REST API in API Gateway.

Configure Amazon DocumentDB

If you use Amazon DocumentDB as a data source, you must enable audit logging on your cluster and enable Amazon DocumentDB to export logs to your CloudWatch log group for the accounts and regions that you select. See Monitoring Amazon DocumentDB with CloudWatch.

Configure Amazon Elastic Kubernetes Service (EKS)

If you use the Amazon Elastic Kubernetes Service (EKS) as a data source, make sure that each EKS cluster is configured to send its data to an Amazon CloudWatch log group for the accounts and regions that you select. See Amazon EKS control plane logging.
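As an illustration of checking this for one cluster, the following sketch reads the logging section of a cluster description, assuming it has the shape returned by aws eks describe-cluster, and reports which control plane log types are not enabled. The function name and sample cluster are hypothetical, and this topic does not state which log types you need, so treat the result as informational.

```python
# Control plane log types that Amazon EKS can send to CloudWatch Logs.
ALL_LOG_TYPES = ("api", "audit", "authenticator", "controllerManager", "scheduler")


def disabled_log_types(cluster: dict) -> list:
    """Return the control plane log types that are not enabled for a cluster."""
    enabled = set()
    for entry in cluster.get("logging", {}).get("clusterLogging", []):
        if entry.get("enabled"):
            enabled.update(entry.get("types", []))
    return [log_type for log_type in ALL_LOG_TYPES if log_type not in enabled]


# Hypothetical sample shaped like the "cluster" object from describe-cluster.
sample_cluster = {
    "name": "example-cluster",
    "logging": {
        "clusterLogging": [
            {"types": ["api", "audit"], "enabled": True},
            {"types": ["authenticator", "controllerManager", "scheduler"], "enabled": False},
        ]
    },
}

print(disabled_log_types(sample_cluster))
```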

Configure Amazon Relational Database Service (RDS)

If you use the Amazon Relational Database Service (RDS) as a data source, make sure that your RDS instance is configured to send its data to an Amazon CloudWatch log group for the accounts and regions that you select. See Publishing PostgreSQL logs to Amazon CloudWatch Logs.

Configure custom logs

If you use any other AWS services or custom log sources that can publish logs to CloudWatch log groups, use the AWS Console, AWS CLI, or AWS SDK to send logs to your CloudWatch log group for the accounts and regions that you select. See Enabling logging from certain AWS services.

AWS S3 data source prerequisites

Before you can create the Amazon S3 connection, you must have the following:

  • Notifications set up to be sent to SQS whenever new events are written to your Amazon S3 bucket. See the Setting up SQS notifications section on this page for more information.
  • An IAM user with at least read and write permissions for the SQS queue, as well as read permissions for your Amazon S3 bucket. Permissions for decrypting KMS-encrypted files might also be required. See the Setting up an IAM user section on this page for more information.
  • The access key ID and secret access key for the IAM user.
  • An understanding of the types of files that you plan to get data from. If you plan to get data from Parquet files that are being added to your Amazon S3 bucket, then you will need to set the File Type parameter in the connection to Parquet.
  • The Data Manager S3 data input can ingest CloudTrail data, but not digest or insight logs.

Setting up SQS notifications

If SQS notifications aren't set up, ask your Amazon Web Services (AWS) administrator to configure Amazon S3 to notify SQS whenever new events are written to the Amazon S3 bucket, or couple the Amazon Simple Notification Service (SNS) to SQS to send the notifications.
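For the direct S3-to-SQS option, the bucket needs a notification configuration that names the queue. The following sketch builds such a document in the shape that the Amazon S3 notification configuration API expects. The function name, queue ARN, and prefix are hypothetical, and the choice of the s3:ObjectCreated:* event type is an assumption about what "new events are written" means for your bucket.

```python
import json


def build_queue_notification(queue_arn: str, prefix: str = "") -> dict:
    """Build an S3 notification configuration that sends object-created
    events to one SQS queue, optionally limited to a key prefix."""
    queue_config = {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
    if prefix:
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


# Hypothetical queue ARN and prefix, for illustration only.
config = build_queue_notification(
    "arn:aws:sqs:us-east-1:123456789012:example-queue", prefix="logs/"
)
print(json.dumps(config, indent=2))
```

Your AWS administrator can save the resulting JSON to a file and apply it with aws s3api put-bucket-notification-configuration. The access policy on the queue must also allow Amazon S3 to send messages to it, which this sketch does not cover.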

Setting up an IAM user

If you don't have an IAM user, ask your AWS administrator to create it and provide the associated access key ID and secret access key.

Make sure your IAM user has at least read and write permissions for the queue, as well as read permissions for the related Amazon S3 bucket. See the following list of permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:ListQueues",
        "s3:GetObject"
      ],
      "Resource": "*"
    }
  ]
}
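To compare an existing policy against the list above, you can use a check like the following. This is a simplified sketch: it looks only at Allow statements and their Action entries, treats * and ? as wildcards, and ignores Deny statements, Resource scoping, and conditions, so it is not a substitute for evaluating the policy in AWS. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Actions from the permissions list in this section.
REQUIRED_ACTIONS = [
    "sqs:GetQueueUrl",
    "sqs:ReceiveMessage",
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
    "sqs:ListQueues",
    "s3:GetObject",
]


def missing_actions(policy: dict, required=REQUIRED_ACTIONS) -> list:
    """Return the required actions that no Allow statement in the policy grants."""
    allowed = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        allowed.extend(action.lower() for action in actions)
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]


# Hypothetical policy that grants every SQS action but no S3 access.
sqs_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "sqs:*", "Resource": "*"}],
}
print(missing_actions(sqs_only_policy))
```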

AWS CLI Prerequisites

You need AWS CLI version 2 to run the commands. The following example shows how to check the installed version:

$ aws --version
aws-cli/2.0.4 Python/3.8.2 Darwin/19.6.0 botocore/2.0.0dev8

The aws2 dev version is not supported.
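If you want to check the version in a script, the following sketch parses the first token of the aws --version output. It assumes that a development build can be recognized by a dev suffix in the aws-cli/ token, which is an assumption and is not stated in this topic. The function name is hypothetical.

```python
import re


def is_supported_cli(version_output: str) -> bool:
    """Return True if the output of `aws --version` reports AWS CLI major
    version 2 without a dev suffix on the aws-cli token."""
    match = re.match(r"aws-cli/(\d+)\.(\d+)\.(\d+)(\S*)", version_output.strip())
    if not match:
        return False
    major = int(match.group(1))
    suffix = match.group(4)  # anything after the patch number, such as "dev0"
    return major == 2 and "dev" not in suffix


# The sample output shown in this section.
sample = "aws-cli/2.0.4 Python/3.8.2 Darwin/19.6.0 botocore/2.0.0dev8"
print(is_supported_cli(sample))
```

Only the aws-cli token is inspected, so the dev suffix on the botocore token in the sample output does not affect the result.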

There are several ways to configure your terminal with the credentials for your data account so that you can run AWS commands. For details, see Configuring the AWS CLI in the AWS documentation.

Last modified on 16 February, 2024

This documentation applies to the following versions of Data Manager: 1.9.0

