
On April 3, 2023, Splunk Data Stream Processor will reach its end of sale, and will reach its end of life on February 28, 2025. If you are an existing DSP customer, please reach out to your account team for more information.

Use the AWS Metadata connector with Splunk DSP

Use the AWS Metadata connector to pull metadata from your resources and infrastructure in AWS. The connector calls AWS APIs in the regions you specify to collect resource status and infrastructure information.

Prerequisites

Before you can use the AWS Metadata connector, you must have an AWS account. If you do not have an AWS account, ask your AWS admin to create an account and provide the Access Key ID and Secret Access Key. See Access Keys (Access Key ID and Secret Access Key) in the Amazon Web Services (AWS) documentation for more information about Access Key credentials.
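
If you have an IAM user and the iam:CreateAccessKey permission, you can also generate an access key pair yourself with the AWS CLI. For example, the following command creates a new key pair and returns the AccessKeyId and SecretAccessKey in its output:

aws iam create-access-key --user-name <your-IAM-user-name>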

Permissions for AWS Metadata

Make sure your AWS account has the following permissions if you want to collect data from all supported APIs:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeReservedInstances",
                "ec2:DescribeSnapshots",
                "ec2:DescribeVolumes",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeImages",
                "ec2:DescribeAddresses",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeTags",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeInstanceHealth",
                "ec2:DescribeVpcs",
                "ec2:DescribeSubnets",
                "ec2:DescribeNetworkAcls",
                "cloudfront:ListDistributions",
                "rds:DescribeDBInstances",
                "lambda:ListFunctions",
                "s3:ListAllMyBuckets",
                "iam:GetAccountPasswordPolicy",
                "iam:GetAccessKeyLastUsed",
                "iam:ListUsers",
                "iam:ListAccessKeys"
            ],
            "Resource": "*"
        }
    ]
}

If you want to collect data from only a subset of the supported AWS APIs, you need to include only the permissions for those APIs.
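
For example, if you only want to collect data from the ec2_instances and ec2_key_pairs APIs, a minimal policy needs only the corresponding permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeKeyPairs"
            ],
            "Resource": "*"
        }
    ]
}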

Supported AWS APIs

Refer to the following list for the required permissions, source, source type, and event body of each supported AWS API:

ec2_instances
  • AWS Permission: ec2:DescribeInstances
  • Source: <region>:ec2:describeInstances
  • Source type: aws:ec2:instance
  • Body: All attributes of ec2.Instance, plus the OwnerId of ec2.Reservation

ec2_key_pairs
  • AWS Permission: ec2:DescribeKeyPairs
  • Source: <region>:ec2:describeKeyPairs
  • Source type: aws:ec2:keyPair
  • Body: All attributes of ec2.KeyPairInfo

ec2_reserved_instances
  • AWS Permission: ec2:DescribeReservedInstances
  • Source: <region>:ec2:describeReservedInstances
  • Source type: aws:ec2:reservedInstances
  • Body: All attributes of ec2.ReservedInstances

ebs_snapshots
  • AWS Permission: ec2:DescribeSnapshots
  • Source: <region>:ec2:describeSnapshots
  • Source type: aws:ec2:snapshot
  • Body: All attributes of ec2.Snapshot

ec2_volumes
  • AWS Permission: ec2:DescribeVolumes
  • Source: <region>:ec2:describeVolumes
  • Source type: aws:ec2:volume
  • Body: All attributes of ec2.Volume

ec2_security_groups
  • AWS Permission: ec2:DescribeSecurityGroups
  • Source: <region>:ec2:describeSecurityGroups
  • Source type: aws:ec2:securityGroup
  • Body: All attributes of ec2.SecurityGroup

ec2_images
  • AWS Permission: ec2:DescribeImages
  • Source: <region>:ec2:describeImages
  • Source type: aws:ec2:image
  • Body: All attributes of ec2.Image

ec2_addresses
  • AWS Permission: ec2:DescribeAddresses
  • Source: <region>:ec2:describeAddresses
  • Source type: aws:ec2:address
  • Body: All attributes of ec2.Address

classic_load_balancers
  • AWS Permissions: elasticloadbalancing:DescribeLoadBalancers, elasticloadbalancing:DescribeTags, elasticloadbalancing:DescribeInstanceHealth
  • Source: <region>:elb:describeLoadBalancers
  • Source type: aws:elb:loadBalancer
  • Body: All attributes of elb.LoadBalancerDescription. Tags: all attributes of elb.Tags. Instances: all attributes of elb.InstanceState.

application_load_balancers
  • AWS Permissions: elasticloadbalancing:DescribeLoadBalancers, elasticloadbalancing:DescribeListeners, elasticloadbalancing:DescribeTags, elasticloadbalancing:DescribeTargetHealth, elasticloadbalancing:DescribeTargetGroups
  • Source: <region>:elbv2:describeLoadBalancers
  • Source type: aws:elbv2:loadBalancer
  • Body: All attributes of elbv2.LoadBalancer. Listeners: all attributes of elbv2.Listeners. Tags: all attributes of elbv2.Tags. TargetGroups: all attributes of elbv2.TargetGroup and elbv2.TargetHealth.

vpcs
  • AWS Permission: ec2:DescribeVpcs
  • Source: <region>:ec2:describeVpcs
  • Source type: aws:ec2:vpc
  • Body: All attributes of ec2.Vpc

vpc_subnets
  • AWS Permission: ec2:DescribeSubnets
  • Source: <region>:ec2:describeSubnets
  • Source type: aws:ec2:subnet
  • Body: All attributes of ec2.Subnet

vpc_network_acls
  • AWS Permission: ec2:DescribeNetworkAcls
  • Source: <region>:ec2:describeNetworkAcls
  • Source type: aws:ec2:networkAcl
  • Body: All attributes of ec2.NetworkAcl

cloudfront_distributions
  • AWS Permission: cloudfront:ListDistributions
  • Source: <region>:cloudfront:listDistributions
  • Source type: aws:cloudfront:distribution
  • Body: All attributes of cloudfront.DistributionSummary

rds_instances
  • AWS Permission: rds:DescribeDBInstances
  • Source: <region>:rds:describeDBInstances
  • Source type: aws:rds:dbInstance
  • Body: All attributes of rds.DBInstance

lambda_functions
  • AWS Permission: lambda:ListFunctions
  • Source: <region>:lambda:listFunctions
  • Source type: aws:lambda:function
  • Body: All attributes of lambda.FunctionConfiguration

s3_buckets
  • AWS Permission: s3:ListAllMyBuckets
  • Source: <region>:s3:listBuckets
  • Source type: aws:s3:bucket
  • Body: All attributes of s3.Bucket

iam_users
  • AWS Permissions: iam:ListUsers, iam:ListAccessKeys, iam:GetAccessKeyLastUsed, iam:GetAccountPasswordPolicy
  • Source: <region>:iam:listUsers
  • Source type: aws:iam:user
  • Body: All attributes of iam.User. AccessKey: all attributes of iam.AccessKeyMetadata. AccessKey.AccessKeyLastUsed: all attributes of iam.AccessKeyLastUsed. PasswordPolicy: all attributes of iam.PasswordPolicy.

Parameters used in the AWS Metadata connector

In addition to the common configuration parameters, the AWS Metadata connector uses the following parameters:

  • aws_credential: The AWS credentials used to access the AWS APIs
    • access_key: The AWS access key credential information
      • aws_access_key_id: Your AWS access key ID
      • aws_secret_access_key: Your AWS secret access key
  • apis: Optional. An array of the supported AWS APIs to collect resource and infrastructure data from. If apis isn't specified, the connector collects data from all supported APIs.
  • regions: An array of AWS regions to collect data from, for example, us-east-1.
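
For example, a parameters object that collects EC2 instance and key pair metadata from two regions looks like this:

{
    "aws_credential": {
        "access_key": {
            "aws_access_key_id": "your AWS access key",
            "aws_secret_access_key": "your AWS secret key"
        }
    },
    "regions": [
        "us-east-1",
        "us-east-2"
    ],
    "apis": [
        "ec2_instances",
        "ec2_key_pairs"
    ]
}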

All credentials are transmitted securely over HTTPS and saved in the Collect service with industry-standard encryption. They can't be accessed outside of the current tenant.

AWS Metadata connector output

The following attributes are included in the resource and infrastructure data collected from AWS:

  • AccountID: The account ID of the AWS account. This attribute isn't included if the account ID can't be found.
  • Region: The region of the resources.

For most AWS APIs, the connector collects the metadata of the resources in the event.Body.

A typical event from the ec2_key_pairs API looks like this:

{
    "AccountID": "123412341234",
    "Region": "ca-central-1",
    "Body": {
        "KeyFingerprint":"d7:b7:98:45:fe:f6:29:3a:76:8e:15:75:d1:d6:e4:35:69:f2:b8:3e",
        "KeyName":"key name"
    },
    "source": "us-east-1:ec2:describeKeyPairs",
    "sourcetype": "aws:ec2:keyPair",
    "host"": "hostname",
    "index": "main"
    "time": 1568050119
}

A typical event from the ec2_instances API looks like this:

{
    "AccountID": "123412341234",
    "Region": "ca-central-1",
    "Body": {
        "OwnerId": "111353726070", // Attached attribute from ec2.Reservation
        "ImageId": "ami-e031e29a",
        "InstanceId": "i-0dc3bc5dbc43b37e6",
        "InstanceType": "m4.large",
        "KeyName": "key name",
        //... other attributes of ec2.Instance
    },
    "source": "us-east-1:ec2:describeInstances",
    "sourcetype": "aws:ec2:instance",
    "host"": "hostname",
    "index": "main"
    "time": 1568050119
}

Create, modify, and delete a scheduled job using the Collect API

You can create, modify, and delete a scheduled job for the AWS Metadata connector using the Collect API.

Create a scheduled job

The following example creates a job, schedules it to run at 45 minutes past every hour, and assigns 2 workers.

curl -X POST "https://<DSP_HOST>:31000/default/collect/v1beta1/jobs" \
    -H "Authorization: Bearer <accessToken>" \
    -H "Content-Type: application/json" \
    -d '{
            "name": "your connection name",
            "connectorID": "aws-metadata",
            "schedule": "45 * * * *",
            "parameters": {
                "aws_credential": {
                    "access_key": {
                        "aws_access_key_id": "your AWS access key",
                        "aws_secret_access_key": "your AWS secret key"
                    }
                },
                "regions": [
                    "us-east-1",
                    "us-east-2"
                ],
                "apis": [
                    "ec2_instances",
                    "ec2_key_pairs"
                ]
            },
            "enabled": true,
            "scalePolicy": {
                "static": {
                    "workers": 2
                }
            }
        }'
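
The schedule parameter follows standard five-field cron syntax (minute, hour, day of month, month, day of week), so "45 * * * *" runs at 45 minutes past every hour. For example, a schedule of "0 */6 * * *" would instead run the job every 6 hours.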

A typical response when you create a scheduled job using a POST request looks like this:

{
    "data": {
        "connectorID": "aws-metadata",
        "createUserID": "your user name",
        "createdAt": "2019-09-30T22:23:17.548Z",
        "id": "your job ID",
        "lastModifiedAt": "2019-09-30T22:23:17.548Z",
        "lastUpdateUserID": "last user who updated",
        "name": "your connection name",
        "schedule": "45 * * * *",
        "scheduled": true,
        "tenant": "your tenant ID",
        "eventExtraFields": null,
        "parameters": {
            "aws_credential": {},
            "apis": [
                "ec2_instances",
                "ec2_key_pairs"
            ],
            "regions": [
                "us-east-1",
                "us-east-2"
            ]
        },
        "scalePolicy": {
            "static": {
                "workers": 2
            }
        }
    }
}

Verify the job

After you create the scheduled job, you can find the job ID in the POST response. The following example performs a GET request on the job ID to verify that the job was created and scheduled correctly:

curl -X GET "https://<DSP_HOST>:31000/default/collect/v1beta1/jobs/<jobId>" \
    -H "Authorization: Bearer <accessToken>" \
    -H "Content-Type: application/json"

A typical response for a GET request on the job ID of a scheduled job looks like this:

{
    "data": {
        "connectorID": "aws-metadata",
        "createUserID": "your user name",
        "createdAt": "2019-09-30T22:23:17.548Z",
        "id": "your job ID",
        "lastModifiedAt": "2019-09-30T22:23:17.548Z",
        "lastUpdateUserID": "last user who updated",
        "name": "your connection name",
        "schedule": "45 * * * *",
        "scheduled": true,
        "tenant": "your tenant ID",
        "eventExtraFields": null,
        "parameters": {
            "aws_credential": {},
            "apis": [
                "ec2_instances",
                "ec2_key_pairs"
            ],
            "regions": [
                "us-east-1",
                "us-east-2"
            ]
        },
        "scalePolicy": {
            "static": {
                "workers": 2
            }
        }
    }
}

Modify a scheduled job

The following example modifies the scheduled job with a PATCH request to increase the number of workers to 4:

curl -X PATCH "https://<DSP_HOST>:31000/default/collect/v1beta1/jobs/<jobId>" \
    -H "Authorization: Bearer <accessToken>" \
    -H "Content-Type: application/merge-patch+json" \
    -d '{
            "scalePolicy": {
                "static": {
                    "workers": 4
                }
            }
        }'

A typical response for a PATCH request on a scheduled job looks like this:

{
    "data": {
        "connectorID": "aws-metadata",
        "createUserID": "your user name",
        "createdAt": "2019-09-30T22:23:17.548Z",
        "id": "your job ID",
        "lastModifiedAt": "2019-09-30T22:23:17.548Z",
        "lastUpdateUserID": "last user who updated",
        "name": "your connection name",
        "schedule": "45 * * * *",
        "scheduled": true,
        "tenant": "your tenant ID",
        "eventExtraFields": null,
        "parameters": {
            "aws_credential": {},
            "apis": [
                "ec2_instances",
                "ec2_key_pairs"
            ],
            "regions": [
                "us-east-1",
                "us-east-2"
            ]
        },
        "scalePolicy": {
            "static": {
                "workers": 4
            }
        }
    }
}

Delete a scheduled job

Make sure that no active pipelines are using the scheduled job you want to delete. If you delete a scheduled job that an active pipeline is using, your pipeline stops processing data.

The following example deletes a scheduled job based on its job ID:

curl -X DELETE "https://<DSP_HOST>:31000/default/collect/v1beta1/jobs/<jobId>" \
    -H "Authorization: Bearer <accessToken>" \
    -H "Content-Type: application/json"

When the job is successfully deleted, you receive a "204 No Content" response.
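
To confirm the deletion, you can list the jobs that remain on the tenant. Assuming the Collect service supports a GET request on the jobs collection, the following example returns all remaining scheduled jobs, and the deleted job ID no longer appears in the response:

curl -X GET "https://<DSP_HOST>:31000/default/collect/v1beta1/jobs" \
    -H "Authorization: Bearer <accessToken>" \
    -H "Content-Type: application/json"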
