Splunk Cloud Platform

Use Ingest Processors


Ingest Processor is currently released as a preview only and is not officially supported. See Splunk General Terms for more information. For any questions on this preview, please reach out to ingestprocessor@splunk.com.

Send data from Ingest Processor to Amazon S3

To send data from Ingest Processor to an Amazon S3 bucket, you must first add an Amazon S3 destination in the Ingest Processor service. Depending on the environment that your Ingest Processor is installed in, you can configure the destination to use different authentication methods to access your bucket:

  • If any of the instances in your Ingest Processor are not installed on Amazon EC2, then you must authenticate the connection using the access key ID and secret access key of an Identity and Access Management (IAM) user who can access the bucket.
  • If all the instances of your Ingest Processor are installed on Amazon EC2, then you can choose to authenticate the connection using either an IAM role or an access key ID and secret access key.

You can then create a pipeline that uses that destination. When you apply that pipeline to your Ingest Processor, the Ingest Processor starts sending data that it receives to your Amazon S3 bucket. In Amazon S3, the data from your Ingest Processor is identified by an object key name that is constructed using auto-generated values from the Ingest Processor and some of the values that you specify in the destination configuration.

Supported file types and data formats

Ingest Processor supports the following file types and data formats for exporting data to Amazon S3.

  • Newline-delimited JSON (NDJSON)
  • Parquet (version 2.5.0 or higher)
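When the JSON output format is selected, each exported object contains one JSON event per line. As an illustration, this is a minimal sketch of parsing such a file after downloading it locally; the event contents are hypothetical:

```python
import json

def parse_ndjson(raw: str) -> list[dict]:
    """Parse newline-delimited JSON: one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

# Two hypothetical events, one per line, as they might appear in an exported file.
raw = '{"event": "login ok", "host": "web-01"}\n{"event": "login fail", "host": "web-02"}\n'
events = parse_ndjson(raw)
```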

How the Ingest Processor constructs object key names

When you send data from Ingest Processor to an Amazon S3 bucket, that data is identified using an object key name with the following format: <bucket_name>/<folder_name>/<year>/<month>/<day>/<instance_ID>/<file_prefix>-<UUID>.json

When you create your Amazon S3 destination, you specify the bucket name, folder name, and file prefix to be used in this object key name. The instance ID is taken from the ID of the Ingest Processor instance that handled the data, and the Ingest Processor automatically generates the date partitions and the UUID (universally unique identifier).

For example, if you send data to Amazon S3 on October 31, 2022 using a destination that has the following configuration:

  • Bucket name: IngestProcessor
  • Folder name: FromUniversalForwarder
  • File prefix: TestData

Your data in Amazon S3 would be associated with an object key name like IngestProcessor/FromUniversalForwarder/year=2022/month=10/day=31/instanceId=72c3a66d-b4f0-11ed-a3ec-0142ad120001/TestData-3ac12345-3b6f-12ed-78d6-0242ec110002.json.
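Based on the example above, the key layout can be sketched as follows. This is an illustrative reconstruction, not the actual Ingest Processor implementation; the function name and parameters are assumptions:

```python
import uuid
from datetime import date

def build_object_key(bucket: str, folder: str, file_prefix: str,
                     instance_id: str, day: date, suffix: str = ".json") -> str:
    """Assemble an object key in the documented format:
    <bucket>/<folder>/year=Y/month=M/day=D/instanceId=<id>/<prefix>-<UUID><suffix>"""
    return (f"{bucket}/{folder}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/instanceId={instance_id}/"
            f"{file_prefix}-{uuid.uuid4()}{suffix}")

key = build_object_key("IngestProcessor", "FromUniversalForwarder", "TestData",
                       "72c3a66d-b4f0-11ed-a3ec-0142ad120001", date(2022, 10, 31))
```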

Prerequisites

Before you can send data to Amazon S3, the following requirements must be met:

  • The Amazon S3 bucket that you want to send data to has Object Lock turned off. For information about the Object Lock feature, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html in the Amazon Simple Storage Service (S3) User Guide.
  • You have the necessary credentials for accessing the bucket:
    • If any of the instances in your Ingest Processor are not installed on Amazon EC2, then you must have the access key ID and secret access key for an IAM user who can access the bucket.
    • If all the instances of your Ingest Processor are installed on EC2 and you want to authenticate the connection to the bucket using your IAM role, then you must grant your EC2 instances access to the bucket. See https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-access-s3-bucket in the AWS Knowledge Center.
  • The IAM user or role that you plan to use has the following identity-based policy for accessing the S3 bucket, where <S3_bucket_ARN> is replaced by the Amazon Resource Name (ARN) of the bucket:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetBucketLocation"
                ],
                "Resource": "<S3_bucket_ARN>/*"
            }
        ]
    }
    

    For more information about managing access to Amazon S3 resources, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-access-control.html in the Amazon Simple Storage Service (S3) User Guide.
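If you generate the policy document programmatically, the substitution of the bucket ARN can be sketched like this (a minimal helper for illustration; the function name is an assumption, and the policy body matches the one shown above):

```python
import json

def s3_access_policy(bucket_arn: str) -> str:
    """Render the identity-based policy from the prerequisites,
    substituting the bucket's ARN for <S3_bucket_ARN>."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetBucketLocation"],
                "Resource": f"{bucket_arn}/*",
            }
        ],
    }
    return json.dumps(policy, indent=4)
```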

Use an IAM role to delegate access to your organization's S3 bucket

Create a role in your AWS account that lets the Splunk software access an S3 bucket.


  1. In your AWS Console, navigate to AWS Identity and Access Management (IAM).
  2. Click the Create role button.
  3. On the Select trusted entity page, in the Trusted entity type section, choose Custom trust policy.
  4. In the Custom trust policy field, copy and paste the following policy string. Replace the ${SplunkProxyRoleARN} and ${TenantName} fields with your organization's values:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SplunkDataProcessorS3WriteOnlyTrustRelationship",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {
                    "AWS": "${SplunkProxyRoleARN}"
                },
                "Condition":{
                    "StringEquals": {
                        "sts:ExternalId": "SplunkDataProcessor-${TenantName}"
                    }
                }
            }
        ]
    }
    
  5. Click Next.
  6. On the Add permissions page, click Next without making any changes.
  7. On the Name, review, and create page, do the following:
    1. Name the role SplunkDataProcessor-S3-${demo}.
    2. Review the Trust policy to make sure that it matches your deployment's information.
    3. Click Create role.
  8. On the IAM Roles page, navigate to the SplunkDataProcessor-S3-${demo} role, and open it.
  9. On the SplunkDataProcessor-S3-${demo} page, in the Permissions policies section, select Add permissions, and then Create inline policy.
  10. On the Specify permissions page, in the Policy editor section, select JSON.
  11. Copy and paste the following policy string into the JSON policy editor. Replace the ${BucketName} and ${BucketFolder} fields with your organization's bucket name and folder name:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3PutObjectPermission",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::${BucketName}/${BucketFolder}/*"
                ]
            },
            {
                "Sid": "S3GetBucketLocationPermission",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::${BucketName}"
                ]
            }
        ]
    }
    


    For example:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3PutObjectPermission",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::cpr-fs-test/parquet/*"
                ]
            },
            {
                "Sid": "S3GetBucketLocationPermission",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::cpr-fs-test"
                ]
            }
        ]
    }
    
  12. Click Next.
  13. On the Review and create page,
    1. In the Policy details section, in the Policy name field, name your policy.
    2. Click Create policy.
  14. On the SplunkDataProcessor-S3-${demo} page, in the Summary section, copy the ARN. This IAM role ARN is used to complete the connection from your Splunk Cloud deployment to your S3 bucket.
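The trust policy in step 4 is parameterized on ${SplunkProxyRoleARN} and ${TenantName}. If you script the substitution, it can be sketched with Python's string.Template, whose ${...} placeholder syntax matches the policy as written. The ARN and tenant name below are placeholders, not real values:

```python
import json
from string import Template

# Trust policy template from step 4, verbatim except for Template substitution.
TRUST_POLICY = Template("""{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SplunkDataProcessorS3WriteOnlyTrustRelationship",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {"AWS": "${SplunkProxyRoleARN}"},
            "Condition": {
                "StringEquals": {"sts:ExternalId": "SplunkDataProcessor-${TenantName}"}
            }
        }
    ]
}""")

rendered = TRUST_POLICY.substitute(
    SplunkProxyRoleARN="arn:aws:iam::111122223333:role/splunk-proxy",  # placeholder ARN
    TenantName="example-tenant",  # placeholder tenant name
)
```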


Steps

  1. In the Ingest Processor service, select Destinations.
  2. On the Destinations page, select New destination > Amazon S3.
  3. Provide a name and description for your destination:
    Name: A unique name for your destination.
    Description: (Optional) A description of your destination.
  4. Specify the object key name that you want to use to identify your data in the Amazon S3 bucket. See How the Ingest Processor constructs object key names for more information.
    Bucket Name: The name of the bucket that you want to send your data to. Ingest Processors use this bucket name as a prefix in the object key name.
    Folder name: (Optional) The name of a folder where you want to store your data in the bucket. In the object key name, Ingest Processors include this folder name after the bucket name and before a set of auto-generated timestamp partitions.
    File prefix: (Optional) The file name that you want to use to identify your data. In the object key name, Ingest Processors include this file prefix after the auto-generated timestamp partitions and before an auto-generated UUID value.
    Output data formats:
      • JSON (Splunk HTTP Event Collector schema): This setting causes your data to be stored as .json files in the Amazon S3 bucket. The contents of these .json files are formatted into the event schema that's supported by the Splunk HTTP Event Collector. See Event metadata in the Splunk Cloud Platform Getting Data In manual.
      • Parquet format: This setting causes your data to be stored in the Parquet file format in the Amazon S3 bucket.

  5. Specify the AWS region and authentication method to allow this destination to connect with your Amazon S3 bucket.
    Region: The AWS region that your bucket is associated with.
    Authentication: The method for authenticating the connection between your Ingest Processor and your Amazon S3 bucket. If all of your Ingest Processor instances are installed on Amazon EC2, then select Authenticate using IAM role for Amazon EC2. Otherwise, select Authenticate using access key ID and secret access key.
    AWS access key ID: The access key ID for your IAM user. This field is available only when Authentication is set to Authenticate using access key ID and secret access key.
    AWS secret access key: The secret access key for your IAM user. This field is available only when Authentication is set to Authenticate using access key ID and secret access key.

  6. (Optional) To adjust the maximum number of records that this destination sends in each batch of output data, expand Advanced settings and enter your desired maximum number of records in the Batch size field.

    In most cases, the default Batch size value is sufficient. Be aware that the actual size of each batch can vary depending on the rate at which the Ingest Processor is sending out data.

  7. To finish adding the destination, select Add.
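If you selected the JSON output format, each line of the exported .json files holds one event in the HTTP Event Collector style schema. As a rough illustration, an event might look like the following; every field value here is hypothetical, and the exact set of fields is defined in the Event metadata topic referenced above:

```python
import json

# Hypothetical event in the HEC-style schema (time, host, source,
# sourcetype, index, event); all values are illustrative placeholders.
sample = {
    "time": 1667174400,            # epoch seconds (2022-10-31T00:00:00Z)
    "host": "forwarder-01",
    "source": "/var/log/app.log",
    "sourcetype": "app:log",
    "index": "main",
    "event": "user login succeeded",
}
line = json.dumps(sample)  # one event per line in the exported .json file
```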

You now have a destination that you can use to send data from Ingest Processor to an Amazon S3 bucket.

To start sending data from Ingest Processor to the Amazon S3 bucket specified in the destination, create a pipeline that uses the destination you just added and then apply that pipeline to your Ingest Processor.

Last modified on 28 March, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.1.2308 (latest FedRAMP release), 9.1.2312

