Define an Amazon S3 federated provider
To set up federated search for Amazon S3 on your Splunk Cloud Platform deployment, you must define one or more federated providers for that deployment. A federated provider definition gives your Splunk Cloud Platform deployment the means to establish a connection with a specific AWS account and search over specific datasets in that AWS account.
In this task you do these things:
- Name your Amazon S3 federated provider definition and provide the account ID for the AWS account that has the Amazon S3 data that you want to search.
- Identify whether you have CloudTrail datasets in your Amazon S3 account that you want Splunk software to create AWS Glue tables for. This decision affects how you will fill out the rest of the Amazon S3 federated provider definition.
- You can have a mix of Splunk-managed and customer-created AWS Glue tables.
- Provide Amazon S3 locations for the datasets you intend to search.
- If you use server-side encryption through the AWS Key Management Service to encrypt your Amazon S3 buckets, provide the AWS KMS key Amazon resource names (ARNs) for encrypted buckets that contain data you want to search.
- If you have manually created AWS Glue tables for Amazon S3 locations that you intend to search, list the Glue tables and supply the name of the AWS Glue dataset to which the Glue tables belong.
- Generate resource policies for your Amazon S3 locations, your AWS Glue Data Catalog, and your AWS KMS keys, and add those policies to the Amazon account that your federated provider connects to.
General prerequisites
- You must have the following things:
- A role on your Splunk Cloud Platform deployment with the admin_all_objects capability.
- An AWS account and an AWS IAM role with permissions that let you attach and modify policies for Amazon S3 locations and an AWS Glue Data Catalog. Contact your AWS administrator for assistance with permissions.
- Turn on token authentication for your Splunk Cloud Platform deployment. See Enable or disable token authentication in Securing Splunk Cloud Platform.
Splunk-managed AWS Glue table prerequisite
If you want to let Splunk software create AWS Glue tables for the AWS CloudTrail datasets in your Amazon S3 buckets, gather the Amazon S3 dataset location paths for each dataset. For more information about obtaining Amazon S3 dataset location paths, see Identify the Amazon S3 data that you want to search.
Customer-created AWS Glue table prerequisites
If you are adding your own customer-created AWS Glue tables to the federated provider definition, gather information for each customer-created AWS Glue table that you are associating with the Amazon S3 federated provider definition. See Create an AWS Glue Data Catalog table.
If you plan to search only AWS CloudTrail datasets in your Amazon S3 buckets, and you will have Splunk software create and manage the AWS Glue tables that facilitate federated searches of those CloudTrail datasets, you can ignore the following prerequisites.
- Obtain the values of the Name, Database, and Location fields for each customer-created AWS Glue table you are adding to the definition. To find this information for a specific table, open the AWS Glue console, and select Tables.
- Each customer-created AWS Glue table you add to a specific Amazon S3 federated provider definition must do the following things:
- Share the same AWS Glue database.
- Reference an Amazon S3 location path that contains file objects that share the same file type and compression type.
- Have a column for each field in the Amazon S3 dataset contained by the Amazon S3 location path.
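The shared-database requirement can be checked before you start filling out the definition. The following is a minimal sketch, assuming you have copied the Name, Database, and Location values for each table from the AWS Glue console by hand. The `GlueTable` type and the `shared_database` function are hypothetical names used for illustration; they are not part of any Splunk or AWS API.

```python
from typing import NamedTuple


class GlueTable(NamedTuple):
    """Values copied by hand from the Tables page of the AWS Glue console."""
    name: str
    database: str
    location: str


def shared_database(tables: list[GlueTable]) -> str:
    """Return the one database that all tables belong to, or raise ValueError."""
    databases = {table.database for table in tables}
    if len(databases) != 1:
        raise ValueError(
            f"expected exactly 1 AWS Glue database, found: {sorted(databases)}"
        )
    return databases.pop()


# Sample values only; substitute the values from your own AWS Glue console.
tables = [
    GlueTable("table_csv", "a_sample_db", "s3://bucket1/path1/my_csv_data"),
    GlueTable("table_json", "a_sample_db", "s3://bucket1/path1/my_json_data"),
]
print(shared_database(tables))
```

The sketch checks only the database requirement. File type, compression type, and column coverage depend on the data itself and still need to be reviewed in AWS.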
Steps
- On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federation.
- On the Federated Providers tab, select Add federated provider.
- Select the Amazon S3 federated provider type and select Next.
- Specify the Federated provider name for the federated provider you are defining and the AWS account ID for the AWS account that has the Amazon S3 buckets that you want to search. Both of these settings are required.
Splunk software provides the AWS region for your Splunk Cloud Platform deployment. Your Glue Data Catalog resources for this provider, such as the AWS Glue database and the AWS Glue tables, must reside in this AWS region.
- Indicate whether you have AWS CloudTrail datasets that you want Splunk software to create AWS Glue tables for. If you have CloudTrail datasets, but you want to create and manage their AWS Glue tables yourself, do not select the checkbox.
- Specify the following settings for your Amazon S3 federated provider:
- AWS Glue database (required if you have customer-created Glue tables)
- AWS Glue tables (required if you have customer-created Glue tables)
- Amazon S3 locations (required in all conditions)
- AWS KMS key ARNs (required if your AWS Glue Data Catalog or S3 buckets use SSE-KMS encryption)
- Select Generate policy to create the following things:
- A Glue Data Catalog resource policy, if you have customer-created AWS Glue tables.
- A separate Amazon S3 bucket policy for each Amazon S3 bucket discovered in the location paths listed in Amazon S3 locations.
- An AWS KMS key policy, if you filled out the AWS KMS key ARNs field.
- If Splunk software generates a Glue Data Catalog resource policy, copy and paste the policy into your AWS Glue account. See Update your Glue Data Catalog resource policy.
- Copy and paste each Amazon S3 bucket policy into the bucket policies for the affected Amazon S3 buckets. See Update your Amazon S3 bucket policies.
- If Splunk software generates an AWS KMS key policy, copy and paste the policy into the key policies for the AWS KMS keys listed in AWS KMS key ARNs. See Update your AWS KMS Key policies.
- Select Warning and consent and Confirmation that Requester Pays is turned off.
- Select Create provider to create your federated provider definition.
Splunk software verifies whether your deployment has sufficient cross-account permissions to search Amazon S3 accounts.
- If your permissions are sufficient, you receive a success message and can go on to create the federated indexes you use in your federated searches.
- If you are defining a federated index that maps to an AWS Glue table that you have created, see Map a federated index to a customer-created AWS Glue table dataset.
- If you are defining a federated index that maps to an AWS Glue table that Splunk software will create, see Map a federated index to a Splunk-managed AWS Glue table dataset.
- If your permissions are insufficient, an error message appears with an Update Amazon S3 permissions button. You can attempt to set permissions yourself by selecting Update Amazon S3 permissions. See Manually update deployment permissions.
Shorten location paths to capture AWS Glue tables
You can shorten location paths to represent multiple AWS Glue tables. For example, you can have 2 AWS Glue tables at the following location paths:
Location | AWS Glue table name |
---|---|
s3://bucket1/path1/my_csv_data | table_csv |
s3://bucket1/path1/my_json_data | table_json |
You can provide each path separately in the Amazon S3 locations list.
Alternatively, you can add a single shortened location to the Amazon S3 locations list that captures both table_csv and table_json:
s3://bucket1/path1/
You can also use a wildcard ( * ) to capture both AWS Glue tables:
s3://bucket1/path1/my*
Use wildcards only at the end of Amazon S3 locations.
Amazon S3 paths that terminate in a file object are not valid for Splunk federated providers. Use only locations that end in an Amazon S3 folder.
For example, this is an invalid location path because it ends in a file object: s3://bucket1/path1/my_csv_data/data.csv
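To reason about which table paths a shortened location captures, the rules above can be modeled locally. The following is a minimal sketch, not the matching logic that Splunk software uses: it assumes that a location without a wildcard captures everything inside that folder, and that a trailing wildcard captures every path that starts with the text before it. The `check_location` and `covers` functions are hypothetical names.

```python
def check_location(location: str) -> None:
    """Raise ValueError if the location breaks a rule stated in this section."""
    if not location.startswith("s3://"):
        raise ValueError(f"not an Amazon S3 location: {location}")
    # A wildcard is allowed only as the final character.
    if "*" in location[:-1]:
        raise ValueError(f"wildcard is allowed only at the end: {location}")


def covers(location: str, table_path: str) -> bool:
    """Return True if the location entry captures the table's location path."""
    check_location(location)
    if location.endswith("*"):
        # Assumed wildcard behavior: plain prefix match on the text before '*'.
        return table_path.startswith(location[:-1])
    # Assumed folder behavior: the table path is the folder itself or inside it.
    folder = location.rstrip("/") + "/"
    return (table_path.rstrip("/") + "/").startswith(folder)


csv_path = "s3://bucket1/path1/my_csv_data"
json_path = "s3://bucket1/path1/my_json_data"

for entry in ("s3://bucket1/path1/", "s3://bucket1/path1/my*", csv_path):
    print(entry, covers(entry, csv_path), covers(entry, json_path))
```

The sketch cannot tell whether a path ends in a file object or a folder. That still has to be confirmed in the Amazon S3 console.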
Specify Amazon S3 provider settings
Specifying Amazon S3 provider settings is part of the Define an Amazon S3 federated provider task.
The following table explains the conditions under which each of the Amazon S3 provider settings is required.
Setting name | Requirement condition |
---|---|
Federated provider name | Required in all conditions. |
AWS account ID | Required in all conditions. |
AWS Glue database | Required only if you have manually created AWS Glue tables for the datasets that you intend to search. Do not provide an AWS Glue database value if all of the Amazon S3 datasets you intend to search are composed of AWS CloudTrail data, and you want Splunk software to create and manage the AWS Glue tables for those datasets. |
AWS Glue tables | Required only if you have created AWS Glue tables for Amazon S3 datasets that you intend to search. Do not provide values for AWS Glue tables if all of the Amazon S3 datasets you intend to search are composed of AWS CloudTrail data, and you want Splunk software to create and manage the AWS Glue tables for those datasets. |
Amazon S3 locations | Required in all conditions. |
AWS KMS key ARNs | Required only if the Amazon S3 buckets that contain the data that you want to search have server-side encryption through the AWS Key Management Service (SSE-KMS encryption). |
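The conditions in the table can be summarized as a small decision function. This is an illustrative sketch only; the `required_settings` function and its two boolean parameters are hypothetical and simply restate the table.

```python
def required_settings(has_customer_tables: bool, uses_sse_kms: bool) -> list[str]:
    """List the provider settings that must be filled out for a given setup."""
    # These three settings are required in all conditions.
    required = ["Federated provider name", "AWS account ID", "Amazon S3 locations"]
    if has_customer_tables:
        # Needed only when you created AWS Glue tables yourself.
        required += ["AWS Glue database", "AWS Glue tables"]
    if uses_sse_kms:
        # Needed only when SSE-KMS encryption is in use.
        required.append("AWS KMS key ARNs")
    return required


# CloudTrail data only, Splunk-managed AWS Glue tables, no SSE-KMS encryption:
print(required_settings(has_customer_tables=False, uses_sse_kms=False))
```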
Federated provider name
Enter a unique name for the federated provider. The provider name can contain only alphanumeric characters, underscores, and hyphens.
AWS account ID
Enter the 12-digit ID for the AWS account that contains the Amazon S3 datasets that you want to search with this federated provider.
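The format rules for the provider name and the AWS account ID can be checked before you enter the values. The following sketch assumes that "alphanumeric" means ASCII letters and digits; the function names are hypothetical. It cannot check that the name is unique on your deployment.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits.
PROVIDER_NAME = re.compile(r"[A-Za-z0-9_-]+")
ACCOUNT_ID = re.compile(r"[0-9]{12}")


def is_valid_provider_name(name: str) -> bool:
    """True if the name uses only alphanumerics, underscores, and hyphens."""
    return PROVIDER_NAME.fullmatch(name) is not None


def is_valid_account_id(account_id: str) -> bool:
    """True if the value is exactly 12 digits."""
    return ACCOUNT_ID.fullmatch(account_id) is not None


print(is_valid_provider_name("my_s3-provider1"))
print(is_valid_account_id("123456789012"))
```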
AWS region
The Splunk Cloud Platform attempts to automatically identify the AWS region of your deployment. If the Splunk Cloud Platform is successful, it populates the AWS region setting with the appropriate value.
The AWS region of the AWS Glue database for this federated provider must match the AWS region for your Splunk Cloud Platform deployment.
You cannot set this field on your own.
AWS Glue database
Enter the name of the AWS Glue database that contains the AWS Glue tables listed in AWS Glue tables. AWS Glue database names can contain only lowercase letters, numbers, underscores, and hyphens.
The AWS Glue database setting has the following restrictions:
- The AWS Glue database specified by AWS Glue database must contain all of the AWS Glue tables listed in AWS Glue tables.
- The AWS Glue database must have the same AWS region as your Splunk Cloud Platform deployment.
- A federated provider definition can have only 1 AWS Glue database name.
Check the Tables page in the AWS Glue console to see the Database assignments for individual AWS Glue tables.
Do not provide an AWS Glue database value if all of your AWS Glue tables will be Splunk-managed.
AWS Glue tables
Enter 1 or more customer-created AWS Glue tables that you want to associate with this federated provider. Separate AWS Glue table names with commas. AWS Glue table names can contain only lowercase letters, numbers, underscores, and hyphens.
All AWS Glue tables listed in AWS Glue tables must meet these requirements:
- Belong to the AWS Glue database specified by AWS Glue database.
- Reference an Amazon S3 location path listed in Amazon S3 locations.
To get AWS Glue table names, check the Tables page in the AWS Glue console. AWS Glue table names appear in the Names column.
For more information about creating AWS Glue tables, see Create an AWS Glue table.
Do not provide AWS Glue tables values if all of your AWS Glue tables will be Splunk-managed.
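A comma-separated AWS Glue tables value can be split and checked against the naming rule before you paste it into the field. This is a minimal sketch with a hypothetical `parse_glue_tables` function. Trimming the spaces around each name is a convenience assumed here, not documented behavior of the field.

```python
import re

# Lowercase letters, numbers, underscores, and hyphens only.
GLUE_NAME = re.compile(r"[a-z0-9_-]+")


def parse_glue_tables(value: str) -> list[str]:
    """Split a comma-separated list of table names and validate each name."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    invalid = [name for name in names if not GLUE_NAME.fullmatch(name)]
    if invalid:
        raise ValueError(f"invalid AWS Glue table names: {invalid}")
    return names


print(parse_glue_tables("table_csv, table_json"))
```

The same pattern applies to the AWS Glue database name, which follows the same character rule.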
Amazon S3 locations
Amazon S3 locations are file paths in Amazon S3 that contain the data that you want to search.
Enter 1 or more Amazon S3 location paths. Separate location paths with commas. Amazon S3 locations can contain only alphanumeric characters and the following special characters: /!=_.*'():
You can shorten locations so that they encompass multiple paths. See Shorten location paths to capture AWS Glue tables.
Provide location paths for the following things:
- Each customer-created AWS Glue table you list in AWS Glue tables. Each AWS Glue table can be associated with only 1 location path. A single location path can be specified in multiple unrelated AWS Glue table definitions.
- If you do not know the Amazon S3 location paths for the AWS Glue tables you have listed, check the Tables page in the AWS Glue console. Amazon S3 location paths appear in the Locations column.
- Each AWS CloudTrail dataset in your Amazon S3 buckets for which you want Splunk software to create an AWS Glue table.
AWS KMS key ARNs
Do you use the AWS Key Management Service to apply server-side encryption (SSE-KMS) to the data stored in your Amazon S3 buckets or the metadata in your AWS Glue Data Catalog?
If you do, enter into AWS KMS key ARNs the Amazon resource names (ARNs) for the AWS KMS keys that encrypt data in your Amazon S3 buckets or metadata in your AWS Glue Data Catalog.
Federated search for Amazon S3 supports only customer-managed AWS KMS keys. In addition, each KMS key ARN you provide in this field must belong to the AWS account you specify with the AWS account ID setting.
For more information about AWS KMS keys, see AWS KMS concepts in the AWS Key Management Service Developer Guide.
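Because each key ARN must belong to the account given in AWS account ID, it can help to compare the account segment of every ARN before you save the definition. The following sketch relies on the standard colon-separated ARN layout, in which the fifth segment is the account ID. The function names and the sample ARNs are hypothetical placeholders.

```python
def kms_arn_account(arn: str) -> str:
    """Return the account ID segment of an AWS KMS key ARN."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not an AWS KMS ARN: {arn}")
    return parts[4]


def foreign_key_arns(arns_field: str, account_id: str) -> list[str]:
    """Return the ARNs in a comma-separated list that belong to another account."""
    arns = [arn.strip() for arn in arns_field.split(",") if arn.strip()]
    return [arn for arn in arns if kms_arn_account(arn) != account_id]


# Placeholder ARNs for illustration only.
field = (
    "arn:aws:kms:us-west-2:111122223333:key/key-id-1, "
    "arn:aws:kms:us-west-2:444455556666:key/key-id-2"
)
print(foreign_key_arns(field, "111122223333"))
```

The sketch cannot tell customer-managed keys from AWS-managed keys. Check the key type in the AWS Key Management Service console.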
Get AWS KMS key ARNs for your Amazon S3 buckets
To get the AWS KMS key ARNs for your Amazon S3 buckets, go to your Amazon S3 console and review the buckets that are associated with the locations you have listed in Amazon S3 locations for this provider. Follow these steps to obtain an AWS KMS key ARN that is associated with an Amazon S3 bucket:
- In the Amazon S3 console, select the bucket name to open its record.
- Select the bucket Properties tab.
- Inspect the Default encryption section. If the Encryption type is Server-side encryption with AWS Key Management Service keys (SSE-KMS), copy the Encryption key ARN that appears below it. Select the copy icon next to the ARN to ensure an accurate copy and paste operation.
- Paste the copied ARN into the AWS KMS key ARNs field in your federated provider definition.
If you copy multiple ARNs into AWS KMS key ARNs, separate them with comma characters.
If additional AWS KMS keys encrypt data within the bucket, you must provide their key ARNs within the AWS KMS key ARNs list.
Get the AWS KMS key ARN for your AWS Glue Data Catalog
Follow these steps to get the AWS KMS key ARN for your AWS Glue Data Catalog:
- In the AWS Glue console, select Data Catalog and then Catalog settings in the left-hand navigation bar.
- Under Encryption options, if Metadata encryption is selected and you find an AWS KMS key ARN in the AWS KMS key for metadata encryption field, copy it. Select the copy icon to ensure an accurate copy and paste operation.
- Paste the copied ARN into the AWS KMS key ARNs field in your federated provider definition.
If AWS KMS key ARNs contains a list of other ARNs when you add the Data Catalog AWS KMS key ARN, make sure you separate the new ARN from the others with a comma character.
Update your Glue Data Catalog resource policy
Updating your Glue Data Catalog resource policy is part of the Define an Amazon S3 federated provider task.
If you have customer-created AWS Glue tables and have therefore identified an AWS Glue database and AWS Glue tables for the federated provider, when you select Generate policy, Splunk software generates a Glue Data Catalog resource policy. Copy and paste this Glue Data Catalog resource policy into your AWS Glue account, appending it to other Glue Data Catalog resource policies if any exist.
- Select Copy for the Glue Data Catalog resource policy to save a copy of the policy to your clipboard.
Here is an example of a Glue Data Catalog resource policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSplunkAccessToAWSGlueDataCatalog",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<AWS-account-ID>:role/<stack-name>"
        ]
      },
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTables",
        "glue:GetTable",
        "glue:GetPartitions",
        "glue:GetPartition",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "arn:aws:glue:us-west-2:<AWS-account-ID>:catalog",
        "arn:aws:glue:us-west-2:<AWS-account-ID>:database/a_sample_db",
        "arn:aws:glue:us-west-2:<AWS-account-ID>:table/a_sample_db/a_table"
      ]
    }
  ]
}
Each Splunk Cloud Platform deployment is identified by its stack-name, which is the prefix of the deployment's URL. For example, if your deployment's URL is https://buttercupgames.splunkcloud.com, the stack-name is buttercupgames.
- In the AWS Glue console, in the left-hand sidebar, select Data Catalog and then select Catalog settings.
- Paste the saved Glue Data Catalog resource policy into the Permissions field. If other Glue data catalog resource policies already exist, append your new policy to the end of the list. Separate the policies with commas.
Resolve security warnings, errors, general warnings, and suggestions before you save your policy.
- Select Save to save the Glue Data Catalog resource policy update.
For more information about updating Glue Data Catalog resource policies, see Granting cross-account access in the AWS Glue User Guide.
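Before you paste the generated policy, you can cross-check its Principal and Resource entries against the values you entered. The following sketch rebuilds the Resource ARNs in the shape shown in the example policy and derives the stack-name from a deployment URL. It assumes that the stack-name is the first label of the URL's host name. Both function names are hypothetical, and the policy that Splunk software generates remains the authoritative version.

```python
from urllib.parse import urlparse


def stack_name(deployment_url: str) -> str:
    """Return the first host-name label of a deployment URL."""
    host = urlparse(deployment_url).hostname
    if not host:
        raise ValueError(f"no host name in URL: {deployment_url}")
    return host.split(".")[0]


def glue_resources(region: str, account_id: str, database: str,
                   tables: list[str]) -> list[str]:
    """Build catalog, database, and table ARNs as in the example policy."""
    base = f"arn:aws:glue:{region}:{account_id}"
    resources = [f"{base}:catalog", f"{base}:database/{database}"]
    resources += [f"{base}:table/{database}/{table}" for table in tables]
    return resources


print(stack_name("https://buttercupgames.splunkcloud.com"))
for arn in glue_resources("us-west-2", "<AWS-account-ID>", "a_sample_db", ["a_table"]):
    print(arn)
```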
Update your Amazon S3 bucket policies
Updating your Amazon S3 bucket policies is part of the Define an Amazon S3 federated provider task.
When you select Generate policy, Splunk software generates an Amazon S3 bucket policy for each Amazon S3 bucket referenced in Amazon S3 locations for the federated provider. Update your Amazon S3 account with these generated policies.
- Select Copy for the Amazon S3 bucket policy to save a copy of the policy to your clipboard.
Here is an example of an Amazon S3 bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<AWS-account-ID>:role/<stack-name>"
        ]
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject*"
      ],
      "Sid": "AllowSplunkAccessTo a-sample-aws-s3-bucket",
      "Resource": [
        "arn:aws:s3:::a-sample-aws-s3-bucket",
        "arn:aws:s3:::a-sample-aws-s3-bucket/a-table/*"
      ]
    }
  ]
}
Each Splunk Cloud Platform deployment is identified by its stack-name, which is the prefix of the deployment's URL. For example, if your deployment's URL is https://buttercupgames.splunkcloud.com, the stack-name is buttercupgames.
- After you copy the policy, go to the Amazon S3 console and navigate to the Buckets page.
- Select the Name of the bucket for the policy you have copied.
- Select the Permissions tab for the bucket.
- Select Edit for the Bucket policy.
- Select Add new statement.
- Paste the saved bucket policy into the bucket policy edit window. If there are already policies there, append your bucket policy to the existing set of policies, separating policies with commas.
Resolve security warnings, errors, general warnings, and suggestions before you save your policy.
- Select Save changes to save your policy update.
Repeat these steps for each Amazon S3 bucket policy that Splunk software generates for your federated provider.
For more information about updating Amazon S3 bucket policies, search on "Using bucket policies" in the Amazon Simple Storage Service User Guide.
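Because one bucket policy is generated for each bucket referenced in Amazon S3 locations, listing the distinct buckets tells you how many policies to expect and which buckets to update. The following is a minimal sketch; the `buckets_in` function is a hypothetical helper that treats the text between `s3://` and the next slash as the bucket name.

```python
def buckets_in(locations_field: str) -> list[str]:
    """Return the distinct bucket names in a comma-separated locations list."""
    buckets: list[str] = []
    for location in locations_field.split(","):
        location = location.strip()
        if not location:
            continue  # ignore empty entries such as a trailing comma
        if not location.startswith("s3://"):
            raise ValueError(f"not an Amazon S3 location: {location}")
        bucket = location[len("s3://"):].split("/", 1)[0]
        if bucket not in buckets:
            buckets.append(bucket)
    return buckets


# Sample locations only; bucket2 is a made-up second bucket.
field = "s3://bucket1/path1/my_csv_data, s3://bucket1/path1/my_json_data, s3://bucket2/logs/"
print(buckets_in(field))
```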
Update your AWS KMS Key policies
Updating your AWS KMS Key policies is part of the Define an Amazon S3 federated provider task.
If you are using SSE-KMS encryption to encrypt data in your Amazon S3 buckets or your AWS Glue Data Catalog and you have filled out the AWS KMS key ARNs field for the federated provider, when you select Generate policy, Splunk software generates an AWS KMS key policy. To allow Splunk software to search your SSE-KMS-encrypted Amazon S3 data, copy and paste this AWS KMS key policy into the key policies for your AWS KMS keys, appending it to other AWS KMS key policies if any exist.
- Start by selecting Copy for the AWS KMS key policy to save a copy of the policy to your clipboard.
Here is an example of an AWS KMS key policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUseOfTheKey",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<AWS-account-ID>:role/<stack-name>"
        ]
      },
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "*"
    }
  ]
}
Each Splunk Cloud Platform deployment is identified by its stack-name, which is the prefix of the deployment's URL. For example, if your deployment's URL is https://buttercupgames.splunkcloud.com, the stack-name is buttercupgames.
- In the Amazon S3 console, navigate to the Buckets page.
- Select the Name of a bucket that corresponds to a listed AWS KMS key ARN.
- Select the Properties tab for the bucket.
- Select the Encryption key ARN in the Default encryption section to open the Key ID page for the key in the Key Management Service.
- In the Key policy section, select Edit.
- Append your saved AWS KMS key policy to the existing Key policy, separating policy statements with commas. When you append the policy, do not copy in the Version and Statement header fields if they already exist in the policy.
Resolve security warnings, errors, general warnings, and suggestions before you save your policy.
- Select Save changes to save the key policy update.
Repeat these steps for each AWS KMS key ARN listed in AWS KMS key ARNs for your federated provider. For more information, see Changing a key policy in the AWS Key Management Service Developer Guide.
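Appending a generated policy to an existing one means adding its statements to the existing Statement list while keeping a single Version field. The following sketch shows that merge on local JSON data. It does not call AWS, the `append_statements` function is a hypothetical helper, and the existing policy shown is a made-up placeholder.

```python
import copy
import json


def as_list(statement):
    """A policy's Statement can be a single object or a list; normalize it."""
    return statement if isinstance(statement, list) else [statement]


def append_statements(existing_policy: dict, generated_policy: dict) -> dict:
    """Return a copy of existing_policy with the generated statements appended."""
    merged = copy.deepcopy(existing_policy)
    merged["Statement"] = (
        as_list(merged.get("Statement", []))
        + copy.deepcopy(as_list(generated_policy["Statement"]))
    )
    return merged


# Placeholder for a key policy that already exists on the key.
existing = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ExistingStatement", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::<AWS-account-ID>:root"},
         "Action": "kms:*", "Resource": "*"}
    ],
}
# The generated policy from the example in this section.
generated = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AllowUseOfTheKey", "Effect": "Allow",
         "Principal": {"AWS": ["arn:aws:iam::<AWS-account-ID>:role/<stack-name>"]},
         "Action": ["kms:Decrypt"], "Resource": "*"}
    ],
}

merged = append_statements(existing, generated)
print(json.dumps(merged, indent=2))
```

The same merge applies when you append statements to a Glue Data Catalog resource policy or an Amazon S3 bucket policy. Review the result in the AWS console before you save it.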
Manually update deployment permissions
This step is part of the Define an Amazon S3 federated provider task. Take this step only when you get an error message that asks you to manually update your deployment permissions.
Before you can run federated searches over an Amazon S3 account from your Splunk Cloud Platform deployment, Splunk software needs to set up cross-account permissions for your deployment. Without these permissions, you cannot run Amazon S3 federated searches.
Splunk software verifies whether your deployment has correct cross-account permissions whenever you attempt to create or update a federated provider. If it detects that your deployment has missing or incorrect cross-account permissions, it attempts to set them up. If that attempt fails, you can try to manually set the cross-account permissions by selecting the Update Amazon S3 permissions button.
There are two ways to access the Update Amazon S3 permissions button.
- When you create or update a federated provider, Splunk software displays an error message with the Update Amazon S3 permissions button if its attempt to automatically set up cross-account permissions fails. Select the button to reattempt to set the permissions.
- At any time you can access the button from the Federated Providers tab, which you can get to by selecting Settings, then Federation. Select Update Amazon S3 permissions to open a dialog box that contains the Update Amazon S3 permissions button.
If you select Update Amazon S3 permissions and your permissions are not restored, contact your Splunk Support representative.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408