Map a federated index to a Splunk-managed AWS Glue table dataset
Follow this topic to create a federated index that maps to a Splunk-managed AWS Glue table dataset. If you want to define a federated index that maps to a customer-managed AWS Glue table dataset, see Map a federated index to a customer-created AWS Glue table dataset.
After you create an Amazon S3 federated provider for your Splunk Cloud Platform deployment, you create federated indexes for use in federated searches. Each federated index you create maps to a specific AWS Glue table, which in turn references an Amazon S3 dataset. You invoke federated indexes in your federated searches to tell Splunk software which Amazon S3 dataset you intend to search.
The Splunk platform creates federated indexes on the search head of your Splunk Cloud Platform deployment.
You can create federated indexes for two kinds of AWS Glue tables: customer-created AWS Glue tables and Splunk-managed AWS Glue tables. This task guides you through the process of creating a federated index that maps to an AWS Glue table that Splunk software creates and manages behind the scenes.
In this task, you do these things:
- Provide the name of the federated index.
- Select an Amazon S3 federated provider that is configured with Amazon S3 locations that point to AWS CloudTrail datasets.
- Select the AWS Glue table (Splunk managed) dataset type.
- Supply an Amazon S3 location path that points to the AWS CloudTrail dataset to which this federated index will be mapped.
- Provide the maximum relative time range for searches of the dataset.
- List the AWS Account ID and AWS Region values that can be used as partition keys for your searches of the AWS CloudTrail dataset.
You can map a federated index to only one AWS CloudTrail dataset at a time. If a federated provider has Amazon S3 locations for several AWS CloudTrail datasets over which you want to run federated searches, define a separate federated index for each AWS CloudTrail dataset.
Prerequisites
- A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
- Datasets in your Amazon S3 buckets that are composed entirely of AWS CloudTrail data.
- An Amazon S3 federated provider that is already defined and set up for the creation of Splunk-managed AWS Glue tables. See Define an Amazon S3 federated provider.
Steps
- On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federated search.
- On the Federated index tab, select Add federated index.
You might also come to the Add federated index screen directly after creating a federated provider.
- Specify the settings for your federated index. Each of the following entries gives the setting, its description, and its default value.

  Setting: Federated index name
  Description: Enter a unique name for the federated index. Federated index names have the following restrictions:
  - They can contain only letters, numbers, underscores, and hyphens.
  - They must begin with a letter or number.
  - They cannot be more than 2,048 characters in length.
  - They cannot be named kvstore. You can use this string in a longer name, like abc_kvstore.
  Default value: No default

  Setting: Federated provider
  Description: Select an Amazon S3 federated provider.
  Default value: No default

  Setting: Remote dataset - Dataset type
  Description: Select AWS Glue table (Splunk managed).
  Default value: AWS Glue table (customer created)

  Setting: Amazon S3 location
  Description: Provide the Amazon S3 location path for the AWS CloudTrail dataset that you will search with this federated index. Splunk software will create an AWS Glue table which represents this dataset, and the federated index will map to that AWS Glue table.
  - This Amazon S3 location path must be included in the list of Amazon S3 locations that are defined for the federated provider associated with the federated index.
  - Provide the Amazon S3 location path up to but not including the 12-digit AWS Account ID folder part of the path.
  For more information about filling out this field, see Get the Amazon S3 location path for an Amazon CloudTrail dataset.
  Default value: No default

  Setting: Time settings
  Description: The time settings define the time field for the dataset to which the federated index maps. Because AWS CloudTrail datasets have a stable schema, the time settings have default values that you cannot change. The default event Time field is eventtime.

  Setting: Time partitions
  Description: The time partition settings determine the fields by which the dataset to which the federated index maps is partitioned by time. Because AWS CloudTrail datasets have a stable schema, there is only one level of time partition settings, and this level has default values that you cannot change. The default Time partition field is pk_timestamp.

  Setting: Max search time range
  Description: Specify the maximum relative time range within which searches of the AWS CloudTrail dataset return results. For example, say you set a Max search time range of 1 year for a federated index named ABC_Index. If you write a federated search that references ABC_Index and give that search a time range of the last 3 years, the search returns results for only the last year.
  Federated searches that scan wide time ranges might be expensive. Federated searches with time ranges of 2 years or more might suffer from reduced search performance. Use this setting to reduce your exposure to high rates of data scan unit consumption.
  Default value: 1 year

  Setting: AWS Account IDs
  Description: Provide the 12-digit AWS account IDs by which the AWS CloudTrail dataset to which this federated index maps is partitioned. You must provide at least 1 AWS account ID.
  Alternatively, you can provide a wildcard symbol (*) to partition the dataset by all available AWS account IDs.
  If you provide a wildcard for AWS Account IDs, when you invoke this federated index in an sdselect search, you must add a WHERE clause that uses the pk_account_id field strictly in an equality condition to identify the AWS account ID partitions involved in the search. For example, WHERE pk_account_id = 123456789012 is supported, but WHERE pk_account_id != 123456789012 is not supported.
  For more information about obtaining AWS account IDs by which AWS CloudTrail datasets are partitioned, see Identify partitions to optimize searches of AWS CloudTrail datasets.
  Default value: No default

  Setting: AWS Regions
  Description: Provide the AWS regions by which the AWS CloudTrail dataset to which this federated index maps is partitioned. You must provide at least 1 AWS region.
  Alternatively, you can provide a wildcard symbol (*) to partition the dataset by all available AWS regions.
  If you provide a wildcard for AWS Regions, when you invoke this federated index in an sdselect search, you must add a WHERE clause that uses the pk_region field to identify the AWS region partitions involved in the search.
  For more information about obtaining AWS regions by which AWS CloudTrail datasets are partitioned, see Identify partitions to optimize searches of AWS CloudTrail datasets.
  Default value: No default

- Select Save to save the federated index configuration.
- (Optional) Give your users access to the federated index. To run searches over the remote dataset to which the federated index maps, your users must have access permissions for the federated index. See Give your users role-based access control of federated indexes.
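If you plan index names ahead of time, the naming restrictions for federated index names in this task can be checked with a short script. The following Python sketch is illustrative only and is not part of Splunk software. The function name federated_index_name_problems is hypothetical, and the sketch assumes that "letters" and "numbers" mean ASCII characters and that the kvstore restriction applies to an exact, case-sensitive match.

```python
import re

MAX_NAME_LENGTH = 2048  # documented limit: no more than 2,048 characters
# Assumption: "letters" and "numbers" are ASCII letters and digits.
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_-]+")
FIRST_CHAR = re.compile(r"[A-Za-z0-9]")


def federated_index_name_problems(name):
    """Return the documented naming restrictions that the name violates."""
    if not name:
        return ["name is empty"]
    problems = []
    if not ALLOWED_CHARS.fullmatch(name):
        problems.append("contains characters other than letters, numbers, underscores, and hyphens")
    if not FIRST_CHAR.fullmatch(name[0]):
        problems.append("does not begin with a letter or number")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("is longer than 2,048 characters")
    # Assumption: only the exact string "kvstore" is disallowed.
    if name == "kvstore":
        problems.append("is named kvstore")
    return problems


print(federated_index_name_problems("abc_kvstore"))  # prints []
print(federated_index_name_problems("kvstore"))
```

An empty list means that the name satisfies the restrictions as this sketch interprets them. Splunk Web remains the authority on whether a name is accepted.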
Get the Amazon S3 location path for an Amazon CloudTrail dataset
When you set up a federated index that maps to a Splunk-managed AWS Glue table, you must provide an Amazon S3 location path that defines the AWS CloudTrail dataset that you want to search. Splunk software creates an AWS Glue table that represents this dataset, and it is to this AWS Glue table that the federated index you are defining is mapped.
To get the Amazon S3 location, go to the Amazon S3 console and inspect the root bucket for the AWS CloudTrail dataset. The bucket contains the full Amazon S3 location path for the dataset.
Splunk software needs only the first few folders of the Amazon S3 location path, up to but not including the folder with the 12-digit AWS account ID. In other words, when the AWS CloudTrail dataset is associated with 1 AWS account ID, its Amazon S3 location value follows this syntax:
s3://<bucket-name>/<optional-prefix-folders>/AWSLogs/
The <optional-prefix-folders> part might not be present in the location path. It stands for one or more additional folders that people optionally set up when they place objects such as AWS CloudTrail log files in Amazon S3 buckets, to differentiate datasets that are stored in the same Amazon S3 bucket.
An AWS CloudTrail dataset can be associated with multiple AWS account IDs. When there are multiple AWS account IDs associated with an AWS CloudTrail dataset, its Amazon S3 location path includes an organization ID that is generated by AWS. Here is the Amazon S3 location syntax for such AWS CloudTrail datasets:
s3://<bucket-name>/<optional-prefix-folders>/o-<organization-ID>/AWSLogs/
For detailed information about using the Amazon S3 console to review and manage Amazon S3 bucket contents, see Working with objects in Amazon S3 in the Amazon Simple Storage Service (S3) User Guide.
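If you have the full path of one AWS CloudTrail log file, the truncation rule described in this section can be applied programmatically. The following Python sketch is a hedged illustration, not a Splunk or AWS utility. The function name s3_location_for_federated_index, the bucket name, and the file name are hypothetical. The sketch assumes that the account ID folder is the first 12-digit folder that is immediately followed by a CloudTrail folder.

```python
import re

ACCOUNT_ID = re.compile(r"[0-9]{12}")
SCHEME = "s3://"


def s3_location_for_federated_index(object_uri):
    """Return the path up to, but not including, the AWS account ID folder."""
    if not object_uri.startswith(SCHEME):
        raise ValueError("expected a path that starts with s3://")
    segments = object_uri[len(SCHEME):].split("/")
    # segments[0] is the bucket name, so the search starts at index 1.
    for i in range(1, len(segments) - 1):
        # Assumption: the account ID folder is 12 digits followed by CloudTrail.
        if ACCOUNT_ID.fullmatch(segments[i]) and segments[i + 1] == "CloudTrail":
            return SCHEME + "/".join(segments[:i]) + "/"
    raise ValueError("no 12-digit AWS account ID folder followed by CloudTrail")


# Hypothetical bucket, prefix folder, and file name.
full_path = (
    "s3://example-bucket/team-a/AWSLogs/123456789012/"
    "CloudTrail/us-east-1/2024/05/01/example-file.json.gz"
)
print(s3_location_for_federated_index(full_path))
# prints s3://example-bucket/team-a/AWSLogs/
```

Because the sketch keeps every folder before the account ID folder, it returns the same shape of path whether or not optional prefix folders or an organization ID folder are present.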
Identify partitions to optimize searches of AWS CloudTrail datasets
Partitioning is an organization strategy for large datasets that makes it possible for you to search them efficiently. When you partition your data, you organize it into a hierarchical directory structure based on the distinct values of 1 or more fields in the data. Log files in AWS CloudTrail datasets are partitioned by time, meaning they are organized into folders by year, month, and day. As a result, a search can easily target all of the files associated with a specific date.
Because AWS CloudTrail datasets have a stable schema, definitions for federated indexes that map to Splunk-managed AWS Glue tables come with default partition time field values that you cannot change.
However, all AWS CloudTrail datasets are also partitioned by two other fields (or "keys"): AWS account ID and AWS region. Splunk software cannot predict the values for these partition keys, so it is up to you to supply them. When you identify the partition keys in the federated index definition, you ensure that sdselect searches of the AWS CloudTrail dataset to which the federated index maps are efficient and cost-effective.
Get partition key values for an AWS CloudTrail dataset
All AWS CloudTrail datasets are partitioned by at least 1 AWS account ID and 1 AWS region. This means that when you set up a federated index that maps to a Splunk-managed AWS Glue table, you must provide at least 1 value for the AWS account IDs and AWS regions partition key fields. Splunk software cannot know in advance which AWS account IDs and AWS regions a specific AWS CloudTrail dataset is partitioned by, so these fields do not have default values.
To get values for the AWS account IDs and AWS regions fields, go to the Amazon S3 console and inspect the full Amazon S3 location path for the dataset. The <AWS-account-ID> and <AWS-region> folders in the following AWS CloudTrail location path syntax example show you where these values can be found:
s3://<bucket-name>/<optional-prefix-folders>/AWSLogs/<AWS-account-ID>/CloudTrail/<AWS-region>/<year>/<month>/<day>/<filename>
For example, in the Amazon S3 console, when you open the AWSLogs folder for an AWS CloudTrail dataset, you'll see the AWS account IDs the dataset is associated with. Similarly, when you open the CloudTrail folder for an AWS CloudTrail dataset, you'll see the AWS regions the dataset is associated with.
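If you have a listing of full object paths for the dataset, you can also collect the partition key values with a script instead of browsing folders. The following Python sketch is illustrative only. The function name partition_keys and the example paths are hypothetical, and the pattern assumes that every path follows the location path syntax shown in this section.

```python
import re

# Matches .../AWSLogs/<AWS-account-ID>/CloudTrail/<AWS-region>/...
PARTITION_PATTERN = re.compile(
    r"/AWSLogs/(?P<account_id>[0-9]{12})/CloudTrail/(?P<region>[^/]+)/"
)


def partition_keys(object_paths):
    """Collect the distinct AWS account IDs and AWS regions found in the paths."""
    account_ids = set()
    regions = set()
    for path in object_paths:
        match = PARTITION_PATTERN.search(path)
        if match:  # paths that do not follow the syntax are skipped
            account_ids.add(match.group("account_id"))
            regions.add(match.group("region"))
    return sorted(account_ids), sorted(regions)


# Hypothetical object paths.
paths = [
    "s3://example-bucket/AWSLogs/123456789012/CloudTrail/us-east-1/2024/05/01/a.json.gz",
    "s3://example-bucket/AWSLogs/123456789012/CloudTrail/eu-west-1/2024/05/01/b.json.gz",
    "s3://example-bucket/AWSLogs/210987654321/CloudTrail/us-east-1/2024/05/02/c.json.gz",
]
account_ids, regions = partition_keys(paths)
print(account_ids)  # prints ['123456789012', '210987654321']
print(regions)      # prints ['eu-west-1', 'us-east-1']
```

The two lists correspond to the values that you enter in the AWS Account IDs and AWS Regions fields of the federated index definition.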
Optionally identify all possible partition keys with a wildcard
If an AWS CloudTrail dataset is associated with a large number of AWS account IDs or AWS regions and you do not want to take the time to enter every key value into those fields, you can save time by entering wildcard symbols (*) into the fields instead. The wildcard symbol indicates that all possible key values for the field are applied to the federated index definition.
When you use a wildcard symbol for either AWS account IDs or AWS regions in a federated index definition, you must include a WHERE clause that filters results by pk_account_id or pk_region when you invoke that federated index in an sdselect search. See sdselect command WHERE clause operations.
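If you generate searches from a script, the equality condition that this topic shows for pk_account_id can be assembled and checked before the search runs. The following Python sketch is a hedged illustration. The function name account_id_where_clause is hypothetical, and the sketch produces only the WHERE fragment that appears in this topic. It makes no assumptions about the rest of the sdselect syntax, which is covered in the sdselect command documentation.

```python
import re


def account_id_where_clause(account_id):
    """Build the equality condition on pk_account_id for one AWS account ID."""
    # AWS account IDs are 12 digits; reject anything else before building the clause.
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("expected a 12-digit AWS account ID")
    # Only the equality operator is used, because conditions such as != on
    # pk_account_id are not supported for wildcard federated indexes.
    return f"WHERE pk_account_id = {account_id}"


print(account_id_where_clause("123456789012"))
# prints WHERE pk_account_id = 123456789012
```

Passing the account ID as a string preserves any leading zeros.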
Search your AWS Glue table datasets
After you set up federated indexes that map to AWS Glue table datasets, you can use the sdselect command to search those datasets. See sdselect command overview.
Delete a federated index
You can delete a federated index that maps to an AWS Glue table that you no longer need to search. You can also delete federated indexes when your data scanning entitlements are depleted, to prevent unintentional usage.
Prerequisites
- A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
- A federated index for Federated Search for Amazon S3 that you want to delete.
Steps
- On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federated search.
- On the Federated index tab, identify a federated index that you want to delete.
- Select Delete for the index you want to delete.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408