Map a federated index to an AWS Glue Data Catalog table dataset

After you create an Amazon S3 federated provider for your Splunk Cloud Platform deployment, you create federated indexes for use in federated searches. Each federated index you create maps to a specific AWS Glue Data Catalog table dataset. You invoke federated indexes in your federated searches to tell Splunk software which AWS Glue table dataset you intend to search.

The Splunk platform creates federated indexes on the search head of your Splunk Cloud Platform deployment.

In this task, you do these things:

Provide the name of the federated index.
Select an Amazon S3 federated provider that supports AWS Glue tables.
Select the AWS Glue Data Catalog table dataset type.
Identify the AWS Glue table that the federated index maps to.
Optionally provide time series settings if your data includes time series information and you want to make use of time functions in your searches.
Identify partition time fields, if you have partitioned your dataset into time-based subsets.

You can map a federated index to only one remote dataset at a time. If a federated provider lists several AWS Glue tables over which you want to run federated searches, define a separate federated index for each AWS Glue Data Catalog table dataset.

Prerequisites

A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
You must have an AWS Glue table that refers to data you store in Amazon S3. See Map a federated index to an AWS Glue table dataset.
You must define an Amazon S3 federated provider that supports AWS Glue tables. See Define an Amazon S3 federated provider.

Steps

On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federated search.
On the Federated index tab, select Add federated index.

You might also arrive at the Add federated index screen directly after you create a federated provider.

Using the following table, specify the settings for your federated index.

Setting	Description	Default Value
Federated index name	Enter a unique name for the federated index. Federated index names have the following restrictions: They can contain only letters, numbers, underscores, and hyphens. They must begin with a letter or number. They cannot be more than 2,048 characters in length. They cannot be named kvstore. You can use this string in a longer name, like abc_kvstore.	No default
Federated provider	Select an Amazon S3 federated provider.	No default
Remote dataset - Dataset type	Select AWS Glue Data Catalog table.	AWS Glue Data Catalog table
Remote dataset - Dataset name	Select the name of the AWS Glue table to which you want the federated index to map. If the federated provider associated with the federated index has specific AWS Glue tables listed in Glue Data Catalog tables, a drop-down list with those tables appears here, and you can select a table from it. If the federated provider associated with the federated index has a Glue Data Catalog tables value that uses a wildcard to capture multiple table names, you must manually enter the AWS Glue table name into this field. Make sure that the table name is correct and that it is covered by the Glue Data Catalog resource policy you generated when you created the federated provider associated with the federated index.	No default
Time settings not required	(Optional) Select Time settings not required if the AWS Glue table does not contain time-series data and you do not intend to use time-based fields and functions when you search it.	Not selected
Time field	If your federated index definition requires time settings, enter the name of the field that acts as an event timestamp in the selected AWS Glue table. The time field can contain only lowercase letters, numbers, underscores, and dot characters ( . ). Surround time fields that contain dot characters but which are not nested fields with single quote characters. See "Special handling for sdselect syntax elements" in sdselect command usage.	No default
Time format	If your federated index definition requires time settings, provide a time format variable or custom time format variable string that matches the Time field. You can set the following values for Time format: Set %s when you have UNIX time values with the `string` data type. Set %UT when you have UNIX time values with the `numeric` data type. Set %ST when you have values with the SQL `timestamp` data type. Set a custom string of time format variables when you have values that follow a specific `string` time format, such as 04-29-2023 11:45:22 PM. For more information and examples of time format strings, see Date and time format variables in the Splunk Cloud Platform Search Reference. %UT and %ST are not among the standard set of Splunk platform time format variables. Use them only in the context of Federated Search for Amazon S3. You can optionally append the %Q time format variable to time format variables to capture subsecond timestamps, such as milliseconds (%3Q), microseconds (%6Q), and nanoseconds (%9Q). For example, for a time field in numeric-typed UNIX time format with a nanosecond component, use %UT.%9Q, or %UT%9Q if you do not need to separate the subsecond component from the UNIX time value with a dot character ( . ). The `sdselect` command does not support the following time format variables: %c, %+, %Ez, %k, %X, and %x.	No default
Unix time field	If your federated index definition requires time settings, Unix time field provides an alias for the Time field that Splunk software converts into numeric UNIX time format at search time. Insert the Unix time field into federated searches that require numeric UNIX time field values, or when you want to see your time field in numeric UNIX time format in the search results. Unix time field defaults to `_time`. In Splunk Web, the values of `_time` always display in human-readable format, unless you are aggregating on the `_time` field. For example, `avg(_time)` returns values in numeric UNIX time format. If `_time` already exists as a field name in your Glue table, give the Unix time field a value other than `_time`. For more information and examples that show how usage of the Unix time field in an `sdselect` search changes depending on the format of the Time field, see Use time fields in sdselect searches.	_time
Partition time settings	Improve search performance and reduce search cost by identifying partition time fields in the federated index definition. Do this only if the federated index maps to an AWS Glue table dataset that you have partitioned into data subsets by time, such as by year, month, and day. For more information about these settings, see Optimize AWS Glue table searches by identifying partition time fields.	No default

Select Save to save the federated index configuration.
(Optional) Give your users access to the federated index. To run searches over the remote dataset to which the federated index maps, your users must have access permissions for the federated index. See Give your users role-based access control of federated indexes.
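
The naming restrictions for Federated index name in the settings table can be checked before you enter a name in Splunk Web. The following Python sketch is only an illustration of those rules and is not part of the Splunk platform. The function name `federated_index_name_problems` is hypothetical, and the sketch assumes that "letters" means ASCII letters and that the `kvstore` restriction is an exact, case-sensitive match.

```python
import re

MAX_NAME_LENGTH = 2048   # "cannot be more than 2,048 characters in length"
RESERVED_NAME = "kvstore"  # assumption: exact, case-sensitive match


def federated_index_name_problems(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not name:
        return ["name is empty"]
    # Assumption: only ASCII letters and digits count as letters and numbers.
    if not (name[0].isascii() and name[0].isalnum()):
        problems.append("must begin with a letter or number")
    if re.search(r"[^A-Za-z0-9_-]", name):
        problems.append("can contain only letters, numbers, underscores, and hyphens")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("cannot be more than 2,048 characters in length")
    if name == RESERVED_NAME:
        problems.append("cannot be named kvstore")
    return problems


for candidate in ("abc_kvstore", "kvstore", "_glue_logs", "glue logs"):
    print(candidate, federated_index_name_problems(candidate))
```

Returning a list of problems rather than a single Boolean makes it easier to report every violated rule at once.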

Splunk software creates the federated index on the search head of your Splunk Cloud Platform deployment.

In Splunk Web, you can view the federated indexes that you create for your deployment by selecting Settings, then Federated search, and then the Federated indexes tab.

Do not designate federated indexes as default indexes for roles or data inputs.
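
When you compose a custom Time format string, it can help to test it against a sample value from your table first. The following Python sketch checks one plausible format string for the sample value 04-29-2023 11:45:22 PM from the Time format setting. The format string `%m-%d-%Y %I:%M:%S %p` is an assumption about how to read that sample and does not come from the Splunk documentation. Python's `strptime` shares these common variables with the Splunk platform, but it does not recognize `%UT`, `%ST`, or `%Q`, so this check applies only to custom string formats.

```python
from datetime import datetime

# Assumed format string for a value such as "04-29-2023 11:45:22 PM":
# month-day-year, 12-hour clock, AM/PM marker.
CUSTOM_TIME_FORMAT = "%m-%d-%Y %I:%M:%S %p"


def matches_time_format(value: str, time_format: str) -> bool:
    """Return True if the value parses cleanly with the given format string."""
    try:
        datetime.strptime(value, time_format)
    except ValueError:
        return False
    return True


sample = "04-29-2023 11:45:22 PM"
parsed = datetime.strptime(sample, CUSTOM_TIME_FORMAT)
print(parsed.isoformat())
print(matches_time_format("2023-04-29 23:45:22", CUSTOM_TIME_FORMAT))
```

A value that does not parse here is a hint that the Time format string and the Time field values disagree, although the Splunk platform remains the final authority on which strings it accepts.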

Optimize AWS Glue table searches of Amazon S3 datasets by identifying partition time fields

Partitioning is an organization strategy for large datasets that makes it possible for you to search them efficiently. When you partition your data, you organize it into a hierarchical directory structure based on the distinct values of one or more fields in the data.

For example, you might partition your application logs in Amazon S3 by date, breaking them down by year, month, and day. Then you can place files corresponding to a single day's worth of data in an Amazon S3 path like s3://my_bucket/logs/year=2022/month=08/day=23/.
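
As a minimal sketch of this layout, the following Python function builds the partitioned path for a given date. The function name `partition_path` is hypothetical, and the bucket and prefix are the ones used in the example above.

```python
from datetime import date


def partition_path(base: str, day: date) -> str:
    """Build a year=/month=/day= partition path under the given base path."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


print(partition_path("s3://my_bucket/logs", date(2022, 8, 23)))
```

The month and day values are zero-padded so that the path segments match a two-digit partition value such as month=08.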

If you generate an AWS Glue table dataset that references partitioned Amazon S3 data, you can map a federated index definition to that dataset and then identify the time fields that determine the hierarchical structure of the data partitions. When you identify the partition time fields in the federated index definition, your searches of that dataset become more efficient and cost-effective.

You can set partition time settings for a federated index even if you have selected Time settings not required. Partition time fields exist in Amazon S3 paths. The Time field setting governed by Time settings not required is a time field that exists as a column in your AWS Glue table.

When you define partition time filters for a federated index, you begin by identifying the first-level field in the time field hierarchy. Then you identify the second-level field, and so on. For example, if your federated index maps to a dataset that you have partitioned by year, month, and day, you identify year as the partition time field for the Level 1 filter, month as the partition time field for the Level 2 filter, and day as the partition time field for the Level 3 filter.
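
To see why the level order matters, consider how a time range translates into partition values. The following Python sketch is a conceptual illustration only and does not describe how the Splunk platform implements partition filtering. The `PARTITION_LEVELS` list and the `partitions_in_range` function are hypothetical names, and the sketch assumes one partition per day with the time formats `%Y`, `%m`, and `%d`.

```python
from datetime import date, timedelta

# Hypothetical partition filter levels, ordered from Level 1 to Level 3.
# Each entry pairs a partition time field with its time format.
PARTITION_LEVELS = [("year", "%Y"), ("month", "%m"), ("day", "%d")]


def partitions_in_range(start: date, end: date) -> list[dict[str, str]]:
    """Return the partition key values for each day from start through end."""
    selected = []
    current = start
    while current <= end:
        selected.append(
            {field: current.strftime(fmt) for field, fmt in PARTITION_LEVELS}
        )
        current += timedelta(days=1)
    return selected


# A three-day search window touches only three day-level partitions.
selected = partitions_in_range(date(2022, 8, 30), date(2022, 9, 1))
for keys in selected:
    print("/".join(f"{field}={value}" for field, value in keys.items()))
```

With the levels identified, a search over a narrow time range needs to read only the matching partitions instead of the whole dataset, which is the source of the performance and cost benefit.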

Steps

In your federated index definition, under Partition time settings, select Add new field.

Identify the Level 1 time field by which you have partitioned your data. This is the highest level of partitioning you use. Specify values for the following fields:

Partition time setting	Description
Partition time field	Provide the name of the time field that is the partition key for the indicated partition filter level. Values for the Partition time field can contain only lowercase letters, numbers, and underscores.
Time format	Provide a time format string for the indicated Partition time field. Compose this time format string out of Splunk-supported time format variables. For more information and examples, see Date and time format variables in the Search Reference. The following time format variables are not supported: %c, %+, %Ez, %k, %X, and %x.
Type	Select the data type of the Partition time field. Your options are String, Integer, and Date.

If you have another partition key in your AWS Glue table, you can create another partition filter level based on it. Select Add new field and identify the filter's Partition time field, Time format, and Type. Repeat this step until you have defined a partition filter level for each partition key in your AWS Glue table.
Select the Time zone that applies to your partition time fields. You must choose a Time zone if you define one or more partition filter levels.
Select Save to save the federated index configuration.

Search your AWS Glue table datasets

After you set up federated indexes that map to AWS Glue table datasets, you can use the sdselect command to search those datasets. See sdselect command overview.

Delete a federated index

You can delete a federated index that maps to an AWS Glue table that you no longer need to search. You can also delete federated indexes when your data scanning entitlements are depleted, to prevent unintentional usage.

Prerequisites

A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
A federated index for Federated Search for Amazon S3 that you want to delete.

Steps

On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federated search.
On the Federated index tab, identify a federated index that you want to delete.
Select Delete for the index you want to delete.
