Splunk Cloud Platform

Federated Search

Map a federated index to an AWS Glue Data Catalog table dataset

After you create an Amazon S3 federated provider for your Splunk Cloud Platform deployment, you create federated indexes for use in federated searches. Each federated index you create maps to a specific AWS Glue Data Catalog table dataset. You invoke federated indexes in your federated searches to tell Splunk software which AWS Glue table dataset you intend to search.

The Splunk platform creates federated indexes on the search head of your Splunk Cloud Platform deployment.

In this task, you do these things:

  • Provide the name of the federated index.
  • Select an Amazon S3 federated provider that supports AWS Glue tables.
  • Select the AWS Glue Data Catalog table dataset type.
  • Identify the AWS Glue table that the federated index maps to.
  • Optionally provide time series settings if your data includes time series information and you want to make use of time functions in your searches.
  • Identify partition time fields, if you have partitioned your dataset into time-based subsets.

You can map a federated index to only one remote dataset at a time. If a federated provider lists several AWS Glue tables over which you want to run federated searches, define a separate federated index for each AWS Glue Data Catalog table dataset.

Prerequisites

Steps

  1. On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federated search.
  2. On the Federated index tab, select Add federated index.

    You might also come to the Add federated index screen directly after creating a federated provider.

  3. Using the following table, specify the settings for your federated index.
    Setting: Federated index name
    Description: Enter a unique name for the federated index.

    Federated index names have the following restrictions:

    • They can contain only letters, numbers, underscores, and hyphens.
    • They must begin with a letter or number.
    • They cannot be more than 2,048 characters in length.
    • They cannot be named kvstore. You can use this string in a longer name, like abc_kvstore.

    Default value: No default

    Setting: Federated provider
    Description: Select an Amazon S3 federated provider.
    Default value: No default

    Setting: Remote dataset - Dataset type
    Description: Select AWS Glue Data Catalog table.
    Default value: AWS Glue Data Catalog table

    Setting: Remote dataset - Dataset name
    Description: Select the name of the AWS Glue table to which you want the federated index to map.

    If the federated provider associated with the federated index has specific AWS Glue tables listed in Glue Data Catalog tables, a drop-down list with those tables appears here, and you can select a table from it.

    If the federated provider associated with the federated index has a Glue Data Catalog tables value that uses a wildcard to capture multiple table names, you must manually enter the AWS Glue table name into this field. Make sure that the table name is correct and that it is covered by the Glue Data Catalog resource policy you generated when you created the federated provider associated with the federated index.

    Default value: No default

    Setting: Time settings not required
    Description: (Optional) Select Time settings not required if the AWS Glue table does not contain time-series data and you do not intend to use time-based fields and functions when you search it.
    Default value: Not selected

    Setting: Time field
    Description: If your federated index definition requires time settings, enter the name of the field that acts as an event timestamp in the selected AWS Glue table.

    The time field can contain only lowercase letters, numbers, underscores, and dot characters ( . ).

    If a time field name contains dot characters but is not a nested field, surround it with single quote characters. See "Special handling for sdselect syntax elements" in sdselect command usage.

    Default value: No default

    Setting: Time format
    Description: If your federated index definition requires time settings, provide a time format variable or custom time format variable string that matches the Time field.

    You can set the following values for Time format:

    • Set %s when you have UNIX time values with the string data type.
    • Set %UT when you have UNIX time values with the numeric data type.
    • Set %ST when you have values with the SQL timestamp data type.
    • Set a custom string of time format variables when you have values that follow a specific string time format, such as 04-29-2023 11:45:22 PM. For more information and examples of time format strings, see Date and time format variables in the Splunk Cloud Platform Search Reference.

    %UT and %ST are not among the standard set of Splunk platform time format variables. Use them only in the context of Federated Search for Amazon S3.

    You can optionally append the %Q time format variable to time format variables to capture subsecond timestamps, such as milliseconds (%3Q), microseconds (%6Q), and nanoseconds (%9Q). For example, for a time field in numeric-typed UNIX time format with a nanosecond component, use %UT.%9Q, or %UT%9Q if you do not need to separate the subsecond component from the UNIX time value with a dot character ( . ).

    The sdselect command does not support the following time format variables: %c, %+, %Ez, %k, %X, and %x.

    Default value: No default

    Setting: Unix time field
    Description: If your federated index definition requires time settings, Unix time field provides an alias for the Time field that Splunk software converts into numeric UNIX time format at search time. Insert the Unix time field into federated searches that require numeric UNIX time field values, or when you want to see your time field in numeric UNIX time format in the search results.

    Unix time field defaults to _time. In Splunk Web, the values of _time always display in human-readable format, unless you are aggregating on the _time field. For example, avg(_time) returns values in numeric UNIX time format.

    If _time already exists as a field name in your Glue table, give the Unix time field a value other than _time.

    For more information and examples that show how usage of the Unix time field in an sdselect search changes depending on the format of the Time field, see Use time fields in sdselect searches.

    Default value: _time

    Setting: Partition time settings
    Description: Improve search performance and reduce search cost by identifying partition time fields in the federated index definition. Do this only if the federated index maps to an AWS Glue table dataset that you have partitioned into data subsets by time, such as by year, month, and day.

    For more information about these settings, see Optimize AWS Glue table searches by identifying partition time fields.

    Default value: No default
  4. Select Save to save the federated index configuration.
  5. (Optional) Give your users access to the federated index. To run searches over the remote dataset to which the federated index maps, your users must have access permissions for the federated index. See Give your users role-based access control of federated indexes.

Splunk software creates the federated index on the search head of your Splunk Cloud Platform deployment.

In Splunk Web, you can view the federated indexes that you create for your deployment by selecting Settings, then Federated search and then the Federated indexes tab.
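
The naming restrictions for the Federated index name setting can be checked before you enter a name in Splunk Web. The following Python sketch is one way to do that. The function name check_federated_index_name is hypothetical, and the sketch assumes that "letters" means ASCII letters and that the kvstore restriction is an exact, case-sensitive match. Neither assumption is confirmed by this topic.

```python
import re

# Assumption: "letters" means ASCII letters. The pattern requires a leading
# letter or number, followed by letters, numbers, underscores, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")
MAX_NAME_LENGTH = 2048


def check_federated_index_name(name: str) -> list[str]:
    """Return the documented restrictions that a name violates.

    An empty list means the name passes every check in this sketch.
    """
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(
            "must start with a letter or number and contain only "
            "letters, numbers, underscores, and hyphens"
        )
    if len(name) > MAX_NAME_LENGTH:
        problems.append("cannot be more than 2,048 characters in length")
    # Assumption: only the exact string "kvstore" is reserved.
    if name == "kvstore":
        problems.append("cannot be named kvstore")
    return problems


print(check_federated_index_name("abc_kvstore"))  # []
```

Returning a list of problems, rather than a single true or false value, lets a caller report every violated restriction at once.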

Do not designate federated indexes as default indexes for roles or data inputs.
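
To sanity-check a custom Time format string before you save it, you can try it against a sample value. The following Python sketch parses the sample value 04-29-2023 11:45:22 PM from step 3. It assumes that the value is in month-day-year order with a 12-hour clock, and it relies on Python's own strptime implementation, whose variables overlap with, but are not identical to, the Splunk platform time format variables. In particular, %UT, %ST, and %Q are not available in Python.

```python
from datetime import datetime

# Assumed custom format for values such as "04-29-2023 11:45:22 PM":
# month-day-year, 12-hour clock, and an AM/PM marker.
CUSTOM_TIME_FORMAT = "%m-%d-%Y %I:%M:%S %p"


def parse_time_field(value: str) -> datetime:
    """Parse one sample time field value with the custom format string.

    Raises ValueError if the value does not match the format.
    """
    return datetime.strptime(value, CUSTOM_TIME_FORMAT)


parsed = parse_time_field("04-29-2023 11:45:22 PM")
print(parsed)  # 2023-04-29 23:45:22
```

If the sample value does not match the format string, strptime raises a ValueError, which is a quick signal that the format string needs another look.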

Optimize AWS Glue table searches of Amazon S3 datasets by identifying partition time fields

Partitioning is an organization strategy for large datasets that makes it possible for you to search them efficiently. When you partition your data, you organize it into a hierarchical directory structure based on the distinct values of one or more fields in the data.

For example, you might partition your application logs in Amazon S3 by date, breaking them down by year, month, and day. Then you can place files corresponding to a single day's worth of data in an Amazon S3 path like s3://my_bucket/logs/year=2022/month=08/day=23/.
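
As a minimal sketch of that layout, the following Python function builds the partition path for a given date. The function name partition_prefix is hypothetical, and the bucket and prefix are the example values from the preceding paragraph.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Build a year/month/day partition path under a base Amazon S3 prefix."""
    # Month and day are zero-padded, matching the month=08 example.
    return f"{base.rstrip('/')}/year={day:%Y}/month={day:%m}/day={day:%d}/"


print(partition_prefix("s3://my_bucket/logs", date(2022, 8, 23)))
# s3://my_bucket/logs/year=2022/month=08/day=23/
```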

If you generate an AWS Glue table dataset that references partitioned Amazon S3 data, you can map a federated index definition to that dataset and then identify the time fields that determine the hierarchical structure of the data partitions. When you identify the partition time fields in the federated index definition, your searches of that dataset become more efficient and cost-effective.

You can set partition time settings for a federated index even if you have selected Time settings not required. Partition time fields exist in Amazon S3 paths. The Time field setting governed by Time settings not required is a time field that exists as a column in your AWS Glue table.

When you define partition time filters for a federated index, you begin by identifying the first level field in the time field hierarchy. Then you identify the second level field, and so on. For example, if your federated index maps to a dataset that you have partitioned by year, month, and day, you identify year as the partition time field for the Level 1 filter, month as the partition time field for the Level 2 filter, and day as the partition time field for the Level 3 filter.
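
The following Python sketch illustrates the idea behind partition filter levels. It is not a description of how Splunk software implements them. Given a search time range, it lists the year, month, and day values that the range touches, which are the only partitions that need to be scanned. The LEVELS list and the function name partitions_for_range are hypothetical.

```python
from datetime import date, timedelta

# Level 1, Level 2, and Level 3 partition time fields with their time formats.
LEVELS = [("year", "%Y"), ("month", "%m"), ("day", "%d")]


def partitions_for_range(start: date, end: date) -> list[dict[str, str]]:
    """List the partition key values covered by an inclusive date range."""
    partitions = []
    current = start
    while current <= end:
        # Format the same date once per level to get that level's key value.
        partitions.append(
            {field: current.strftime(fmt) for field, fmt in LEVELS}
        )
        current += timedelta(days=1)
    return partitions


for partition in partitions_for_range(date(2022, 8, 31), date(2022, 9, 1)):
    print(partition)
# {'year': '2022', 'month': '08', 'day': '31'}
# {'year': '2022', 'month': '09', 'day': '01'}
```

A two-day search range maps to two day-level partitions, so data stored under every other year, month, and day path can be skipped.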

Steps

  1. In your federated index definition, under Partition time settings, select Add new field.
  2. Identify the Level 1 time field by which you have partitioned your data. This is the highest level of partitioning you use. Specify values for the following fields:
    Partition time setting: Partition time field
    Description: Provide the name of the time field that is the partition key for the indicated partition filter level.

    Values for the Partition time field can contain only lowercase letters, numbers, and underscores.

    Partition time setting: Time format
    Description: Provide a time format string for the indicated Partition time field.

    Compose this time format string out of Splunk-supported time format variables. For more information and examples, see Date and time format variables in the Search Reference. The following time format variables are not supported: %c, %+, %Ez, %k, %X, and %x.

    Partition time setting: Type
    Description: Select the data type of the Partition time field. Your options are String, Integer, and Date.
  3. If you have another partition key in your AWS Glue table, you can create another partition filter level based on it. Select Add new field and identify the filter's Partition time field, Time format, and Type. Repeat this step until you have defined a partition filter level for each partition key in your AWS Glue table.
  4. Select the Time zone that applies to your partition time fields. You must choose a Time zone if you define one or more partition filter levels.
  5. Select Save to save the federated index configuration.

Search your AWS Glue table datasets

After you set up federated indexes that map to AWS Glue table datasets, you can use the sdselect command to search those datasets. See sdselect command overview.

Delete a federated index

You can delete a federated index that maps to an AWS Glue table that you no longer need to search. You can also delete federated indexes when your data scanning entitlements are depleted, to prevent unintentional usage.

Prerequisites

  • A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
  • A federated index for Federated Search for Amazon S3 that you want to delete.

Steps

  1. On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federated search.
  2. On the Federated index tab, identify a federated index that you want to delete.
  3. Select Delete for the index you want to delete.
Last modified on 15 October, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.1.2312, 9.2.2403, 9.2.2406 (latest FedRAMP release)

