Splunk Cloud Platform

Federated Search

About Federated Search for Amazon S3

Use Federated Search for Amazon S3 to search data in your Amazon S3 buckets from your Splunk Cloud Platform deployment, without ingesting or indexing the data first.

With Federated Search for Amazon S3, you run searches that apply filtering and statistical functions to AWS Glue Data Catalog tables, which represent the data in your Amazon S3 buckets.

Overview of Amazon S3 federated search

Federated Search for Amazon S3 is ideal for searches over Amazon S3 datasets that conform to specific schema structures or that use partition key filters.

Federated Search for Amazon S3 searches apply filtering and statistical functions to AWS Glue tables that you define. AWS Glue tables contain column and schema definitions for data in your Amazon S3 buckets.

With Federated Search for Amazon S3 you can:

  • Investigate historical security data in Amazon S3 buckets, potentially over 2 or more years of logs.
  • Perform infrequent statistical analysis over historical data in Amazon S3 buckets, also potentially over 2 or more years of logs.
  • Enrich data indexed into your Splunk Cloud Platform deployment through lookups of Amazon S3 data.
  • Explore Amazon S3 datasets to locate business-critical data for ingestion into your Splunk Cloud Platform deployment.
  • Manage infrequently-accessed Amazon S3 datasets that you store for regulation, compliance, or investigatory reasons.

Restrictions

Federated Search for Amazon S3 is available only to Splunk Cloud Platform users with deployments in AWS regions.

Federated Search for Amazon S3 does not support the following kinds of Splunk Cloud Platform deployments:

  • Deployments in Google Cloud regions.
  • HIPAA, IRAP, PCI DSS, FedRAMP High, and DoD IL5 deployments.

Federated Search for Amazon S3 does support FedRAMP Moderate deployments.

Each AWS Glue table that you search must be based on Amazon S3 file objects with the same file type and compression type.

Federated Search for Amazon S3 can search objects only in Amazon S3 buckets that have S3 Standard storage classes. Federated Search for Amazon S3 cannot search objects in Amazon S3 buckets that have alternative storage classes, such as S3 Intelligent-Tiering and S3 Glacier.

Federated Search for Amazon S3 cannot search Amazon S3 buckets that you have configured as Requester Pays buckets. See Amazon S3 bucket configuration.
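As a rough illustration of the restrictions in this section, the following Python sketch checks a list of object descriptions before you build an AWS Glue table on them. It is not part of Splunk Cloud Platform or AWS tooling. The S3ObjectInfo class, its fields, and the check_table_objects function are hypothetical names invented for this example, and the sketch assumes you have already collected each object's file type, compression type, and storage class yourself. The value "STANDARD" is the storage class name that Amazon S3 reports for S3 Standard objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3ObjectInfo:
    """Hypothetical description of one Amazon S3 object behind a Glue table."""
    key: str
    file_type: str      # for example "csv" or "parquet"
    compression: str    # for example "gzip" or "none"
    storage_class: str  # for example "STANDARD" or "GLACIER"


def check_table_objects(objects, requester_pays=False):
    """Return a list of restriction violations (empty if none were found)."""
    problems = []
    if requester_pays:
        problems.append("bucket is configured as a Requester Pays bucket")

    # All objects behind one table must share a file type and compression type.
    file_types = sorted({obj.file_type for obj in objects})
    if len(file_types) > 1:
        problems.append(f"mixed file types: {file_types}")
    compressions = sorted({obj.compression for obj in objects})
    if len(compressions) > 1:
        problems.append(f"mixed compression types: {compressions}")

    # Only objects in the S3 Standard storage class can be searched.
    non_standard = sorted(o.key for o in objects if o.storage_class != "STANDARD")
    if non_standard:
        problems.append(f"objects outside S3 Standard: {non_standard}")
    return problems
```

An empty result means that none of the restrictions modeled here were violated. It does not guarantee that a federated search will succeed.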

Supported file types and data formats

Federated Search for Amazon S3 supports the following file types and data formats.

  • CSV or CSV-type formats
  • Newline-delimited JSON (one JSON object per line)
  • Parquet (version 2.5.0 or higher)
  • ORC
  • Avro
  • XML

For more information about the file types supported by AWS Glue tables, search "Working with tables on the AWS Glue console" in the AWS Glue Developer Guide.

Federated Search for Amazon S3 supports data originating from Splunk Cloud Platform features such as the Edge Processor solution and ingest actions. See Use Edge Processors or Use ingest actions to improve the data input process in Getting Data In.

Federated Search for Amazon S3 does not support data in Dynamic Data Self-Storage (DDSS) format.

For more information about accepted file types and data formats, see Identify the Amazon S3 data that you want to search.
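The JSON format in the list of supported file types is understood here to mean one complete JSON object per line, with no enclosing array. Under that assumption, the following Python sketch shows how a list of records could be serialized into that layout. The function name to_newline_json is hypothetical and is not part of any Splunk or AWS library.

```python
import json


def to_newline_json(records):
    """Serialize records as one compact JSON object per line."""
    # Compact separators keep each record on a single line with no padding.
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


if __name__ == "__main__":
    events = [
        {"host": "web-01", "status": 200},
        {"host": "web-02", "status": 503},
    ]
    print(to_newline_json(events), end="")
```

Each output line can be parsed independently of the others.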

Supported compression types

Federated Search for Amazon S3 supports the following compression types:

  • ZIP, only for archives containing a single object
  • GZIP
  • BZIP2
  • LZ4
  • Snappy, both standard and Hadoop formats

Federated searches of compressed files might take longer to complete than federated searches of uncompressed files.
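Because ZIP is supported only for archives that contain a single object, it can be useful to check an archive before you upload it. The following Python sketch uses only the standard library. The function names zip_has_single_object and build_zip are hypothetical, and treating directory entries as "not objects" is an assumption made for this example, not documented behavior.

```python
import io
import zipfile


def zip_has_single_object(data: bytes) -> bool:
    """Return True if the ZIP archive in `data` contains exactly one file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Assumption: directory entries do not count as objects.
        files = [info for info in archive.infolist() if not info.is_dir()]
    return len(files) == 1


def build_zip(names):
    """Demonstration helper: build an in-memory ZIP with the given members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "sample")
    return buffer.getvalue()


if __name__ == "__main__":
    print(zip_has_single_object(build_zip(["events.csv"])))
    print(zip_has_single_object(build_zip(["a.csv", "b.csv"])))
```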

Supported encryption standards

Federated Search for Amazon S3 supports the following encryption standards.

  • Server-side encryption with Amazon S3-Managed Keys (SSE-S3)
  • Server-side encryption with the AWS Key Management Service (SSE-KMS)

Federated Search for Amazon S3 supports SSE-S3 without any additional setup requirements. For more information, search on "Amazon S3-managed encryption keys" in the Amazon Simple Storage Service User Guide.

Federated Search for Amazon S3 supports only customer-managed SSE-KMS keys. SSE-KMS support requires some setup when you define your federated provider. See Define an Amazon S3 federated provider.

Partitioning

Federated Search for Amazon S3 supports partitioned and unpartitioned datasets. Examples of supported partitioning styles include Apache Hive and non-Apache Hive. Apache Hive partitions are made up of key-value pairs, while non-Apache Hive partitions are only the values.

Partitioning style | Format | Example
Apache Hive | ./<partition_unit>=<value>/<partition_unit>=<value>/ | ./year=2022/month=06/
non-Apache Hive | ./<value>/<value>/ | ./2022/06/

For more information about creating partitioned AWS Glue tables, search on "AWS Glue tables" in the AWS Glue Developer Guide.
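To make the difference between the two partitioning styles concrete, the following Python sketch builds both path layouts from the same ordered list of partition keys and values, and parses an Apache Hive style path back into its key-value pairs. All function names here are hypothetical and exist only for illustration.

```python
def hive_partition_path(partitions):
    """Build an Apache Hive style path from ordered (key, value) pairs."""
    return "./" + "".join(f"{key}={value}/" for key, value in partitions)


def plain_partition_path(partitions):
    """Build a non-Apache Hive style path that keeps only the values."""
    return "./" + "".join(f"{value}/" for _key, value in partitions)


def parse_hive_partition_path(path):
    """Parse an Apache Hive style path into a dict of partition values."""
    result = {}
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # skip the leading "./" and the trailing slash
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"not a key=value segment: {segment!r}")
        result[key] = value
    return result


if __name__ == "__main__":
    parts = [("year", "2022"), ("month", "06")]
    print(hive_partition_path(parts))
    print(plain_partition_path(parts))
    print(parse_hive_partition_path("./year=2022/month=06/"))
```

A non-Apache Hive path cannot be parsed the same way, because the partition names are not stored in the path and must come from the table definition.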

Search processing language (SPL) requirements

Use the sdselect command to search AWS Glue tables.

The sdselect command has features similar to those of the tstats and sort commands. The sdselect command supports filtering, statistical analysis, and group-by clauses.

See sdselect command overview.

What you need to get started

To get started with federated search of Amazon S3 data, you must have the following things:

  • A Splunk Cloud Platform deployment on AWS that has Federated Search for Amazon S3 activated.
  • An AWS account with data in Amazon S3 buckets that conforms to supported file and compression types.
  • One or more AWS Glue tables that reference the data in those Amazon S3 buckets.

If you are new to AWS Glue and do not yet have AWS Glue tables, see Create an AWS Glue Data Catalog table for a list of ways to create AWS Glue tables based on the data in your Amazon S3 buckets.

Activate federated search

To activate Federated Search for Amazon S3 for your Splunk Cloud Platform deployment, contact your Splunk Sales representative. As part of this activation, you acquire a data scan entitlement that is based on the amount of Amazon S3 data, in terabytes, that you are projected to search over the upcoming year. Data scan entitlements are made up of Data Scan Units (DSUs). Each DSU is equivalent to 10 TB of data scanning capabilities.

For more information about DSUs, see Splunk Offerings Purchase Capacity and Limitations.
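As a worked example of the DSU arithmetic, the following Python sketch estimates how many DSUs cover a projected yearly scan volume at 10 TB per DSU. The function name dsus_needed is hypothetical. Rounding up to whole DSUs is an assumption made for this example, so confirm the actual purchase terms with your Splunk Sales representative.

```python
import math

TB_PER_DSU = 10  # each DSU is equivalent to 10 TB of data scanning


def dsus_needed(projected_tb: float) -> int:
    """Estimate the DSUs that cover a projected yearly scan volume in TB."""
    if projected_tb < 0:
        raise ValueError("projected_tb must not be negative")
    # Assumption: entitlements come in whole DSUs, so round up.
    return math.ceil(projected_tb / TB_PER_DSU)


if __name__ == "__main__":
    for terabytes in (10, 25, 120):
        print(terabytes, "TB ->", dsus_needed(terabytes), "DSU(s)")
```

Under this assumption, a projection of 25 TB per year needs 3 DSUs, which provide 30 TB of scanning capacity.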

Monitor your data scan entitlement

Use the Federated Search for Amazon S3 dashboard in the Cloud Monitoring Console to see your total data scan entitlement for the current license term and to track how much of that entitlement your Federated Search for Amazon S3 searches have used to date. See Use the License Usage dashboards in the Splunk Cloud Platform Admin Manual.

Checklist of tasks to set up Federated Search for Amazon S3

Use this checklist to guide you through the cross-account setup of Federated Search for Amazon S3.

Each step lists the task, a description of the task, and the service or services where you perform it.

1. Turn on token authentication
   Description: You must turn on token authentication to allow for the automatic setup of cross-account permissions between your Splunk Cloud Platform deployment and the Amazon S3 account.
   Service: Splunk Cloud Platform

2. Identify the Amazon S3 data that you want to search
   Description: Find the data that you want to search in your Amazon S3 account. If it is not there, create buckets and put your data in them.
   Service: Amazon S3

3. Create an AWS Glue Data Catalog table
   Description: If you do not already have an AWS Glue table that contains column definitions of the data you want to search in your Amazon S3 bucket, there are a variety of methods you can use to create one.
   Services: Amazon S3, AWS Glue

4. Define an Amazon S3 federated provider
   Description: Help your Splunk Cloud Platform deployment access your AWS data by creating a federated provider definition. This task breaks down into the following subtasks:
  • Create a federated provider definition in Splunk Web.
  • Generate policies based on the federated provider definition and paste them into the following locations:
    • Your Amazon S3 accounts.
    • Your AWS Glue account.
    • The AWS Key Management Service, if you encrypt data in your Amazon S3 buckets with SSE-KMS encryption.
   Services: Splunk Cloud Platform, Amazon S3, AWS Glue

5. Map a federated index to an AWS Glue Data Catalog table dataset
   Description: Create federated indexes and map them to specific AWS Glue table datasets. Optionally identify time fields and define partition filtering rules for your federated indexes.
   Service: Splunk Cloud Platform

6. Give your users role-based access control of federated indexes
   Description: Determine which AWS Glue table datasets your users can search. Set up role-based access to federated indexes for your users so they can reference the federated indexes in their federated searches.
   Service: Splunk Cloud Platform

7. Search your AWS Glue Data Catalog table datasets
   Description: Learn how to search your AWS Glue table datasets with the sdselect command.
   Service: Splunk Cloud Platform

Amazon S3 bucket configuration

You cannot use Federated Search for Amazon S3 to search Amazon S3 buckets that are configured to be Requester Pays buckets. If Requester Pays is turned on for your Amazon S3 bucket and you try to run a federated search over that bucket, Splunk Cloud Platform rejects your search.

Searches of Amazon S3 buckets that are configured to be Requester Pays buckets incur data transfer charges in accordance with the Amazon S3 pricing schedule located at Amazon S3 Simple Storage Service Pricing.

Splunk is not liable for any such data transfer charges incurred.

Compliance and certifications for Federated Search for Amazon S3

Splunk Cloud Platform has attained a number of compliance attestations and certifications from industry-leading auditors as part of Splunk's commitment to adhere to industry standards worldwide and Splunk's efforts to safeguard customer data. Generally Available products and features that are currently in scope of Splunk's compliance program may not be a part of the third-party audit report until the next assessment cycle. Federated Search for Amazon S3 is in scope of the following compliance programs and will be audited at the next assessment cycle.

  • SOC 2 Type II: The SOC 2 audit assesses an organization's security, availability, processing integrity, and confidentiality processes to provide assurance about the systems that a company uses to protect customers' data. If you need to review the SOC 2 Type II attestation, contact your Splunk sales representative.
  • Health Insurance Portability and Accountability Act (HIPAA): HIPAA is a U.S. federal law that sets forth national standards governing the processing of protected health information (PHI). HIPAA is intended to improve the effectiveness and efficiency of healthcare systems by establishing standards for the use of electronic records in healthcare; establishing standards for accessing, storing, and transmitting PHI; and protecting the privacy and security of PHI. Splunk's HIPAA compliance offering is audited annually by a third party for compliance with HIPAA requirements, resulting in annual third-party attestation reports.
  • The Payment Card Industry Data Security Standard (PCI DSS): PCI DSS is a global information security standard created to better control cardholder data and reduce credit card fraud. PCI DSS applies to all entities that store, process, or transmit cardholder data and/or sensitive authentication data. Authorized users can access related documentation in the Customer Trust Portal.
  • FedRAMP Authorization at the Moderate Impact Level: This authorization allows for the use of Federated Search for Amazon S3 within Splunk Cloud Platform by U.S. Federal Government agencies requiring cloud-based services authorized at the moderate security impact level. Additional information about FedRAMP is available to Splunk customers under non-disclosure agreement from the Customer Trust Portal.

For additional information about compliance and certifications, see Compliance at Splunk.

Last modified on 10 January, 2024

This documentation applies to the following versions of Splunk Cloud Platform: 9.0.2305 (latest FedRAMP release), 9.1.2308, 9.1.2312

