About Federated Analytics
The amount of data collected in low-cost cloud and purpose-built remote data stores is growing exponentially. But people who maintain large datasets in these data lakes still need to know what is in that data. Federated Analytics can solve this problem, starting with data stored in Amazon Security Lake.
Amazon Security Lake is a fully managed data lake service from Amazon. Amazon Security Lake aggregates high-volume security data from a variety of sources into a purpose-built data lake that is stored in your AWS account. Amazon Security Lake normalizes all of the data it stores by schematizing it into the Open Cybersecurity Schema Framework (OCSF) format.
If you keep data in Amazon Security Lake, Federated Analytics gives you two methods for applying threat detection and threat hunting searches to that data.
- For threat detection, you can ingest recent Amazon Security Lake data into local indexes on your Splunk Cloud Platform deployment, and then apply high-frequency scheduled searches and alerts to that data.
- For threat hunting, you can run infrequent ad hoc federated searches over long time-range Amazon Security Lake datasets where they live in Amazon S3.
Ingest filtered Amazon Security Lake data into your Splunk Cloud Platform deployment
Federated Analytics creates data lake indexes on your Splunk Cloud Platform deployment, and ingests your recent Amazon Security Lake data into those indexes on an ongoing basis.
- Data ingestion filters that you define ensure that each data lake index contains only data that corresponds to a specific OCSF category.
- In addition, you set a data retention period for each data lake index so that it keeps ingested Amazon Security Lake data only for a specific window of time, such as 7 days or 1 month (31 days).
For example, you might have a set of 3 data lake indexes that contain newly archived data from your Amazon Security Lake account. You have set up ingest filters for these indexes so that they contain recent data for the System Activity, Findings, and Network Activity OCSF categories, respectively. You have also set the data retention period of all 3 indexes to 31 days, so each index retains ingested data only for 31 days.
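The filtering and retention behavior in this example can be modeled in a few lines of Python. This is a minimal sketch, not how Federated Analytics implements ingestion: the index names (`asl_system_activity` and so on), the `DataLakeIndex` class, and the `route_event` and `is_retained` helpers are hypothetical. The sketch assumes that each event carries the OCSF `category_uid` field, where OCSF assigns 1 to System Activity, 2 to Findings, and 4 to Network Activity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# OCSF category_uid values for the three categories in the example.
SYSTEM_ACTIVITY = 1
FINDINGS = 2
NETWORK_ACTIVITY = 4


@dataclass(frozen=True)
class DataLakeIndex:
    """Hypothetical model of a data lake index: one OCSF category, one retention window."""
    name: str
    category_uid: int
    retention: timedelta


# Hypothetical index names; all 3 indexes keep data for 31 days, as in the example.
INDEXES = [
    DataLakeIndex("asl_system_activity", SYSTEM_ACTIVITY, timedelta(days=31)),
    DataLakeIndex("asl_findings", FINDINGS, timedelta(days=31)),
    DataLakeIndex("asl_network_activity", NETWORK_ACTIVITY, timedelta(days=31)),
]


def route_event(event: dict) -> Optional[str]:
    """Return the name of the index whose ingest filter matches the event, or None."""
    for index in INDEXES:
        if event.get("category_uid") == index.category_uid:
            return index.name
    return None  # No ingest filter matches, so the event is not ingested.


def is_retained(index: DataLakeIndex, event_time: datetime, now: datetime) -> bool:
    """An ingested event stays in the index only while it is inside the retention window."""
    return now - event_time <= index.retention


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(route_event({"category_uid": FINDINGS}))  # asl_findings
    print(is_retained(INDEXES[1], now - timedelta(days=45), now))  # False
```

In this sketch, an event whose `category_uid` is 3 (Identity & Access Management in OCSF) matches none of the three filters, so it is not ingested into any of these indexes.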
After you set up your Federated Analytics data lake indexes, you can run ordinary Splunk searches over them. Because these searches run locally on Splunk platform indexes, they have low latency. This makes them best suited for threat detection: iterative scheduled searches and alerts that you run frequently.
If you use Splunk Enterprise Security, you can also apply high-frequency alerts and threat detections to the datasets in those indexes so that you stay on top of your most recent Amazon Security Lake data. For more information about threat detections in Enterprise Security, see Use detections to search for threats in Splunk Enterprise Security in Administer Splunk Enterprise Security.
Run ad hoc federated searches over your remote Amazon Security Lake datasets
But what about older Amazon Security Lake datasets that you don't keep in your data lake indexes, and the datasets that "age out" of your data lake indexes? For these datasets, which can extend months or years into the past, Federated Analytics lets you run federated searches. You can run federated searches from your Splunk Cloud Platform deployment that scan and analyze your Amazon Security Lake datasets in the Amazon S3 buckets where they live.
- To set up federated search for your Amazon Security Lake datasets, you'll define federated indexes that map to the Amazon Security Lake datasets that you want to search.
- Later, when you want to search a specific remote Amazon Security Lake dataset, you'll include a reference to the federated index that maps to that dataset in your federated search string.
Federated searches over remote Amazon Security Lake datasets are best suited for ad hoc threat hunting searches that you run on an infrequent basis, due to the performance and cost model of such searches.
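The choice between the two methods depends largely on whether the time range you want to search still falls inside the retention window of a data lake index. The following Python sketch illustrates that decision only. The `choose_search_target` function and its return values are hypothetical, and real planning would also weigh the performance and cost model of federated searches.

```python
from datetime import datetime, timedelta, timezone


def choose_search_target(earliest: datetime, now: datetime, retention: timedelta) -> str:
    """Hypothetical helper: pick where to search a time range that ends at `now`.

    Returns "data_lake_index" when the whole range is still inside the retention
    window of the local data lake index. Otherwise returns "federated_index",
    because part of the range has aged out of the local index.
    """
    if earliest > now:
        raise ValueError("earliest must not be later than now")
    if now - earliest <= retention:
        return "data_lake_index"
    return "federated_index"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    retention = timedelta(days=31)
    print(choose_search_target(now - timedelta(hours=1), now, retention))  # data_lake_index
    print(choose_search_target(now - timedelta(days=365), now, retention))  # federated_index
```

This sketch sends any range that reaches back past the retention window entirely to the federated index, assuming that the remote dataset in Amazon S3 still covers the whole range. Splitting one search across both kinds of index is an alternative design that the sketch does not cover.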
Federated Analytics can run federated searches only over Amazon Security Lake datasets. If you want to run federated searches over other kinds of remote data that you store in Amazon S3, you must use Federated Search for Amazon S3.
What you need to get started with Federated Analytics
To get started using Federated Analytics to ingest and run federated searches over your Amazon Security Lake data, you need the following:
- A Splunk role with the admin_all_objects capability.
- A Splunk Cloud Platform deployment on Victoria Experience that has Federated Analytics activated.
- An AWS account with Amazon Security Lake activated. Your Amazon Security Lake must be in the same AWS Region as your Splunk Cloud Platform deployment.
Search processing language (SPL) requirements for federated searches
When you use federated indexes to run federated searches over remote Amazon Security Lake datasets, you must use the sdselect command.
The sdselect command has features similar to those of the tstats and sort commands. The sdselect command supports filtering, statistical analysis, and group-by clauses.
See sdselect command overview.
SPL requirements for data lake index searches
All existing SPL can be applied to searches of the Amazon Security Lake data that you ingest into your local data lake indexes.
Supported encryption standards
Federated Analytics supports the following encryption standards for data in Amazon Security Lake:
- Server-side encryption with Amazon S3-Managed Keys (SSE-S3)
- Server-side encryption with the AWS Key Management Service (SSE-KMS)
Federated Analytics supports SSE-S3 without any additional setup requirements.
For more information about KMS encryption setup in relation to Amazon Security Lake, see Data protection in Amazon Security Lake in the Amazon Security Lake User Guide.
For more information about KMS encryption pricing, see AWS Key Management Service Pricing on the AWS website.
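If you want to confirm which encryption standard your Amazon Security Lake objects use, the Amazon S3 HeadObject response reports it in the `ServerSideEncryption` field: `AES256` for SSE-S3 and `aws:kms` for SSE-KMS. The following Python sketch maps those values to the two standards listed above. The `classify_encryption` function is hypothetical and works only on the response dictionary, so it makes no AWS call; obtaining the response (for example, with the boto3 S3 client's `head_object` method) is left out.

```python
from typing import Optional

# ServerSideEncryption values in an Amazon S3 HeadObject response,
# mapped to the encryption standards that Federated Analytics supports.
SUPPORTED_STANDARDS = {
    "AES256": "SSE-S3",
    "aws:kms": "SSE-KMS",
}


def classify_encryption(head_object_response: dict) -> Optional[str]:
    """Return "SSE-S3" or "SSE-KMS" for a supported object, or None otherwise.

    None means the reported value is missing or is not one of the standards
    listed for Federated Analytics (for example, "aws:kms:dsse").
    """
    value = head_object_response.get("ServerSideEncryption")
    return SUPPORTED_STANDARDS.get(value)
```

Returning None instead of raising an error lets you scan many objects and collect the ones that need attention in one pass.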
Restrictions
Federated Analytics is available only to Splunk Cloud Platform users with deployments in AWS Regions. Your Amazon Security Lake must be in the same AWS Region as your Splunk Cloud Platform deployment.
Federated Analytics does not support the following kinds of Splunk Cloud Platform deployments:
- Deployments in Google Cloud regions.
- FedRAMP High, FedRAMP Moderate, and DoD IL5 deployments.
Federated Analytics can search only Amazon Security Lake datasets that use the S3 Standard storage class. Federated Analytics cannot search Amazon Security Lake datasets that use other storage classes, such as S3 Intelligent-Tiering and S3 Glacier.
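To find objects that a federated search would not reach, you can check the `StorageClass` value that Amazon S3 reports for each object in a ListObjectsV2 response (`STANDARD`, `INTELLIGENT_TIERING`, `GLACIER`, and so on). The following Python sketch partitions such a listing. The `partition_by_storage_class` function is hypothetical, and the listing is passed in as plain dictionaries, so no AWS call is made.

```python
from typing import Iterable, List, Tuple


def partition_by_storage_class(objects: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Split S3 object entries into searchable (S3 Standard) and unsearchable keys.

    Each entry is shaped like an item of the "Contents" list in a
    ListObjectsV2 response, with at least a "Key" field.
    """
    searchable: List[str] = []
    unsearchable: List[str] = []
    for obj in objects:
        # Assumption: a missing StorageClass is treated as STANDARD, the S3 default.
        if obj.get("StorageClass", "STANDARD") == "STANDARD":
            searchable.append(obj["Key"])
        else:
            unsearchable.append(obj["Key"])
    return searchable, unsearchable
```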
Federated Analytics cannot run federated searches over remote datasets in Amazon S3 buckets that are not part of your Amazon Security Lake data lake. Use Federated Search for Amazon S3 to run federated searches over data that is not in Amazon Security Lake. See About Federated Search for Amazon S3.
Checklist of tasks to set up Federated Analytics
When you set up Federated Analytics for your Splunk Cloud Platform deployment, you will define a federated provider that facilitates permissions for and connections to your Amazon Security Lake account, so that you can ingest and search the data in that account.
You will set up permissions for your federated provider so that it can:
- Ingest datasets from your Amazon Security Lake account to data lake indexes on your Splunk Cloud Platform deployment.
- Run federated searches over remote datasets located in your Amazon Security Lake account.
You will wrap up the federated provider definition process by defining two kinds of indexes:
- You'll define data lake indexes on your Splunk Cloud Platform deployment that ingest data from your Amazon Security Lake account.
- You'll define federated indexes that allow you to run federated searches of the remote data stored in your Amazon Security Lake account.
After you define your Amazon Security Lake federated provider, you will be ready to run Federated Analytics searches of your ingested and remote Amazon Security Lake data.
Use this checklist to guide you through the cross-account setup of Federated Analytics.
Step | Task | Description | Service |
---|---|---|---|
1 | Turn on token authentication | Token authentication allows for the automatic setup of cross-account permissions between your Splunk Cloud Platform deployment and your Amazon Security Lake account. If token authentication is turned off, you must turn it on. | Splunk Cloud Platform |
2 | Begin defining an Amazon Security Lake federated provider | Begin defining your Amazon Security Lake federated provider by naming it. | Splunk Cloud Platform |
3 | Create the Amazon Security Lake subscriber for data ingestion | Create an Amazon Security Lake subscriber for data ingestion access so Federated Analytics can ingest your Amazon Security Lake datasets into data lake indexes on your Splunk Cloud Platform deployment. | |
4 | Create the Amazon Security Lake subscriber for federated search access | Create an Amazon Security Lake subscriber for federated search access so you can run federated searches over your remote Amazon Security Lake datasets. | |
5 | Obtain AWS Glue data catalog database and tables | Get AWS Glue database and AWS Glue table values from the AWS Resource Access Manager console. Add them to your Amazon Security Lake federated provider definition. | |
6 | Set up data ingest and retention rules for data lake indexes | Create data lake indexes on your Splunk Cloud Platform deployment that use data ingest filters and data retention rules that you define. Your users can then schedule fast threat detection searches over the fresh, curated Amazon Security Lake datasets on those data lake indexes. | Splunk Cloud Platform |
7 | Map federated indexes to AWS Glue tables | Complete the definition of your Amazon Security Lake federated provider by creating federated indexes and mapping them to specific AWS Glue tables that represent specific remote Amazon Security Lake datasets. Your users can then run ad hoc threat hunting searches over those datasets. | Splunk Cloud Platform |
8 | Give your users role-based access control of data lake indexes and federated indexes | Determine what Federated Analytics data your users can search. Set up role-based access to data lake indexes and federated indexes. | Splunk Cloud Platform |
9 | Run Federated Analytics searches | Learn how to search your remote Amazon Security Lake datasets with the sdselect command. | Splunk Cloud Platform |
10 | Federated Analytics and Splunk Enterprise Security | Find out how to set up a smooth interface between Splunk Enterprise Security and the Amazon Security Lake datasets that you maintain with Federated Analytics. | |
Compliance and certifications for Federated Analytics
Splunk Cloud Platform has attained a number of compliance attestations and certifications from industry-leading auditors as part of Splunk's commitment to adhere to industry standards worldwide and Splunk's efforts to safeguard customer data. Generally Available products and features that are currently in scope of Splunk's compliance program may not be a part of the third-party audit report until the next assessment cycle. Federated Analytics is in scope of the following compliance programs and will be audited at the next assessment cycle.
- SOC 2 Type II: The SOC 2 audit assesses an organization's security, availability, processing integrity, and confidentiality processes to provide assurance about the systems that a company uses to protect customers' data. If you require the SOC 2 Type II attestation to review, contact your Splunk sales representative.
- Health Insurance Portability and Accountability Act (HIPAA): HIPAA is a U.S. federal law that sets forth national standards governing the processing of protected health information (PHI). HIPAA is intended to improve the effectiveness and efficiency of healthcare systems by establishing standards for the use of electronic records in healthcare; establishing standards for accessing, storing, and transmitting PHI; and protecting the privacy and security of PHI. Splunk's HIPAA compliance offering is audited annually by a third party for compliance with HIPAA requirements, resulting in annual third-party attestation reports.
- The Payment Card Industry Data Security Standard (PCI DSS): PCI DSS is a global information security standard created to better control cardholder data and reduce credit card fraud. PCI DSS applies to all entities that store, process, or transmit cardholder data and/or sensitive authentication data. Authorized users can access related documentation in the Customer Trust Portal.
- FedRAMP Authorization at the Moderate Impact Level: This authorization allows for the use of Federated Analytics within Splunk Cloud Platform by U.S. Federal Government agencies requiring cloud-based services authorized at the moderate security impact level. Additional information about FedRAMP is available to Splunk customers under non-disclosure agreement from the Customer Trust Portal.
For additional information about compliance and certifications, see Compliance at Splunk.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408