Run Federated Analytics searches
After you define your Federated Analytics federated provider, you can use your data lake indexes and federated indexes to run searches of your Amazon Security Lake data. The method you choose depends on the age of the data you want to review.
Age of data | Location of data | Type of search run |
---|---|---|
Newer data (up to 31 days old, depending on the retention period set for each data lake index) | In data lake indexes on your Splunk Cloud Platform deployment, freshly ingested from your Amazon Security Lake account. | Standard searches that reference data lake indexes, using the standard Splunk Search Processing Language (SPL). |
Older data (whatever exceeds the retention period of your data lake indexes) | In your remote Amazon Security Lake datasets | Searches that use the sdselect command to reference federated indexes. Each federated index maps to a specific remote dataset in your Amazon Security Lake account. |
Search recent Amazon Security Lake data in your data lake indexes
When you set up a Federated Analytics federated provider, you define one or more data lake indexes that ingest data from your Amazon Security Lake account on an ongoing basis. All ingested Amazon Security Lake data follows the Open Cybersecurity Schema Framework (OCSF) format, and each index you define has filters that ensure that it ingests only data conforming to a specific OCSF category, such as Application Activity, Findings, or Identity & Access.
You can run a search over one or more data lake indexes by referencing the indexes in your search and then using the same SPL you would use for any other Splunk search. All data lake index names begin with dl_ by default. If you keep this naming convention, you can search all of your data lake indexes at once by putting dl_* in your search string.
Because data lake indexes contain fresh, local data, they are ideal for scheduled searches and alerts that run on a frequent basis.
Here is an example of a simple threat-hunting search of a single data lake index.
index=dl_network_activity_index
AND _time >= 1716415000
AND _time <= 1716415000 + <time_window_in_seconds>
AND traffic.bytes > 66666
AND src_endpoint.port = 6666
AND connection_info.protocol_num = 6
AND traffic.packets = 66
| table traffic.bytes, src_endpoint.ip, dst_endpoint.ip
Each data lake index has a data retention period that you define as part of the federated provider setup. A retention period can be at most 31 days, counted back from the time you run your search. To search Amazon Security Lake data with timestamps older than the retention periods of your data lake indexes, run a federated search of the remote datasets in your Amazon Security Lake account.
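The retention rule above can be sketched as a small helper that decides which kind of search covers a given timestamp. This is an illustrative Python sketch, not part of Splunk Cloud Platform: the function name pick_search_type and its return strings are hypothetical, and it assumes that data exactly at the retention cutoff is still in the data lake index.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 31  # upper limit stated in this topic


def pick_search_type(event_time, retention_days, now=None):
    """Return which kind of search covers data with the given timestamp.

    Hypothetical helper: the name and return values are illustrative only.
    """
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be greater than 0 and at most 31")
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: data exactly at the cutoff is still in the data lake index.
    cutoff = now - timedelta(days=retention_days)
    return "data lake index" if event_time >= cutoff else "federated"


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(pick_search_type(datetime(2024, 5, 30, tzinfo=timezone.utc), 7, now))
print(pick_search_type(datetime(2024, 5, 22, tzinfo=timezone.utc), 7, now))
```

With a 7-day retention period, the first timestamp falls inside the data lake index and the second requires a federated search.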
See the Search Manual and the Search Reference if you need help with SPL.
Run federated searches over remote datasets in your Amazon Security Lake account
Run federated searches over your remote Amazon Security Lake datasets, where they live in Amazon S3, in either of two cases: when you need Amazon Security Lake data that you are not ingesting into your data lake indexes, or when you need to review Amazon Security Lake data with timestamps that exceed your data lake index retention periods.
Federated searches over remote Amazon Security Lake datasets are best suited for ad hoc threat hunting searches that you run on an infrequent basis, due to the performance and cost limitations of such searches.
When you write a federated search you must do the following things:
- Begin your search with the sdselect command. In federated searches, the sdselect command does the most work.
- Invoke a federated index that you defined for your federated provider. Each federated index maps to a specific remote dataset in your Amazon Security Lake account. The sdselect command does not support searching multiple federated indexes with a single search.
Here is an example of a simple threat-hunting federated search of the dataset represented by a single federated index.
| sdselect strict=t traffic.bytes, src_endpoint.ip, dst_endpoint.ip
FROM dlf_buttercup_fa_all_asl_data_tables_index
WHERE time >= 1716415000000
AND time <= 1716415000000 + <time_window_in_seconds> * 1000
AND (eventDay="20240522" OR eventDay="20240523")
AND category_name="Network Activity"
AND traffic.bytes > 66666
AND src_endpoint.port = 6666
AND connection_info.protocol_num = 6
AND traffic.packets = 66
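The example above expresses the time bounds in milliseconds and lists each eventDay value that the window touches. The following Python sketch shows one way to derive those values from a start time in epoch seconds and a window length. The function name federated_window is hypothetical, and the sketch assumes that eventDay is the UTC calendar date in YYYYMMDD form, as the example values suggest.

```python
from datetime import datetime, timedelta, timezone


def federated_window(start_epoch_s, window_s):
    """Return (start_ms, end_ms, event_days) for a search window.

    Hypothetical helper. Assumes eventDay is the UTC date as YYYYMMDD.
    """
    end_epoch_s = start_epoch_s + window_s
    # The federated example compares the time field against epoch milliseconds.
    start_ms = start_epoch_s * 1000
    end_ms = end_epoch_s * 1000
    first = datetime.fromtimestamp(start_epoch_s, tz=timezone.utc).date()
    last = datetime.fromtimestamp(end_epoch_s, tz=timezone.utc).date()
    # Collect every UTC day that the window touches.
    days = []
    day = first
    while day <= last:
        days.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return start_ms, end_ms, days


start_ms, end_ms, days = federated_window(1716415000, 4 * 3600)
clause = " OR ".join(f'eventDay="{d}"' for d in days)
print(f"WHERE time >= {start_ms}")
print(f"AND time <= {end_ms}")
print(f"AND ({clause})")
```

A four-hour window starting at 1716415000 crosses midnight UTC, so the sketch yields both eventDay values used in the example search.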
For more information about using sdselect to run federated searches, see sdselect command overview.
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408