Splunk Cloud Platform

Federated Search

This documentation does not apply to the most recent version of Splunk Cloud Platform. For documentation on the most recent version, go to the latest release.

sdselect command usage

The sdselect command is a report-generating command. See Command types in the Splunk Cloud Platform Search Reference.

Generating commands use a leading pipe character and must be the first command in a search.

When you use sdselect to search an AWS Glue Data Catalog table, you must use a FROM clause to reference a federated index that maps to that AWS Glue table. This mapping lets sdselect run an efficient statistical analysis of fields in that AWS Glue table. The AWS Glue table, in turn, represents a dataset in your Amazon S3 storage.

The sdselect command supports path navigation syntax for nested fields representing specific datasets within a JSON array or Amazon S3 directory structure. See Use nested fields to identify datasets in a hierarchical structure.

sdselect searches consume your data scan entitlement

When you run sdselect searches, Federated Search for Amazon S3 counts the volume of data on disk that your search scans against your total data scan entitlement. To reduce data scan entitlement consumption, construct your searches so that they do not scan more data than necessary.

Splunk software tracks the volume on disk of the data you are scanning, not the number of events that you are searching. A search of compressed data, such as data from GZIP or Parquet files, might consume less of your data scan entitlement than a similar search of uncompressed data.

For more information about obtaining and monitoring your Federated Search for Amazon S3 data scan entitlement, see About Federated Search for Amazon S3.

Federated Search for Amazon S3 counts searches that you cancel against your data scan entitlement for the amount of data that has been scanned at the point of search cancellation. Failed searches do not incur charges against your data scan entitlement.

Data size limits for sdselect searches

The amount of AWS Glue table dataset data that the sdselect command can process in a single search has the following limitations.

  • The sum of a single row or column in a AWS Glue table dataset cannot exceed 32 MB. For example, you exceed this limit if a single column in a table row has a column that is 100 MB in size.
  • sdselect searches have a configurable data-scanned-per-search control limit of 10 TB. If you run an sdselect search that exceeds this limit, the search fails without incurring any consumption of your data scan entitlement. If you need to change the data-scanned-per-search limit for your sdselect searches, contact Splunk Support.

Use nested fields to identify datasets in a hierarchical structure

The sdselect command supports path navigation syntax for nested fields. Such fields begin with one or more struct type elements and end with a named field. Nested fields use dot characters ( . ) to separate the structure levels. You can use nested fields to indicate specific datasets within a JSON array or Amazon S3 directory structure. You can use these nested "dataset" fields in place of ordinary fields throughout sdselect searches.

For example, the nested field userIdentity.type contains possible type field values for the Amazon IAM userIdentity element. You might use it in the WHERE clause of an sdselect search like this:

| sdselect count(userIdentity.type) FROM federated:myindex WHERE userIdentity.type = "AssumedRole"

You might also have other nested fields for the userIdentity element, such as userIdentity.userName and userIdentity.accountId.

Nested fields cannot include special characters. For example, if you have a field named a.b.c where b.c is a flattened field, your sdselect search will fail with an error message.

If you use nested fields in an sdselect search, you cannot surround those nested fields by quotation marks. See Special handling for sdselect syntax elements.

Rename nested fields that have the same named field

When nested fields have the same named field but different preceding struct elements, you might run into issues when you use them together in an sdselect search. The sdselect command treats multiple nested fields with the same named field as if they are multiple fields with the same name. When sdselect finds multiple fields with the same name in a search string, it discards all of the fields with the same named field except for the last listed field.

For example, say you run the following search:

| sdselect netperf.asnum, network.asnum FROM federated:parquet LIMIT 100

The sdselect command handles the nested fields netperf.asnum and network.asnum as if they both are named asnum. The sdselect command returns results only for network.asnum, in a field titled asnum.

You can get around the problem of having multiple nested fields with the same named field by renaming such nested fields at search time. For example, the following search renames the nested fields so that 2 asnum fields appear in the results:

| sdselect netperf.asnum AS netperf_asnum, network.asnum AS network_asnum FROM federated:parquet LIMIT 100

Special handling for sdselect syntax elements

The sdselect command expects you to handle certain syntax elements in specific ways. For example, you must never surround nested fields with single or double quote characters. On the other hand, you must surround flattened fields which contain special characters or which begin with numbers by single quotes.

The following table describes elements of sdselect syntax that require special handling.

Syntax element sdselect command identification rule Example
Nested field names Do not enclose names of nested fields with quotation marks.

For more information about nested fields, see Use nested fields to identify datasets in a hierarchical structure.

GROUPBY userIdentity.userName
Flattened field names containing special characters Enclose names of flattened fields that contain non-alphanumeric characters or spaces with single quotation marks ( ' ). GROUPBY 'user.name', host

This is an example of a flattened field that looks like a nested field. The single quotes prevent the sdselect processor from trying to treat the flattened field as a nested field.

Flattned field names that begin with numeric characters Enclose names of flattened fields that start with numeric characters with single quotation marks ( ' ). '5minutes'="late"
Literal strings Enclose literal strings with double quotation marks ( " ). userName = "Gary"

In sdselect searches, you do not need to enclose field names in quotation marks when:

  • The field name begins with an alphabetic character.
  • The field name does not contain special characters or spaces.

GROUPBY and ORDERBY event sort interoperation

If you use the GROUPBY clause and the ORDERBY clause in an sdselect search, the sdselect command sorts search results in the following sequence:

  1. By the values of the fields you specify in the ORDERBY clause, in the order in which you list the fields. For fields that have the ASC modifier or no modifier, the sdselect command sorts results by the field values in ascending order. For fields that have the DESC modifier, the sdselect command sorts results by the field values in descending order.
  2. In ascending order by the values of any GROUPBY fields that are not specified in the ORDERBY clause, in the order in which you list the fields.

If an sdselect search uses only the GROUPBY clause, sdselect sorts search results in ascending order by values of the fields associated with the GROUPBY clause, following the list order of the fields.

Control the maximum number of returned results if a LIMIT clause is not present

If the LIMIT clause is not present in an sdselect search string, the max_number_of_results setting in limits.conf determines the maximum number of results that the search can return. The max_number_of_results setting defaults to 100,000 results.

If you use Splunk Enterprise you can remove constraints on the number of results that sdselect searches can return by setting max_number_of_results to 0.

Unbounded sdselect searches over large datasets can be expensive in terms of the amount of your data scan entitlement they might consume.

If a LIMIT clause is present in an sdselect search, it overrides the max_number_of_results setting for that search.

Splunk Cloud Platform users who want to change the max_number_of_results setting must contact Splunk Support to do so.

Ensure consistent sdselect output when the number of matched results exceeds set limits

When an sdselect search matches a number of results that exceeds an explicit LIMIT clause in the search string or the implicit limit set by the max_number_of_results setting when a LIMIT clause is not present, it is possible that repeated runs of that search will return different result sets.

If you encounter this issue, use one of the following methods to resolve it:

  • Design the search so that it always matches a number of results that is smaller than the LIMIT clause or the default limit, if no LIMIT clause is present. For example, you might reduce the number of matched results by running the search over a smaller time range, or by designing a more restrictive WHERE clause.
  • Add an ORDERBY clause to the search. If the search has a LIMIT clause you must put the ORDERBY clause before the LIMIT clause.

See also

sdselect command
sdselect command overview
sdselect command syntax details
sdselect command WHERE clause operations
Use time fields in sdselect searches
sdselect command examples for Amazon S3
Last modified on 27 August, 2024
sdselect command syntax details   sdselect command WHERE clause operations

This documentation applies to the following versions of Splunk Cloud Platform: 9.2.2403


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters