sdselect command usage

The sdselect command is a report-generating command. See Command types in the Splunk Cloud Platform Search Reference.

Generating commands use a leading pipe character and must be the first command in a search.

When you use sdselect to search an AWS Glue Data Catalog table, you must use a FROM clause to reference a federated index that maps to that AWS Glue table. This mapping lets sdselect run an efficient statistical analysis of fields in that AWS Glue table. The AWS Glue table, in turn, represents a dataset in your Amazon S3 storage.

The sdselect command supports path navigation syntax for nested fields representing specific datasets within a JSON array or Amazon S3 directory structure. See Use nested fields to identify datasets in a hierarchical structure.

sdselect searches consume your data scan entitlement

When you run sdselect searches, Federated Search for Amazon S3 counts the volume of data on disk that your search scans against your total data scan entitlement. To reduce data scan entitlement consumption, construct your searches so that they do not scan more data than necessary.

Splunk software tracks the volume on disk of the data you are scanning, not the number of events that you are searching. A search of compressed data, such as data from GZIP or Parquet files, might consume less of your data scan entitlement than a similar search of uncompressed data.

For more information about obtaining and monitoring your Federated Search for Amazon S3 data scan entitlement, see About Federated Search for Amazon S3.

Federated Search for Amazon S3 counts searches that you cancel against your data scan entitlement for the amount of data that has been scanned at the point of search cancellation. Failed searches do not incur charges against your data scan entitlement.

Data size limits for sdselect searches

The amount of AWS Glue table dataset data that the sdselect command can process in a single search has the following limitations.

The sum of a single row or column in a AWS Glue table dataset cannot exceed 32 MB. For example, you exceed this limit if a single column in a table row has a column that is 100 MB in size.
sdselect searches have a configurable data-scanned-per-search control limit of 10 TB. If you run an sdselect search that exceeds this limit, the search fails without incurring any consumption of your data scan entitlement. If you need to change the data-scanned-per-search limit for your sdselect searches, contact Splunk Support.

Use nested fields to identify datasets in a hierarchical structure

The sdselect command supports path navigation syntax for nested fields. Such fields begin with one or more struct type elements and end with a named field. Nested fields use dot characters ( . ) to separate the structure levels. You can use nested fields to indicate specific datasets within a JSON array or Amazon S3 directory structure. You can use these nested "dataset" fields in place of ordinary fields throughout sdselect searches.

For example, the nested field userIdentity.type contains possible type field values for the Amazon IAM userIdentity element. You might use it in the WHERE clause of an sdselect search like this:

| sdselect count(userIdentity.type) FROM federated:myindex WHERE userIdentity.type = "AssumedRole"

You might also have other nested fields for the userIdentity element, such as userIdentity.userName and userIdentity.accountId.

Nested fields cannot include special characters. For example, if you have a field named a.b.c where b.c is a flattened field, your sdselect search will fail with an error message.

If you use nested fields in an sdselect search, you cannot surround those nested fields by quotation marks. See Special handling for sdselect syntax elements.

Rename nested fields that have the same named field

When nested fields have the same named field but different preceding struct elements, you might run into issues when you use them together in an sdselect search. The sdselect command treats multiple nested fields with the same named field as if they are multiple fields with the same name. When sdselect finds multiple fields with the same name in a search string, it discards all of the fields with the same named field except for the last listed field.

For example, say you run the following search:

| sdselect netperf.asnum, network.asnum FROM federated:parquet LIMIT 100

The sdselect command handles the nested fields netperf.asnum and network.asnum as if they both are named asnum. The sdselect command returns results only for network.asnum, in a field titled asnum.

You can get around the problem of having multiple nested fields with the same named field by renaming such nested fields at search time. For example, the following search renames the nested fields so that 2 asnum fields appear in the results:

| sdselect netperf.asnum AS netperf_asnum, network.asnum AS network_asnum FROM federated:parquet LIMIT 100

Special handling for sdselect syntax elements

The sdselect command expects you to handle certain syntax elements in specific ways. For example, you must never surround nested fields with single or double quote characters. On the other hand, you must surround flattened fields which contain special characters or which begin with numbers by single quotes.

The following table describes elements of sdselect syntax that require special handling.

Syntax element	sdselect command identification rule	Example
Nested field names	Do not enclose names of nested fields with quotation marks. For more information about nested fields, see Use nested fields to identify datasets in a hierarchical structure.	`GROUPBY userIdentity.userName`
Flattened field names containing special characters	Enclose names of flattened fields that contain non-alphanumeric characters or spaces with single quotation marks ( ' ).	`GROUPBY 'user.name', host` This is an example of a flattened field that looks like a nested field. The single quotes prevent the `sdselect` processor from trying to treat the flattened field as a nested field.
Flattned field names that begin with numeric characters	Enclose names of flattened fields that start with numeric characters with single quotation marks ( ' ).	`'5minutes'="late"`
Literal strings	Enclose literal strings with double quotation marks ( " ).	`userName = "Gary"`

In sdselect searches, you do not need to enclose field names in quotation marks when:

The field name begins with an alphabetic character.
The field name does not contain special characters or spaces.

GROUPBY and ORDERBY event sort interoperation

If you use the GROUPBY clause and the ORDERBY clause in an sdselect search, the sdselect command sorts search results in the following sequence:

By the values of the fields you specify in the ORDERBY clause, in the order in which you list the fields. For fields that have the ASC modifier or no modifier, the sdselect command sorts results by the field values in ascending order. For fields that have the DESC modifier, the sdselect command sorts results by the field values in descending order.
In ascending order by the values of any GROUPBY fields that are not specified in the ORDERBY clause, in the order in which you list the fields.

If an sdselect search uses only the GROUPBY clause, sdselect sorts search results in ascending order by values of the fields associated with the GROUPBY clause, following the list order of the fields.

Control the maximum number of returned results if a LIMIT clause is not present

If the LIMIT clause is not present in an sdselect search string, the max_number_of_results setting in limits.conf determines the maximum number of results that the search can return. The max_number_of_results setting defaults to 100,000 results.

If you use Splunk Enterprise you can remove constraints on the number of results that sdselect searches can return by setting max_number_of_results to 0.

Unbounded sdselect searches over large datasets can be expensive in terms of the amount of your data scan entitlement they might consume.

If a LIMIT clause is present in an sdselect search, it overrides the max_number_of_results setting for that search.

Splunk Cloud Platform users who want to change the max_number_of_results setting must contact Splunk Support to do so.

Ensure consistent sdselect output when the number of matched results exceeds set limits

When an sdselect search matches a number of results that exceeds an explicit LIMIT clause in the search string or the implicit limit set by the max_number_of_results setting when a LIMIT clause is not present, it is possible that repeated runs of that search will return different result sets.

If you encounter this issue, use one of the following methods to resolve it:

Design the search so that it always matches a number of results that is smaller than the LIMIT clause or the default limit, if no LIMIT clause is present. For example, you might reduce the number of matched results by running the search over a smaller time range, or by designing a more restrictive WHERE clause.
Add an ORDERBY clause to the search. If the search has a LIMIT clause you must put the ORDERBY clause before the LIMIT clause.

Related answers from Splunk Community

sdselect command usage

sdselect searches consume your data scan entitlement

Data size limits for sdselect searches

Use nested fields to identify datasets in a hierarchical structure

Rename nested fields that have the same named field

Special handling for sdselect syntax elements

GROUPBY and ORDERBY event sort interoperation

Control the maximum number of returned results if a LIMIT clause is not present

Ensure consistent sdselect output when the number of matched results exceeds set limits

See also

Comments

sdselect command usage

Was this topic useful?