sdselect command usage

The sdselect command is a report-generating command. See Command types in the Splunk Cloud Platform Search Reference.

Generating commands use a leading pipe character and must be the first command in a search.

When you use sdselect to search an AWS Glue Data Catalog table, you must use a FROM clause to reference a federated index that maps to that AWS Glue table. This mapping lets sdselect run an efficient statistical analysis of fields in that AWS Glue table. The AWS Glue table, in turn, represents a dataset in your Amazon S3 storage.

The sdselect command supports path navigation syntax for nested fields representing specific datasets within a JSON array or Amazon S3 directory structure. See Use nested fields to identify datasets in a hierarchical structure.

The "sd" in sdselect stands for "structured data".

sdselect searches consume your data scan entitlement

When you run sdselect searches, Splunk software counts the volume of data on disk that your search scans against your total data scan entitlement. To reduce data scan entitlement consumption, construct your searches so that they do not scan more data than necessary.

Splunk software tracks the volume on disk of the data you are scanning, not the number of events that you are searching. A search of compressed data, such as data from GZIP or Parquet files, might consume less of your data scan entitlement than a similar search of uncompressed data.

For more information about obtaining and monitoring your Federated Search for Amazon S3 data scan entitlement, see About Federated Search for Amazon S3.

For more information about obtaining and monitoring your Federated Analytics data scan entitlement, see About Federated Analytics.

Federated Search for Amazon S3 and Federated Analytics count searches that you cancel against your data scan entitlement for the amount of data that has been scanned at the point of search cancellation. Failed searches do not incur charges against your data scan entitlement.

Data size limits for sdselect searches

The amount of AWS Glue table dataset data that the sdselect command can process in a single search has the following limitations.

The sum of a single row or column in a AWS Glue table dataset cannot exceed 32 MB. For example, you exceed this limit if a single column in a table row has a column that is 100 MB in size.
sdselect searches have a configurable data-scanned-per-search control limit of 10 TB. If you run an sdselect search that exceeds this limit, the search fails without incurring any consumption of your data scan entitlement. If you need to change the data-scanned-per-search limit for your sdselect searches, contact Splunk Support.

Use nested fields to identify datasets in a hierarchical structure

The sdselect command supports path navigation syntax for nested fields. Such fields begin with one or more struct type elements and end with a named field. Nested fields use dot characters ( . ) to separate the structure levels. You can use nested fields to indicate specific datasets within a JSON array or Amazon S3 directory structure. You can use these nested "dataset" fields in place of ordinary fields throughout sdselect searches.

For example, the nested field userIdentity.type contains possible type field values for the Amazon IAM userIdentity element. You might use it in the WHERE clause of an sdselect search like this:

| sdselect count(userIdentity.type) FROM federated:myindex WHERE userIdentity.type = "AssumedRole"

You might also have other nested fields for the userIdentity element, such as userIdentity.userName and userIdentity.accountId.

Nested fields cannot include special characters. For example, if you have a field named a.b.c where b.c is a flattened field, your sdselect search will fail with an error message.

If you use nested fields in an sdselect search, you cannot surround those nested fields by quotation marks. See Special handling for sdselect syntax elements.

Rename nested fields that have the same named field

When nested fields have the same named field but different preceding struct elements, you might run into issues when you use them together in an sdselect search. The sdselect command treats multiple nested fields with the same named field as if they are multiple fields with the same name. When sdselect finds multiple fields with the same name in a search string, it discards all of the fields with the same named field except for the last listed field.

For example, say you run the following search:

| sdselect netperf.asnum, network.asnum FROM federated:parquet LIMIT 100

The sdselect command handles the nested fields netperf.asnum and network.asnum as if they both are named asnum. The sdselect command returns results only for network.asnum, in a field titled asnum.

You can get around the problem of having multiple nested fields with the same named field by renaming such nested fields at search time. For example, the following search renames the nested fields so that 2 asnum fields appear in the results:

| sdselect netperf.asnum AS netperf_asnum, network.asnum AS network_asnum FROM federated:parquet LIMIT 100

Extract values from JSON material in your events

If you have JSON objects in your events, you can use the json_extract(<json>,<path>) and json_extract_exact(<json>,<string>) evaluation functions to extract Splunk software native type values and JSON arrays from those objects. You can apply the json_extract and json_extract_exact evaluation functions to the WHERE, ORDERBY, and GROUPBY clauses. You can also use them for projection.

For sdselect searches, the json_extract and json_extract_exact evaluation functions behave as described in the JSON functions topic in the Search Reference, with the following restrictions:

You cannot apply json_extract and json_extract_exact evaluation functions to aggregation functions in sdselect searches.
In sdselect searches, json_extract and json_extract_exact require a <json> argument and a single <path> or <string> argument. In other words, the sdselect command does not support json_extract without a location path or with multiple location paths, and it does not support json_extract_exact without a string or with multiple strings.
Location path arguments for json_extract cannot use curly braces or wildcards in location paths to identify indexes when you use those evaluation functions in sdselect searches. You can use only dot ( . ) notation to navigate through nested fields in the location paths.

For example, you can use the following instance of json_extract in an sdselect search. It returns all values in the assumedRoleUser.assumedRoleId location path of the responseelements JSON object.

json_extract(responseelements, "assumedRoleUser.assumedRoleId")

But this instance, which uses curly braces and a number to return the third nested object in the cities JSON object, would be invalid in an sdselect search.

json_extract(cities,"{2}")

Location <path> arguments for json_extract evaluation functions and <string> arguments for json_extract_exact evaluation functions must be surrounded by double quotation marks. When location paths are used as arguments for evaluation functions, surrounding them with double quotation marks does not identify them as literal strings.

Special handling for sdselect syntax elements

The sdselect command expects you to handle certain syntax elements in specific ways. For example, you must never surround nested fields with single or double quote characters. On the other hand, you must surround flattened fields which contain special characters or which begin with numbers by single quotes.

The following table describes elements of sdselect syntax that require special handling.

Syntax element	sdselect command identification rule	Example
Nested field names	Do not enclose names of nested fields with quotation marks. For more information about nested fields, see Use nested fields to identify datasets in a hierarchical structure.	`GROUPBY userIdentity.userName`
Flattened field names containing special characters	Enclose names of flattened fields that contain non-alphanumeric characters or spaces with single quotation marks ( ' ).	`GROUPBY 'user.name', host` This is an example of a flattened field that looks like a nested field. The single quotes prevent the `sdselect` processor from trying to treat the flattened field as a nested field.
Flattned field names that begin with numeric characters	Enclose names of flattened fields that start with numeric characters with single quotation marks ( ' ).	`'5minutes'="late"`
Literal strings	Enclose literal strings with double quotation marks ( " ).	`userName = "Gary"`
Location path arguments for `json_extract` evaluation functions	Enclose with double quotation marks ( " ).	`json_extract(responseparts, "RoleUser.RoleId")`

In sdselect searches, you do not need to enclose field names in quotation marks when:

The field name begins with an alphabetic character.
The field name does not contain special characters or spaces.

GROUPBY and ORDERBY event sort interoperation

If you use the GROUPBY clause and the ORDERBY clause in an sdselect search, the sdselect command sorts search results in the following sequence:

By the values of the fields or evaluation functions you specify in the ORDERBY clause, in the order in which you list the fields or evaluation functions. For fields or evaluation functions that have the ASC modifier or no modifier, the sdselect command sorts results by the field values in ascending order. For fields or evaluation functions that have the DESC modifier, the sdselect command sorts results by the field values in descending order.
In ascending order by the values of any GROUPBY fields that are not specified in the ORDERBY clause, in the order in which you list the fields.

If an sdselect search uses only the GROUPBY clause, sdselect sorts search results in ascending order by values of the fields and evaluation functions associated with the GROUPBY clause, following the list order of the fields and evaluation functions.

Control the maximum number of returned results if a LIMIT clause is not present

If the LIMIT clause is not present in an sdselect search string, the max_number_of_results setting in limits.conf determines the maximum number of results that the search can return. The max_number_of_results setting defaults to 100,000 results.

If you use Splunk Enterprise you can remove constraints on the number of results that sdselect searches can return by setting max_number_of_results to 0.

Unbounded sdselect searches over large datasets can be expensive in terms of the amount of your data scan entitlement they might consume.

If a LIMIT clause is present in an sdselect search, it overrides the max_number_of_results setting for that search.

Splunk Cloud Platform users who want to change the max_number_of_results setting must contact Splunk Support to do so.

Ensure consistent sdselect output when the number of matched results exceeds set limits

When an sdselect search matches a number of results that exceeds an explicit LIMIT clause in the search string or the implicit limit set by the max_number_of_results setting when a LIMIT clause is not present, it is possible that repeated runs of that search will return different result sets.

If you encounter this issue, use one of the following methods to resolve it:

Design the search so that it always matches a number of results that is smaller than the LIMIT clause or the default limit, if no LIMIT clause is present. For example, you might reduce the number of matched results by running the search over a smaller time range, or by designing a more restrictive WHERE clause.
Add an ORDERBY clause to the search. If the search has a LIMIT clause you must put the ORDERBY clause before the LIMIT clause.

Related answers from Splunk Community

sdselect command usage

sdselect searches consume your data scan entitlement

Data size limits for sdselect searches

Use nested fields to identify datasets in a hierarchical structure

Rename nested fields that have the same named field

Extract values from JSON material in your events

Special handling for sdselect syntax elements

GROUPBY and ORDERBY event sort interoperation

Control the maximum number of returned results if a LIMIT clause is not present

Ensure consistent sdselect output when the number of matched results exceeds set limits

See also

Comments

sdselect command usage

Was this topic useful?