sdselect command usage
The sdselect
command is a report-generating command. See Command types in the Splunk Cloud Platform Search Reference.
Generating commands use a leading pipe character and must be the first command in a search.
When you use sdselect
to search an AWS Glue Data Catalog table, you must use a FROM clause to reference a federated index that maps to that AWS Glue table. This mapping lets sdselect
run an efficient statistical analysis of fields in that AWS Glue table. The AWS Glue table, in turn, represents a dataset in your Amazon S3 storage.
The sdselect
command supports path navigation syntax for nested fields representing specific datasets within a JSON array or Amazon S3 directory structure. See Use nested fields to identify datasets in a hierarchical structure.
The "sd" in sdselect
stands for "structured data".
sdselect searches consume your data scan entitlement
When you run sdselect
searches, Splunk software counts the volume of data on disk that your search scans against your total data scan entitlement. To reduce data scan entitlement consumption, construct your searches so that they do not scan more data than necessary.
Splunk software tracks the volume on disk of the data you are scanning, not the number of events that you are searching. A search of compressed data, such as data from GZIP or Parquet files, might consume less of your data scan entitlement than a similar search of uncompressed data.
For more information about obtaining and monitoring your Federated Search for Amazon S3 data scan entitlement, see About Federated Search for Amazon S3.
For more information about obtaining and monitoring your Federated Analytics data scan entitlement, see About Federated Analytics.
Federated Search for Amazon S3 and Federated Analytics count searches that you cancel against your data scan entitlement for the amount of data that has been scanned at the point of search cancellation. Failed searches do not incur charges against your data scan entitlement.
Data size limits for sdselect searches
The amount of AWS Glue table dataset data that the sdselect
command can process in a single search has the following limitations.
- The sum of a single row or column in a AWS Glue table dataset cannot exceed 32 MB. For example, you exceed this limit if a single column in a table row has a column that is 100 MB in size.
sdselect
searches have a configurable data-scanned-per-search control limit of 10 TB. If you run ansdselect
search that exceeds this limit, the search fails without incurring any consumption of your data scan entitlement. If you need to change the data-scanned-per-search limit for yoursdselect
searches, contact Splunk Support.
Use nested fields to identify datasets in a hierarchical structure
The sdselect
command supports path navigation syntax for nested fields. Such fields begin with one or more struct
type elements and end with a named field. Nested fields use dot characters ( . ) to separate the structure levels. You can use nested fields to indicate specific datasets within a JSON array or Amazon S3 directory structure. You can use these nested "dataset" fields in place of ordinary fields throughout sdselect
searches.
For example, the nested field userIdentity.type
contains possible type
field values for the Amazon IAM userIdentity
element. You might use it in the WHERE clause of an sdselect
search like this:
| sdselect count(userIdentity.type) FROM federated:myindex WHERE userIdentity.type = "AssumedRole"
You might also have other nested fields for the userIdentity
element, such as userIdentity.userName
and userIdentity.accountId
.
Nested fields cannot include special characters. For example, if you have a field named a.b.c
where b.c
is a flattened field, your sdselect
search will fail with an error message.
If you use nested fields in an sdselect
search, you cannot surround those nested fields by quotation marks. See Special handling for sdselect syntax elements.
Rename nested fields that have the same named field
When nested fields have the same named field but different preceding struct
elements, you might run into issues when you use them together in an sdselect
search. The sdselect
command treats multiple nested fields with the same named field as if they are multiple fields with the same name. When sdselect
finds multiple fields with the same name in a search string, it discards all of the fields with the same named field except for the last listed field.
For example, say you run the following search:
| sdselect netperf.asnum, network.asnum FROM federated:parquet LIMIT 100
The sdselect
command handles the nested fields netperf.asnum
and network.asnum
as if they both are named asnum
. The sdselect
command returns results only for network.asnum
, in a field titled asnum
.
You can get around the problem of having multiple nested fields with the same named field by renaming such nested fields at search time. For example, the following search renames the nested fields so that 2 asnum
fields appear in the results:
| sdselect netperf.asnum AS netperf_asnum, network.asnum AS network_asnum FROM federated:parquet LIMIT 100
Extract values from JSON material in your events
If you have JSON objects in your events, you can use the json_extract(<json>,<path>)
and json_extract_exact(<json>,<string>)
evaluation functions to extract Splunk software native type values and JSON arrays from those objects. You can apply the json_extract
and json_extract_exact
evaluation functions to the WHERE, ORDERBY, and GROUPBY clauses. You can also use them for projection.
For sdselect
searches, the json_extract
and json_extract_exact
evaluation functions behave as described in the JSON functions topic in the Search Reference, with the following restrictions:
- You cannot apply
json_extract
andjson_extract_exact
evaluation functions to aggregation functions insdselect
searches. - In
sdselect
searches,json_extract
andjson_extract_exact
require a<json>
argument and a single<path>
or<string>
argument. In other words, thesdselect
command does not supportjson_extract
without a location path or with multiple location paths, and it does not supportjson_extract_exact
without a string or with multiple strings. - Location path arguments for
json_extract
cannot use curly braces or wildcards in location paths to identify indexes when you use those evaluation functions insdselect
searches. You can use only dot ( . ) notation to navigate through nested fields in the location paths.
- For example, you can use the following instance of
json_extract
in ansdselect
search. It returns all values in theassumedRoleUser.assumedRoleId
location path of theresponseelements
JSON object.
json_extract(responseelements, "assumedRoleUser.assumedRoleId")
- But this instance, which uses curly braces and a number to return the third nested object in the
cities
JSON object, would be invalid in ansdselect
search.
json_extract(cities,"{2}")
Location <path>
arguments for json_extract
evaluation functions and <string>
arguments for json_extract_exact
evaluation functions must be surrounded by double quotation marks. When location paths are used as arguments for evaluation functions, surrounding them with double quotation marks does not identify them as literal strings.
Special handling for sdselect syntax elements
The sdselect
command expects you to handle certain syntax elements in specific ways. For example, you must never surround nested fields with single or double quote characters. On the other hand, you must surround flattened fields which contain special characters or which begin with numbers by single quotes.
The following table describes elements of sdselect
syntax that require special handling.
Syntax element | sdselect command identification rule | Example |
---|---|---|
Nested field names | Do not enclose names of nested fields with quotation marks. For more information about nested fields, see Use nested fields to identify datasets in a hierarchical structure. |
GROUPBY userIdentity.userName
|
Flattened field names containing special characters | Enclose names of flattened fields that contain non-alphanumeric characters or spaces with single quotation marks ( ' ). | GROUPBY 'user.name', host This is an example of a flattened field that looks like a nested field. The single quotes prevent the |
Flattned field names that begin with numeric characters | Enclose names of flattened fields that start with numeric characters with single quotation marks ( ' ). | '5minutes'="late"
|
Literal strings | Enclose literal strings with double quotation marks ( " ). | userName = "Gary"
|
Location path arguments for json_extract evaluation functions
|
Enclose with double quotation marks ( " ). | json_extract(responseparts, "RoleUser.RoleId")
|
In sdselect
searches, you do not need to enclose field names in quotation marks when:
- The field name begins with an alphabetic character.
- The field name does not contain special characters or spaces.
GROUPBY and ORDERBY event sort interoperation
If you use the GROUPBY clause and the ORDERBY clause in an sdselect
search, the sdselect
command sorts search results in the following sequence:
- By the values of the fields or evaluation functions you specify in the ORDERBY clause, in the order in which you list the fields or evaluation functions. For fields or evaluation functions that have the ASC modifier or no modifier, the
sdselect
command sorts results by the field values in ascending order. For fields or evaluation functions that have the DESC modifier, thesdselect
command sorts results by the field values in descending order. - In ascending order by the values of any GROUPBY fields that are not specified in the ORDERBY clause, in the order in which you list the fields.
If an sdselect
search uses only the GROUPBY clause, sdselect
sorts search results in ascending order by values of the fields and evaluation functions associated with the GROUPBY clause, following the list order of the fields and evaluation functions.
Control the maximum number of returned results if a LIMIT clause is not present
If the LIMIT clause is not present in an sdselect
search string, the max_number_of_results
setting in limits.conf determines the maximum number of results that the search can return. The max_number_of_results
setting defaults to 100,000 results.
If you use Splunk Enterprise you can remove constraints on the number of results that sdselect
searches can return by setting max_number_of_results
to 0.
Unbounded sdselect
searches over large datasets can be expensive in terms of the amount of your data scan entitlement they might consume.
If a LIMIT clause is present in an sdselect
search, it overrides the max_number_of_results
setting for that search.
Splunk Cloud Platform users who want to change the max_number_of_results
setting must contact Splunk Support to do so.
Ensure consistent sdselect output when the number of matched results exceeds set limits
When an sdselect
search matches a number of results that exceeds an explicit LIMIT clause in the search string or the implicit limit set by the max_number_of_results
setting when a LIMIT clause is not present, it is possible that repeated runs of that search will return different result sets.
If you encounter this issue, use one of the following methods to resolve it:
- Design the search so that it always matches a number of results that is smaller than the LIMIT clause or the default limit, if no LIMIT clause is present. For example, you might reduce the number of matched results by running the search over a smaller time range, or by designing a more restrictive WHERE clause.
- Add an ORDERBY clause to the search. If the search has a LIMIT clause you must put the ORDERBY clause before the LIMIT clause.
See also
sdselect command syntax details | sdselect command WHERE clause operations |
This documentation applies to the following versions of Splunk Cloud Platform™: 9.2.2406 (latest FedRAMP release), 9.3.2408
Feedback submitted, thanks!