from command examples
The following are examples for using the SPL2 from
command.
To learn more about the from
command, see How the SPL2 from command works.
You can specify the clauses in the from
command in uppercase or lowercase. These examples use uppercase for readability.
Some of these examples start with the SELECT clause and others start with the FROM clause. Both of these clauses are valid syntax for the from
command when it is used in a search.
1. Specify string values in quotations
The following search shows that string values in field-value pairs must be enclosed in double quotation marks.
FROM my_index sourcetype="syslog" ...
Because string values must be enclosed in double quotation marks, you can reverse the order of field-value pairs. For example, the previous search can also be specified this way:
FROM my_index "syslog"=sourcetype ...
2. Search a metric index
The following search looks for data in the _metrics
index:
SELECT earliest_time(_value), metric_name
FROM _metrics
WHERE like(metric_name, "deploy%")
GROUP BY metric_name
To use a wildcard in the WHERE clause, you cannot use the asterisk ( * ) wildcard character. You must use the like
function. See Comparison and Conditional functions.
3. Search multiple indexes
The following search looks for data in the EMEA
and APAC
indexes:
FROM indexes(EMEA, APAC)
WHERE count(orders) > 1000
GROUP BY country
4. Pipeline examples
These examples show how to use the from
command in a pipeline.
Select data to be processed in a pipeline
The following pipeline selects a subset of the data received by the Edge Processor or Ingest Processor and then sends it to a destination, such as an index or Amazon S3 bucket. The $source
parameter refers to an internal dataset in the Edge Processor or Ingest Processor.
$pipeline = | from $source | into $destination
Mask sensitive information using a pipeline
The following pipeline selects a subset of the data received by the Edge Processor or Ingest Processor, replaces the credit card numbers in the _raw field with the word "<redacted>", and then sends the masked data to a destination. The $source parameter refers to an internal dataset in the Edge Processor or Ingest Processor.
$pipeline = | from $source | eval _raw=replace(_raw, /[1-5][0-9]{15}/i, "<redacted>") | into $destination
Filter extracted data in pipelines
The following example filters a set of audit logs so that only failed login attempts remain. First, the record types and result values must be extracted from the logs.
$pipeline = | from $source | rex field=_raw /type=(?P<RecordType>[A-Z_]+).*res=(?P<Result>\w+)/ | where RecordType = "USER_LOGIN" | where Result = "failed" | fields - RecordType, Result | into $destination
5. Search using wildcards
You can use a wildcard character ( * ) in the SELECT clause to search for similar field names. You must enclose the wildcard syntax in single quotation marks. For example:
SELECT 'host*' FROM main ...
You can use a wildcard to search for only internal fields, which begin with an underscore ( _ ) character . For example:
FROM main SELECT '_*'
The WHERE clause does not support the wildcard character ( * ). However you can use the like
function to perform a wildcard search. For example:
FROM main
WHERE ipaddress LIKE "198.%"...
The like
function supports several syntaxes, see Comparison and Conditional functions.
6. Specify multiple expressions in the WHERE clause
Use the WHERE clause to filter the data by specifying one or more expressions. You need to separate multiple expressions using logical operators, such as AND and OR.
FROM index=_internal
WHERE like(source, "%license%") AND type="usage"
| stats sum(b) BY idx
The WHERE clause uses the like
function to perform a search with wildcard. The WHERE clause does not support the asterisk ( * ) wildcard character. For more information about the like
function, see Comparison and Conditional functions.
For more information about logical operators, see Predicate expressions in the SPL2 Search Manual.
7. Search for multiple terms in events
You can search for multiple terms in your events by using a search literal in the WHERE clause. An AND operator is implied between the terms specified in the search literal. To specify a search literal, you enclose the list of terms in backtick characters ( ` ).
The following search looks for the terms invalid
AND user
AND sshd[5258]
and returns the events that contain all three terms:
SELECT _time, source
FROM main
WHERE `invalid user sshd[5258]`
For more information, see Search literals in expressions in the SPL2 Search Manual.
8. Specify a single field in the GROUP BY clause
You can specify one or more fields to group by. In this example a single field, host
, is specified.
When using the from
command, if the GROUP BY clause is specified, the SELECT clause must also be specified. The SELECT clause must contain either an aggregation or the fields in the GROUP BY clause. In this example, the SELECT clause contains the aggregation avg(cpu_usage)
:
SELECT avg(cpu_usage) AS 'Avg Usage'
FROM my_index
WHERE sourcetype="syslog"
GROUP BY host
9. Specify a time span in the GROUP BY clause
You can arrange search results in groups using a time span.
When using the from
command, if the GROUP BY clause is specified, the SELECT clause must also be specified.
The following search returns web access error information, grouped by host
in 5 minute time spans.
SELECT count(), host, _time
FROM index
WHERE sourcetype="webaccess" AND `ERROR`
GROUP BY host, span(_time, 5m)
There are several ways to specify a time span with the GROUP BY clause, see from command syntax details.
10. Sorting search results using the ORDER BY clause
Suppose you use the following search to return count of the actions taken, grouped by the productId
field.
FROM sample_data_index
WHERE status=200 AND host="www4"
GROUP BY productId
SELECT count(action), productId
The results look something like this:
productId | count(action) |
---|---|
DC-SG-G02 | 12 |
FS-SG-G03 | 10 |
MB-AG-G07 | 17 |
PZ-SG-G05 | 4 |
SF-BVS-G01 | 11 |
SF-BVS-T01 | 6 |
WC-SH-G04 | 2 |
WC-SH-T02 | 15 |
By default the results are sorted on the GROUP BY field, productId
.
You want to sort the results in descending order based on the count. However, the name of the count field in the output is the name of the aggregation specified in the SELECT clause, count(action)
. The ORDER BY clause will not sort on a field name that is an aggregation because it contains special characters, the parenthesis. You have two options, you can either rename the aggregation field count(action)
in the SELECT clause using the AS keyword, or you can enclose the field name in single quotations, such as ORDER BY 'count(action)' DESC
.
Here's the updated search using the rename option:
FROM sample_data_index
WHERE status=200 AND host="www4"
GROUP BY productId
SELECT count(action) AS Count, productId
ORDER BY Count DESC
The results look something like this:
productId | Count |
---|---|
MB-AG-G07 | 17 |
WC-SH-T02 | 15 |
DC-SG-G02 | 12 |
SF-BVS-G01 | 11 |
FS-SG-G03 | 10 |
SF-BVS-T01 | 6 |
PZ-SG-G05 | 4 |
WC-SH-G04 | 2 |
11. Enrich event data with a lookup dataset using the JOIN clause
Consider the following data from a set of events with login information:
_time | action | userID | host | port |
---|---|---|---|---|
8:00 AM 29 Nov 2021 | Failed password | patel | yangtze.buttercupgames.com | 3390 |
7:15 AM 29 Nov 2021 | Failed password | zhang | nile.example.net | 1851 |
9:30 PM 15 Nov 2021 | Session opened | dubois | danube.sample.com | 1260 |
6:11 AM 14 Nov 2021 | Failed password | sullivan | volga.example.com | 2766 |
11:20 AM 15 Nov 2021 | Failed password | martin | volga.example.com | 3622 |
08:13 AM 31 Oct 2021 | Failed password | mayer | ganger.example.com | 3658 |
11:59 PM 23 Oct 2021 | Failed password | patel | yangtze.buttercupgames.com | 1214 |
You want to enrich the event data with information from the host_info
lookup dataset, which contains information about known hosts:
hostname | kind | status | host_contact |
---|---|---|---|
mekong.buttercupgames.com | internal | allowed | alex@buttercupgames.com |
yangtze.buttercupgames.com | internal | allowed | claudia@buttercupgames.com |
danube.sample.com | supplier | allowed | martin@sample.com |
ganger.example.com | external | allowed | david@example.com |
volga.example.com | external | banned |
Specifically, you want every event that matches the search criteria to appear in the search results. If there is a match between an event and the host_info
lookup dataset, you want to display the kind
and status
from the host_info
lookup dataset with each event. This is referred to as a left join, which is shown in the following image.
The A circle represents the event dataset and the B circle represents the host_info
lookup dataset.
The following example enriches data in the main
event dataset with data from the host_info
lookup dataset, where there is a matching host name. An alias for each dataset is created using the AS clause. The WHERE clause filters out events where the host kind is not internal
. The SELECT clause specifies which fields to return. The results are organized by the host
field.
FROM main AS m
LEFT JOIN host_info AS h ON m.host=h.hostname
WHERE h.kind!="internal"
SELECT m.host, m.action, m.userID, h.kind, h.status
GROUP BY m.host
When you use the JOIN clause, the aliases you specify in the search are not propagated to the search results. For example, in this search you specified m.host
, but the search results display host
.
The results of this search are shown in the following table. As you can see, the events that have a host with a kind
of internal
, the buttercupgames.com hosts, have been removed. The results also show that there is no host information for the nile.example.net
host.
host | action | userID | kind | status | |
---|---|---|---|---|---|
danube.sample.com | Session opened | dubois | supplier | allowed | |
ganger.example.com | Failed password | mayer | external | allowed | |
nile.example.net | Failed password | zhang | |||
volga.example.com | Failed password | sullivan | external | banned | |
volga.example.com | Failed password | martin | external | banned |
12. Use consecutive JOIN clauses to return data from multiple datasets
You can create a stacked join search that uses multiple JOIN clauses to return data from multiple datasets.
Consider the following data from a set of events in the orders
dataset:
_time | clientip | action | pid | quantity |
---|---|---|---|---|
12:00:01 PM 20 Jan 2022 | 192.0.2.0 | purchase | DC-SG-G02 | 1 |
10:13:34 AM 20 Jan 2022 | 203.0.113.255 | addtochart | MB-AG-G07 | 3 |
9:55:51 AM 20 Jan 2022 | 203.0.113.0 | purchase | WC-SH-A01 | 1 |
9:21:25 AM 20 Jan 2022 | 198.51.100.255 | changequantity | PZ-SG-G05 | 2 |
9:14:17 AM 20 Jan 2022 | 192.0.2.0 | purchase | SF-BVS-01 | 1 |
8:42:23 AM 20 Jan 2022 | 198.51.100.0 | purchase | SF-BVS-G01 | 1 |
8:30:45 AM 20 Jan 2022 | 192.0.2.0 | purchase | WC-SH-T02 | 2 |
7:57:14 AM 20 Jan 2022 | 198.51.100.0 | purchase | PZ-SG-G05 | 1 |
You want to enrich the orders event data with information from the products
lookup dataset, which contains product and price information. Here is an example of the data in the products
dataset:
productId | product_name | price | sale_price | supplierId |
---|---|---|---|---|
DC-SG-G02 | Dream Crusher | 39.99 | 24.99 | 1238 |
FS-SG-G03 | Final Sequel | 24.99 | 16.99 | 5017 |
WC-SH-G04 | World of Cheese | 24.99 | 19.99 | 7024 |
WC-SH-T02 | World of Cheese Tee | 19.99 | 16.99 | 7024 |
PZ-SG-G05 | Puppies vs. Zombies | 4.99 | 3.99 | 7045 |
MB-AG-G07 | Manganiello Bros. | 38.99 | 27.99 | 4111 |
SF-BVS-G01 | Grand Theft Scooter | 26.99 | 21.99 | 5007 |
SF-BVS-01 | Pony Run | 49.99 | 41.99 | 5007 |
You want to display the product names instead of the product IDs in your search results.
You want every order event that matches the search criteria to appear in the results, even if the item ordered does not have a matching entry in the products lookup dataset. Notice that the third order contains the product ID WC-SH-A01
, which does not appear in the products lookup dataset.
You can display the product names in the search results by including a JOIN clause to your search that enriches the orders dataset with the data from the products dataset. Specifically, you need to use a LEFT JOIN to accomplish this result. The datasets are joined on the field that the datasets have in common, which is the product ID field.
Here is the search you can use to add the product names to the orders events:
FROM orders AS o
LEFT JOIN products AS p ON o.pid=p.productId
SELECT o._time, o.pid, p.product_name, o.quantity
The results look like this:
_time | pid | product_name | quantity |
---|---|---|---|
12:00:01 PM 20 Jan 2022 | DC-SG-G02 | Dream Crusher | 1 |
10:13:34 AM 20 Jan 2022 | MB-AG-G07 | Manganiello Bros. | 3 |
9:55:51 AM 20 Jan 2022 | WC-SH-A01 | 1 | |
9:21:25 AM 20 Jan 2022 | PZ-SG-G05 | Puppies vs. Zombies | 2 |
9:14:17 AM 20 Jan 2022 | SF-BVS-01 | Pony Run | 1 |
8:42:23 AM 20 Jan 2022 | SF-BVS-G01 | Grand Theft Scooter | 1 |
8:30:45 AM 20 Jan 2022 | WC-SH-T02 | World of Cheese Tee | 2 |
7:57:14 AM 20 Jan 2022 | PZ-SG-G05 | Puppies vs. Zombies | 1 |
Because there is no matching product ID for WC-SH-A01
in the products dataset, there is no product name in the search results. Using a LEFT JOIN is a way to highlight missing information from the second, or right-side, dataset.
Now you want to find out the name and city of the supplier that provides each product. You can enrich the search results data with information from the suppliers
lookup dataset, based on the supplier ID.
Here is an example of the data in the suppliers
dataset:
supplier_id | supplier_name | city | state/province | country |
---|---|---|---|---|
1009 | Mile High Games | Denver | Colorado | United States |
1237 | Area 51 Games | Roswell | New Mexico | United States |
4111 | Isthmus Pastimes | Panama City | Panama | Panama |
5007 | EuroToys | Prague | Central Bohemia | Czech Republic |
5017 | Der Kriegsspiel | Cologne | North Rhine-Westphalia | Germany |
7024 | Happy Fun Games | Kyoto | Kyoto | Japan |
7045 | Kiwi Game Warehouse | Auckland | Auckland | New Zealand |
To display the supplier names and city in your search results you need to add another JOIN clause to your search. Because you want every product in the search results returned, whether or not there is a corresponding supplier, you will use a LEFT JOIN. The products and suppliers datasets can be joined on the supplier ID field.
Looking back at the products dataset, the Dream Crusher
product has a supplier ID of 1238
, which does not appear in the suppliers dataset. The LEFT JOIN will highlight the absence of this information.
Here is the updated search:
FROM orders AS o
LEFT JOIN products AS p ON o.pid=p.productId
LEFT JOIN suppliers AS s ON p.supplierId=s.supplier_id
SELECT o._time, o.pid, p.product_name, p.supplierId, s.supplier_name, o.quantity
The results of the search look like this:
_time | pid | product_name | supplierId | supplier_name | quantity |
---|---|---|---|---|---|
12:00:01 PM 20 Jan 2022 | DC-SG-G02 | Dream Crusher | 1 | ||
10:13:34 AM 20 Jan 2022 | MB-AG-G07 | Manganiello Bros. | 4111 | Isthmus Pastimes | 3 |
9:55:51 AM 20 Jan 2022 | WC-SH-A01 | 1 | |||
9:21:25 AM 20 Jan 2022 | PZ-SG-G05 | Puppies vs. Zombies | 7045 | Kiwi Game Warehouse | 2 |
9:14:17 AM 20 Jan 2022 | SF-BVS-01 | Pony Run | 5007 | EuroToys | 1 |
8:42:23 AM 20 Jan 2022 | SF-BVS-G01 | Grand Theft Scooter | 5007 | EuroToys | 1 |
8:30:45 AM 20 Jan 2022 | WC-SH-T02 | World of Cheese Tee | 7024 | Happy Fun Games | 2 |
7:57:14 AM 20 Jan 2022 | PZ-SG-G05 | Puppies vs. Zombies | 7045 | Kiwi Game Warehouse | 1 |
13. Return data from a view
This search returns the timestamp and client IP fields from a view called mysecurityview
.
FROM mysecurityview
| fields _time, clientip ...
14. Use the HAVING clause to filter after aggregations
The following example calculate the sum of the bytes field in the main
index from events that occurred in the last 5 minutes. The results are grouped by the host
field. The sum
and the host
fields are returned, where the sum of the bytes is greater than I MB.
SELECT sum(bytes) AS sum, host
FROM main
WHERE earliest=-5m@m AND latest=@m
GROUP BY host
HAVING sum > 1024*1024
15. Specify offsets and limits
The following search returns web access error information, grouped by host and 5 minute time spans, that have a count greater than 10. The LIMIT clause is used to return up to 50 results. The OFFSET clause is used to skip the first 20 results, starting with the 21st result.
SELECT count(), host, _time
FROM index
WHERE sourcetype="webaccess" AND `ERROR`
GROUP BY host, span(_time, 5m)
HAVING count > 10
ORDER BY count desc
LIMIT 50
OFFSET 20
See also
- Pipelines
- Edge Processor pipeline syntax in the Use Edge Processors manual
- Ingest Processor pipeline syntax in the Use Ingest Processors manual
- Related information
- Types of expressions in the SPL2 Search Manual
from command usage | head command overview |
This documentation applies to the following versions of Splunk® Cloud Services: current
Feedback submitted, thanks!