Splunk® Enterprise

Search Manual

Splunk Enterprise version 8.0 is no longer supported as of October 22, 2021. See the Splunk Software Support Policy for details. For information about upgrading to a supported version, see How to upgrade Splunk Enterprise.

Types of commands

As you learn about Splunk SPL, you might hear the terms streaming, generating, transforming, orchestrating, and data processing used to describe the types of search commands. This topic explains what these terms mean and lists the commands that fall into each category.

There are six broad categorizations for almost all of the search commands:

  • distributable streaming
  • centralized streaming
  • transforming
  • generating
  • orchestrating
  • dataset processing

These categorizations are not mutually exclusive. Some commands fit into only one categorization. The stats command is an example of a command that fits only into the transforming categorization. Other commands can fit into multiple categorizations. For example a command can be streaming and also generating.

For a complete list of commands that are in each type, see Command types in the Search Reference.

Why the types of commands matter

Although it can be easy to get confused by the different categories of commands, having a solid understanding of the differences between types of commands will help you understand the implications for how and where data is processed, and optimize the performance of your searches.

For example, suppose you have a search that uses the following commands in this order:

search... | lookup... | where... | eval... | sort... | where... |...

The first 4 commands, from the search to eval commands, are distributable streaming commands that can all be processed on the indexers. As a result, when the search is run, the search head pushes the search to the indexers.

Since the sort command is not a distributable streaming command and needs all of the events in one place, the events that are returned from the first 4 commands are then sent back to the search head for sorting. As a result, the rest of the search after the sort command must also be processed on the search head. This is true even if the commands that follow sort are distributable streaming commands, like the second where command in the search.

Once search processing moves to the search head, it can't be moved back to the indexer. With this in mind, you should put non-streaming commands as late as possible in your searches to make them run efficiently. To find out more about how the types of commands used in searches can affect performance, see Write better searches.

Streaming and non-streaming commands

A streaming command operates on each event as it is returned by a search. Essentially one event in and one (or no) event out.

This diagram shows individual events being processed by a streaming command, one event after another.

For example, the eval command can create a new field, full_name, to contain the concatenation of the value in the first_name field, a space, and the value in the last_name field.

... | eval full_name = first_name." ".last_name

The eval command evaluates each event without considering the other events.

A non-streaming command requires the events from all of the indexers before the command can operate on the entire set of events. Many transforming commands are non-streaming commands. There are also several commands that are not transforming commands but are also non-streaming. These non-transforming, non-streaming commands are most often dataset processing commands.

This diagram shows a set of events that are collected and then processed together by a non-streaming command.

For example, before the sort command can begin to sort the events, the entire set of events must be received by the sort command. Other examples of non-streaming commands include dedup (in some modes), stats, and top.

Non-streaming commands force the entire set of events to the search head. This requires a lot of data movement and a loss of parallelism.

For information on how to mitigate the cost of non-streaming commands, see Write better searches in this manual.

Processing attributes

The following table describes the processing differences between some of the types of commands.

Distributable streaming Centralized streaming Data processing (non-streaming) Transforming
Can run on indexers Y N N N
Can output before final input Y Y N N
Outputs events if inputs are events Y Y Y N

When a command is run it outputs either events or results, based on the type of command. For example, when you run the sort command, the input is events and the output is events in the sort order you specify. However, transforming commands do not output events. Transforming commands output results. For example the stats command outputs a table of calculated results. The events used to calculate those results are no longer available. After you run a transforming command, you can't run a command that expects events as an input.

Data processing commands are non-streaming commands that require the entire dataset before the command can run. These commands are not transforming, not distributable, not streaming, and not orchestrating. The sort command is an example of a data processing command. See Data processing commands.

Distributable streaming

A streaming command operates on each event returned by a search. For distributable streaming, the order of the events does not matter. A distributable streaming command is a command that can be run on the indexer, which improves processing time. The other commands in a search determine if the distributable streaming command is run on the indexer:

  • If all of the commands before the distributable streaming command can be run on the indexer, the distributable streaming command is run on the indexer.
  • If any one of the commands before the distributable streaming command must be run on the search head, the remaining commands in the search must be run on the search head. When the search processing moves to the search head, it can't be moved back to the indexer.

Distributable streaming commands can be applied to subsets of indexed data in a parallel manner. For example, the rex command is streaming. It extracts fields and adds them to events at search time.

Some of the common distributable streaming commands are: eval, fields, makemv, rename, regex, replace, strcat, typer, and where.

For a complete list of distributable streaming commands, see Streaming commands in the Search Reference.

Centralized streaming

For centralized streaming commands, the order of the events matters. A centralized streaming command applies a transformation to each event returned by a search. But unlike distributable streaming commands, a centralized streaming command only works on the search head. You might also hear the term "stateful streaming" to describe these commands.

Centralized streaming commands include: head, streamstats, some modes of dedup, and some modes of cluster.

Transforming

A transforming command orders the search results into a data table. These commands "transform" the specified cell values for each event into numerical values that Splunk software can use for statistical purposes. Transforming commands are not streaming. Also, transforming commands are required to transform search result data into the data structures that are required for visualizations such as column, bar, line, area, and pie charts.

Transforming commands include: chart, timechart, stats, top, rare, and addtotals when it is used to calculate column totals (not row totals).

For more information about transforming commands and their role in create statistical tables and chart visualizations, see About transforming commands and searches in the this manual.

For a complete list of transforming commands, see Transforming commands in the Search Reference.

Generating

A generating command returns information or generates results. Some generating commands can return information from an index, a data model, a lookup, or a CSV file without any transformations to the information.  Other generating commands generate results, usually for testing purposes.

Generating commands are either event-generating (distributable or centralized) or report-generating. Most report-generating commands are also centralized. Depending on which type the command is, the results are returned in a list or a table.

Generating commands do not expect or require an input. Generating commands are usually invoked at the beginning of the search and with a leading pipe. That is, there cannot be a search piped into a generating command. The exception to this is the search command, because it is implicit at the start of a search and does not need to be invoked.

Examples of generating commands include: dbinspect, datamodel, inputcsv, inputlookup, makeresults, metadata, pivot, search, and tstats

For a complete list of generating commands, see Generating commands in the Search Reference.

Orchestrating

An orchestrating command is a command that controls some aspect of how the search is processed. It does not directly affect the final result set of the search. For example, you might apply an orchestrating command to a search to enable or disable a search optimization that helps the overall search complete faster.

Examples of orchestrating commands include redistribute, noop, and localop. The lookup command also becomes an orchestrating command when you use it with the local=t argument.

Dataset processing

There are a handful of commands that require the entire dataset before the command can run. These commands are referred to as dataset processing commands. These commands are not transforming, not distributable, not streaming, and not orchestrating. Some of these commands fit into other types in specific situations or when specific arguments are used.

Examples of data processing commands include: sort, eventstats, and some modes of cluster, dedup, and fillnull.

For a complete list of dataset processing commands, see Dataset processing commands in the Search Reference.

Last modified on 01 August, 2024
Types of searches   Search with Splunk Web, CLI, or REST API

This documentation applies to the following versions of Splunk® Enterprise: 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.1.6, 7.1.7, 7.1.8, 7.1.9, 7.1.10, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.2.5, 7.2.6, 7.2.7, 7.2.8, 7.2.9, 7.2.10, 7.3.0, 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5, 7.3.6, 7.3.7, 7.3.8, 7.3.9, 8.0.0, 8.0.1, 8.0.2, 8.0.3, 8.0.4, 8.0.5, 8.0.6, 8.0.7, 8.0.8, 8.0.9, 8.0.10, 8.1.0, 8.1.1, 8.1.2, 8.1.3, 8.1.4, 8.1.5, 8.1.6, 8.1.7, 8.1.8, 8.1.9, 8.1.10, 8.1.11, 8.1.12, 8.1.13, 8.1.14, 8.2.0, 8.2.1, 8.2.2, 8.2.3, 8.2.4, 8.2.5, 8.2.6, 8.2.7, 8.2.8, 8.2.9, 8.2.10, 8.2.11, 8.2.12, 9.0.0, 9.0.1, 9.0.2, 9.0.3, 9.0.4, 9.0.5, 9.0.6, 9.0.7, 9.0.8, 9.0.9, 9.0.10, 9.1.0, 9.1.1, 9.1.2, 9.1.3, 9.1.4, 9.1.5, 9.1.6, 9.2.0, 9.2.1, 9.2.2, 9.2.3, 9.3.0, 9.3.1


Was this topic useful?







You must be logged into splunk.com in order to post comments. Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic. If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk, consider posting a question to Splunkbase Answers.

0 out of 1000 Characters