Splunk® Cloud Services

SPL2 Search Manual

Datasets

A dataset is a collection of data that you either want to search or that contains the results from a search. Some datasets are permanent and others are temporary. Every dataset has a specific set of native capabilities associated with it, which is referred to as the dataset kind. To specify a dataset in a search, you use the dataset name. If the dataset is in a different module, you specify the module name and the dataset name.

Permanent datasets

When you add data to the Splunk platform, the data is stored in indexes on disk. Indexes are permanent datasets. Lookups and views are other examples of permanent datasets.

Each permanent dataset within a module must have a unique name.

Temporary datasets

Most of the time when you specify a dataset in a search, you use the name of a permanent dataset. However, there are situations in which you might want to use a temporary dataset. For example, you might want to use a temporary dataset in an ad hoc search to test that the search is returning the type of results you want.

A temporary dataset is a piece of unsaved, stand-alone Search Processing Language, version 2 (SPL2) syntax. You can use a temporary dataset anywhere that you can specify a permanent dataset.

A temporary dataset must be enclosed in square brackets ( [ ] ).

For example, consider this search:

| FROM main WHERE population > 5000000 SELECT state

Instead of specifying the main dataset, which is a permanent dataset, you can specify a dataset literal:

| FROM [
    { state: "Washington", abbreviation: "WA", population: 7535591 },
    { state: "California", abbreviation: "CA", population: 39557045 },
    { state: "Oregon", abbreviation: "OR", population: 4190714 }
  ]
  WHERE population > 5000000
  SELECT state

A dataset literal is one example of a temporary dataset. See Dataset literals.
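To make the result of the dataset literal search concrete, here is a small Python model of what it computes. This is plain Python, not Splunk code: the WHERE clause keeps rows whose population is greater than 5,000,000, and the SELECT clause keeps only the state field. The helper name `from_where_select` is hypothetical.

```python
# Conceptual model only: plain Python, not how Splunk executes SPL2.
# Each dict stands in for one event in the dataset literal.
dataset_literal = [
    {"state": "Washington", "abbreviation": "WA", "population": 7535591},
    {"state": "California", "abbreviation": "CA", "population": 39557045},
    {"state": "Oregon", "abbreviation": "OR", "population": 4190714},
]


def from_where_select(dataset, predicate, fields):
    """Hypothetical helper: filter rows (WHERE), then keep only the listed fields (SELECT)."""
    return [{f: row[f] for f in fields} for row in dataset if predicate(row)]


results = from_where_select(
    dataset_literal,
    lambda row: row["population"] > 5000000,  # WHERE population > 5000000
    ["state"],                                # SELECT state
)
print(results)  # [{'state': 'Washington'}, {'state': 'California'}]
```

Only Washington and California pass the filter; Oregon's population of 4,190,714 is below the threshold.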

Here are some other examples of temporary datasets:

  • Subsearches
  • Datasets created using a dataset function. See Dataset functions.
  • Search jobs

Job datasets

Most temporary datasets are unnamed. One exception is a job dataset. When you run a search, a temporary job dataset is created to hold the search results. The name of a job dataset is its search ID (sid).

Dataset kinds

All datasets have a dataset kind. Each dataset kind has a specific set of native capabilities, such as filtering or aggregation.

For example, with a dataset of the metric kind, you can perform some aggregation when you specify the dataset. However, with a dataset of the index kind, which is an event index, you cannot perform aggregation natively. Instead, you must add an aggregating clause or command to your search, such as the GROUP BY clause in the from command or the stats command.
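As a rough illustration of what an explicit aggregation step adds to an event dataset, this Python sketch counts hypothetical events per host. Conceptually, this is the kind of result a search such as `| FROM main | stats count() BY host` returns. The events and field values are made up, and the code models only the result, not how Splunk runs the command.

```python
from collections import Counter

# Hypothetical events from an event (index kind) dataset; each event is a dict of fields.
events = [
    {"host": "web01", "status": 200},
    {"host": "web02", "status": 404},
    {"host": "web01", "status": 200},
]

# Explicit aggregation step: count events per host value.
counts = Counter(event["host"] for event in events)
print(dict(counts))  # {'web01': 2, 'web02': 1}
```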

The following table shows the built-in dataset kinds:

Kind          Description
index         A time-series index (tsidx) for storing event data.
metric        A metrics index (msidx) for storing metric data.
lookup        Reference data, typically stored in a KV Store lookup.
view          Reusable SPL that can be used in multiple searches.
job           The materialized results of a search.
import        A local alias for a dataset that is defined in another module.
destination   A dataset that a pipeline sends data to.

Pipeline destination kinds

The following table lists the kinds of destination datasets that each pipeline product profile supports:

Kind                     edgeProcessor profile   ingestProcessor profile
AWS S3                   Yes                     Yes
Index                    Yes                     No
Indexer                  Yes                     No
HEC exporter             Yes                     No
devnull                  Yes                     Yes
Splunk observability     No                      Yes
Splunk Cloud ingestion   No                      Yes

Modules

Modules are used to organize resources, such as datasets and rules, into separate namespaces. Modules isolate related resources from unrelated resources. Whenever an SPL2 search is run, it runs within the context of a module. An SPL2 search can access only the datasets in that module, except for built-in datasets in other modules, which you reference by module name and dataset name.

The default module has the name "", or an empty string. Resources in this default module are like files in the root of a file system. The path for these resources is the name of the resource.

If you want to use a dataset from another module, you must create an import dataset. See Creating an import dataset.

Dataset references

When you want to specify a dataset in your search syntax, you use a dataset reference. Because you typically search datasets in the default module that you have access to, you refer to a dataset by its name. For example, the main dataset is a dataset of the index kind. Referring to the main dataset in a search looks like this:

| FROM main WHERE status=200

Sometimes you might need to refer to a dataset that is in a different module, such as the built-in datasets in the catalog module. To refer to built-in datasets that are in other modules, you must specify both the module name and the dataset name, such as catalog.metrics, ingest.events, or ingest.metrics. For example, the datasets dataset is a built-in dataset in the catalog module. Referring to the datasets dataset in a search looks like this:

| FROM catalog.datasets WHERE kind="index"

If the dataset you want to search is not in the list of built-in datasets, you must create an import dataset to reference a dataset in another module.
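The reference rules above can be summarized in a short Python model. It is a sketch under stated assumptions, not Splunk's resolver: a bare name resolves in the module the search runs in (the default module's name is the empty string), a `module.dataset` name resolves to a built-in dataset in another module, and anything else needs an import dataset. The `CATALOG` contents and the `resolve` function are hypothetical.

```python
# Conceptual model only, not Splunk's actual name resolution.
# Assumptions: a bare name ("main") resolves in the module the search runs in;
# a qualified name ("catalog.datasets") resolves to a built-in dataset in
# another module; anything else needs an import dataset.

DEFAULT_MODULE = ""  # the default module's name is the empty string

# Hypothetical catalog: module name -> dataset names available in it.
CATALOG = {
    DEFAULT_MODULE: {"main"},
    "catalog": {"datasets", "metrics"},
    "ingest": {"events", "metrics"},
}


def resolve(reference, current_module=DEFAULT_MODULE):
    """Return (module, dataset) for a dataset reference, or raise LookupError."""
    module, sep, name = reference.rpartition(".")
    if not sep:
        # No module prefix: look the name up in the current module.
        module = current_module
    if name not in CATALOG.get(module, set()):
        raise LookupError(
            f"{reference!r} not found; create an import dataset to use it"
        )
    return module, name


print(resolve("main"))              # ('', 'main')
print(resolve("catalog.datasets"))  # ('catalog', 'datasets')
```

Under these assumptions, a reference such as `sales.orders` raises `LookupError`, which mirrors the requirement to create an import dataset first.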

Creating an import dataset

To use a dataset from another module that is not a built-in dataset, you must import the remote dataset into the current module and give the dataset a local name. This is referred to as creating an import dataset.

You import a dataset through the REST API, using a POST request. For instructions on how to create the POST request, see Importing datasets in the Developer Guide on the Splunk Developer Portal.

You cannot import a view from another module.

Dataset permissions

All resources, such as datasets, have permissions associated with them that can restrict which resources are available to an SPL2 search.

For example, even though a dataset might be defined in the same module as a search, the person running the search might not have permissions to that dataset. Dataset permissions are checked and enforced when the search is run.
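Because permissions are checked when the search runs, the same search definition can succeed for one user and fail for another. This Python sketch models that behavior with hypothetical users (`alice`, `bob`), a hypothetical dataset (`web_logs`), and a hypothetical `run_search` function; it is not how Splunk stores or evaluates permissions.

```python
# Conceptual model only: permissions are checked when a search runs,
# not when the search or dataset is defined. All names are hypothetical.

# Hypothetical permission table: user -> datasets that user can read.
PERMISSIONS = {
    "alice": {"main", "web_logs"},
    "bob": {"main"},
}


def run_search(user, datasets):
    """Fail at run time if the user lacks read access to any referenced dataset."""
    allowed = PERMISSIONS.get(user, set())
    denied = sorted(d for d in datasets if d not in allowed)
    if denied:
        raise PermissionError(f"{user} cannot read: {', '.join(denied)}")
    return f"{user} searched {', '.join(sorted(datasets))}"


# One search definition that references two datasets in the same module.
search_datasets = {"main", "web_logs"}
print(run_search("alice", search_datasets))  # alice searched main, web_logs
try:
    run_search("bob", search_datasets)
except PermissionError as err:
    print(err)  # bob cannot read: web_logs
```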

See also

Related information
Dataset literals
Dataset functions
Last modified on 06 June, 2024

This documentation applies to the following versions of Splunk® Cloud Services: current

