Datasets
A dataset is a collection of data that you want to search or that contains the results from a search. Some datasets are permanent and others are temporary. Every dataset has a dataset kind, which determines the specific set of native capabilities that the dataset supports. To specify a dataset in a search, you use the dataset name. If the dataset is in a different module, you specify the module name and the dataset name.
Permanent datasets
When you add data to the Splunk platform, the data is stored in indexes on disk. Indexes are permanent datasets. Lookups and views are other examples of permanent datasets.
Each permanent dataset within a module must have a unique name.
Temporary datasets
Most of the time when you specify a dataset in a search, you use the name of a permanent dataset. However, there are situations in which you might want to use a temporary dataset. For example, you might want to use a temporary dataset in an ad hoc search to test that the search is returning the type of results you want.
A temporary dataset is an unsaved, stand-alone piece of Search Processing Language, version 2 (SPL2). You can use a temporary dataset anywhere that you can specify a permanent dataset.
A temporary dataset must be enclosed in square brackets ( [ ] ).
For example, consider this search:
| FROM main WHERE population > 5000000 SELECT state
Instead of specifying the main dataset, which is a permanent dataset, you can specify a dataset literal:
| FROM
[
{ state: "Washington", abbreviation: "WA", population: 7535591 },
{ state: "California", abbreviation: "CA", population: 39557045 },
{ state: "Oregon", abbreviation: "OR", population: 4190714 }
]
WHERE population > 5000000 SELECT state
A dataset literal is one example of a temporary dataset. See Dataset literals.
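To make the result of the dataset-literal search concrete, the following Python sketch models it. It is a conceptual model only, not SPL2 and not a Splunk API: the literal becomes a list of records, the WHERE clause becomes a filter, and the SELECT clause keeps only the requested field. The helper name from_where_select is hypothetical.

```python
# Conceptual model of the dataset-literal search; plain Python, not SPL2.
states = [
    {"state": "Washington", "abbreviation": "WA", "population": 7535591},
    {"state": "California", "abbreviation": "CA", "population": 39557045},
    {"state": "Oregon", "abbreviation": "OR", "population": 4190714},
]


def from_where_select(dataset, predicate, fields):
    """Keep records that match `predicate` (like WHERE), then keep only
    the listed `fields` of each record (like SELECT)."""
    return [
        {field: record[field] for field in fields}
        for record in dataset
        if predicate(record)
    ]


result = from_where_select(states, lambda r: r["population"] > 5000000, ["state"])
print(result)  # [{'state': 'Washington'}, {'state': 'California'}]
```

Oregon is dropped because its population is below the 5,000,000 threshold.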
Here are some other examples of temporary datasets:
- Subsearches
- Datasets created using a dataset function. See Dataset functions.
- Search jobs
Job datasets
Most temporary datasets are unnamed datasets. One exception is a job dataset. When you run a search, a temporary job dataset is created to hold the search results. The job dataset has a search ID (sid), which is the name of the job dataset.
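As a rough mental model, the Python sketch below treats each search run as producing a job dataset that is stored under its sid, so the sid can be used like a dataset name to read the results back. The JobStore class and the job-N sid format are hypothetical. The platform generates real sids in its own format.

```python
import itertools


class JobStore:
    """Hypothetical model: each search run stores its results as a job
    dataset whose name is the run's search ID (sid)."""

    def __init__(self):
        self._jobs = {}
        self._counter = itertools.count(1)

    def run(self, dataset, predicate):
        # Hypothetical sid format, used only for illustration.
        sid = f"job-{next(self._counter)}"
        self._jobs[sid] = [record for record in dataset if predicate(record)]
        return sid

    def read(self, sid):
        # The sid acts as the name of the job dataset.
        return self._jobs[sid]


store = JobStore()
events = [{"status": 200}, {"status": 404}, {"status": 200}]
sid = store.run(events, lambda r: r["status"] == 200)
print(sid, store.read(sid))
```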
Dataset kinds
All datasets have a dataset kind. Each dataset kind has a specific set of native capabilities, such as filtering or aggregation.
For example, with a dataset of the metric kind, you can perform some aggregation when you specify the dataset. However, with a dataset of the index kind, which is an event index, you cannot perform aggregation natively. Instead, you must add an aggregating clause or command to your search, such as the GROUP BY clause in the from command or the stats command.
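To illustrate what an explicit aggregation step computes over an event dataset, the following Python sketch counts events per value of a field. This is roughly the result you would request with a GROUP BY clause or the stats command. It is a conceptual model, not SPL2. The event records and the count_by helper are made up for illustration.

```python
from collections import Counter

# Hypothetical raw events, as an index (event) dataset might return them.
events = [
    {"host": "web-01", "status": 200},
    {"host": "web-02", "status": 404},
    {"host": "web-01", "status": 200},
    {"host": "web-03", "status": 500},
]


def count_by(records, field):
    """Explicit aggregation: count records for each value of `field`."""
    return dict(Counter(record[field] for record in records))


print(count_by(events, "status"))  # {200: 2, 404: 1, 500: 1}
```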
The following table shows the built-in dataset kinds:
Kind | Description |
---|---|
index | A time-series index (tsidx) for storing event data. |
metric | A metrics index (msidx) for storing metric data. |
lookup | Reference data typically stored in a KV Store lookup. |
view | Reusable SPL that can be used in multiple searches. |
job | The materialized results of a search. |
import | A local alias for a dataset that is defined in another module. |
destination | A dataset that pipelines send data to. |
Pipeline destination kinds
The following table lists the different kinds of destination datasets supported in the pipeline products:
Kind | edgeProcessor profile | ingestProcessor profile |
---|---|---|
AWS S3 | Yes | Yes |
Index | Yes | |
Indexer | Yes | |
HEC exporter | Yes | |
devnull | Yes | Yes |
Splunk observability | Yes | |
Splunk Cloud ingestion | Yes | |
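If you need to check destination support in a script, one option is to transcribe the table into a lookup structure. The Python sketch below does that. The dictionary is copied from the table, and the sketch assumes that a blank cell means the kind is not supported for that profile. The function names are hypothetical.

```python
# Transcribed from the table above. Blank cells are assumed to mean
# "not supported" for that profile.
DESTINATION_SUPPORT = {
    "AWS S3": {"edgeProcessor", "ingestProcessor"},
    "Index": {"edgeProcessor"},
    "Indexer": {"edgeProcessor"},
    "HEC exporter": {"edgeProcessor"},
    "devnull": {"edgeProcessor", "ingestProcessor"},
    "Splunk observability": {"edgeProcessor"},
    "Splunk Cloud ingestion": {"edgeProcessor"},
}


def is_supported(kind, profile):
    """Return True if the table lists `kind` as supported for `profile`."""
    return profile in DESTINATION_SUPPORT.get(kind, set())


def kinds_for(profile):
    """Return the destination kinds listed for `profile`, sorted by name."""
    return sorted(k for k, profiles in DESTINATION_SUPPORT.items() if profile in profiles)


print(kinds_for("ingestProcessor"))  # ['AWS S3', 'devnull']
```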
Modules
Modules are used to organize resources, such as datasets and rules, into separate namespaces. Modules isolate related resources from unrelated resources. Every SPL2 search runs within the context of a module, and, apart from built-in datasets, a search can access only the datasets in that same module.
The default module has the name "", or an empty string. Resources in this default module are like files in the root of a file system. The path for these resources is the name of the resource.
If you want to use a dataset from another module, you must create an import dataset. See Creating an import dataset.
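The namespace behavior described above can be modeled in a few lines of Python. This is a hypothetical sketch, not how the Splunk platform implements modules. The Module class and the sales module are illustrative. The sketch shows that dataset names must be unique within a module, that the same name can exist in two modules, and that a search running in one module cannot see another module's datasets without an import.

```python
class Module:
    """Hypothetical model of a module: a namespace of dataset names."""

    def __init__(self, name):
        self.name = name
        self.datasets = {}  # dataset name -> dataset kind

    def add(self, dataset_name, kind):
        # Each dataset name must be unique within its module.
        if dataset_name in self.datasets:
            raise ValueError(f"{dataset_name!r} already exists in module {self.name!r}")
        self.datasets[dataset_name] = kind

    def lookup(self, dataset_name):
        # A search running in this module sees only this module's datasets.
        if dataset_name not in self.datasets:
            raise LookupError(
                f"{dataset_name!r} is not in module {self.name!r}; create an import dataset"
            )
        return self.datasets[dataset_name]


default = Module("")     # the default module is named with the empty string
sales = Module("sales")  # hypothetical second module
default.add("main", "index")
sales.add("main", "index")  # allowed: same name, different namespace
sales.add("orders", "lookup")
```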
Dataset references
When you want to specify a dataset in your search syntax, you use a dataset reference. Because you typically search datasets that are in the default module that you have access to, you refer to a dataset by the dataset name. For example, the main dataset is an index kind of dataset. Referring to the main dataset in a search would look like this:
| FROM main WHERE status=200
Sometimes you might need to refer to a dataset that is in a different module, such as the built-in datasets in the catalog module. To refer to built-in datasets that are in other modules, you must specify both the module name and the dataset name, such as catalog.metrics, ingest.events, or ingest.metrics. For example, the datasets dataset is a built-in dataset in the catalog module. Referring to the datasets dataset in a search would look like this:
| FROM catalog.datasets WHERE kind="index"
If the dataset you want to search is not in the list of built-in datasets, you must create an import dataset to reference a dataset in another module.
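The reference rules above can be summarized as a small resolution function. In this hypothetical Python sketch, an unqualified name resolves in the module the search runs in. A module-qualified name resolves only if it names a built-in dataset, and any other reference signals that an import dataset is needed. The BUILT_IN set lists only the names mentioned in this topic and might not be complete. The function name resolve is illustrative.

```python
# Only the built-in datasets named in this topic; this set may be incomplete.
BUILT_IN = {
    ("catalog", "datasets"),
    ("catalog", "metrics"),
    ("ingest", "events"),
    ("ingest", "metrics"),
}


def resolve(reference, current_module, local_datasets):
    """Return (module, dataset) for a reference like 'main' or 'catalog.datasets'.

    Assumes dataset names contain no dots, so the last dot separates
    the module name from the dataset name.
    """
    module, dot, name = reference.rpartition(".")
    if not dot:
        # Unqualified name: look it up in the module the search runs in.
        if reference in local_datasets:
            return (current_module, reference)
        raise LookupError(f"{reference!r} is not defined in module {current_module!r}")
    if (module, name) in BUILT_IN:
        return (module, name)
    raise LookupError(f"{reference!r} is not built in; create an import dataset instead")


print(resolve("main", "", {"main"}))              # ('', 'main')
print(resolve("catalog.datasets", "", {"main"}))  # ('catalog', 'datasets')
```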
Creating an import dataset
To use a dataset from another module that is not a built-in dataset, you must import the remote dataset into the current module and give the dataset a local name. This is referred to as creating an import dataset.
You import a dataset through the REST API, using a POST request. For instructions on how to create the POST request, see Importing datasets in the Developer Guide on the Splunk Developer Portal.
You cannot import a view from another module.
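The rules for import datasets can be modeled as follows. This hypothetical Python sketch does not call the REST API described above. It only records a local alias for a remote dataset, refuses to import a view, and keeps names unique in the importing module. The security module, its datasets, and the create_import function are made up for illustration.

```python
# Hypothetical modules: module name -> {dataset name: dataset kind}.
modules = {
    "": {"main": "index"},
    "security": {"firewall": "index", "top_threats": "view"},
}
# Import datasets: (local module, local name) -> (remote module, remote name).
imports = {}


def create_import(local_module, local_name, remote_module, remote_name):
    """Give a dataset from another module a local name (an import dataset)."""
    remote_kind = modules[remote_module][remote_name]
    if remote_kind == "view":
        raise ValueError("a view cannot be imported from another module")
    if local_name in modules[local_module]:
        raise ValueError(f"{local_name!r} already exists in module {local_module!r}")
    modules[local_module][local_name] = "import"
    imports[(local_module, local_name)] = (remote_module, remote_name)


create_import("", "firewall", "security", "firewall")
print(modules[""])  # {'main': 'index', 'firewall': 'import'}
```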
Dataset permissions
All resources, such as datasets, have associated permissions that can restrict which resources an SPL2 search can use.
For example, even though a dataset might be defined in the same module as a search, the person running the search might not have permissions to that dataset. Dataset permissions are checked and enforced when the search is run.
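The run-time check described above can be sketched in Python. The grants mapping, the user names, and the denied_datasets and run_search functions are hypothetical. The sketch only illustrates that access is evaluated for the person running the search, at the time the search runs, even when every referenced dataset is in the same module.

```python
# Hypothetical grants: user -> names of datasets that user may read.
grants = {
    "alice": {"main", "orders"},
    "bob": {"main"},
}


def denied_datasets(user, datasets_in_search):
    """Return the referenced datasets that the user may not read."""
    allowed = grants.get(user, set())
    return [name for name in datasets_in_search if name not in allowed]


def run_search(user, datasets_in_search):
    # Permissions are checked when the search runs, for the user running it.
    denied = denied_datasets(user, datasets_in_search)
    if denied:
        raise PermissionError(f"{user} cannot read: {', '.join(denied)}")
    return f"search ran for {user}"


print(run_search("alice", ["main", "orders"]))     # search ran for alice
print(denied_datasets("bob", ["main", "orders"]))  # ['orders']
```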
See also
- Dataset literals
- Dataset functions
This documentation applies to the following versions of Splunk® Cloud Services: current