Dataset extension is a way to create a search, report, dataset, or other object that is built upon a reference to an existing dataset. This reference means that the object always refers to the original dataset for its foundational data. If the definition of the original dataset changes, those changes are passed down to any datasets that extend it.
Dataset extension is not the same as dataset cloning. When you clone a dataset, you create a distinct, individual dataset that is identical to the original dataset but not otherwise connected to it. When you extend a dataset, you create a dataset, report, dashboard panel, or alert that is bound to the original dataset through its reference to that dataset.
Example of extending a dataset as a report
For example, say you have a dataset named Alpha. If you select Explore > Investigate in Search on the Datasets listing page for the Alpha dataset, you go to the Search view and run a search that displays the contents of Alpha. This search string uses the from command to reference Alpha. You can optionally modify the search string with additional Splunk Search Processing Language (SPL).
If you save this search string as a report named Beta, it will still have the reference back to Alpha. This means that if someone decides to make a change to Alpha, that change cascades down to the Beta report. This might cause problems in the Beta report.
For example, you might modify the search string of the Beta report with lookups and eval expressions that use fields passed down from the Alpha dataset in their definitions. If someone deletes those fields from the Alpha dataset, those lookups and eval expressions will break in the Beta report, because they require fields that no longer exist.
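As an illustration, the search string behind a report like Beta might look like the following sketch. The field names (price, quantity, product_id, product_name) and the lookup name (product_lookup) are hypothetical and are not taken from any real Alpha dataset. They only show how downstream SPL can depend on fields that Alpha passes down.

```
| from datamodel:"Alpha"
| eval total = price * quantity
| lookup product_lookup product_id OUTPUT product_name
```

In this sketch, if someone removes price or product_id from Alpha, the eval expression and the lookup no longer have the fields they require.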
Dataset extension chains
If you have the Splunk Datasets Add-on installed, you can extend any dataset as a table dataset. This means that you can have chains of extended datasets. For example, you can extend dataset Alpha as dataset Beta, and then extend dataset Beta as dataset Gamma, and so on. Any change to Alpha propagates down through the other datasets in the chain.
The Splunk Datasets Add-on enables you to understand dataset extension chains from the end of the chain, but not from the start. So to use the example in the preceding paragraph, if you are on dataset Gamma, you can see that it extends Beta, which in turn extends Alpha. But if you are looking at Alpha, you have no way of knowing which datasets were extended from it.
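One way to picture such a chain is in terms of the from references involved. This is a conceptual sketch only: the Table Editor manages the actual table definitions, and the filter and field names used here (status, host) are hypothetical. Under those assumptions, the initial data for Beta might come from a search like this:

```
| from datamodel:"Alpha"
| search status=200
```

The initial data for Gamma would then reference Beta rather than Alpha:

```
| from datamodel:"Beta"
| fields host, status
```

Each reference points only to the dataset directly above it, which is consistent with the chain being visible from Gamma upward but not from Alpha downward.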
To learn which datasets a dataset extends
Locate the dataset in the Datasets listing page and expand its row. If it extends one or more datasets, you will find an Extends line item with the extended datasets listed from top to bottom. For example, the detail information for Gamma shows that it extends Alpha and Beta.
You can also find this information on the viewing page for a dataset. Click More Info to see what datasets the dataset that you are viewing extends.
Use a naming convention for extended datasets
When you are working with a dataset, it is difficult to know what datasets are extended from it. For example, a person working with the Alpha dataset has no way of knowing that it is extended by the Beta and Gamma datasets.
You can manage this by using a naming convention to indicate when a dataset is extended from another. For example, if you extend a dataset from dataset Alpha, you can name it Alpha.Beta. Later, if you extend two datasets from Alpha.Beta, you can name those datasets Alpha.Beta.Gamma and Alpha.Beta.Epsilon. This naming methodology is similar to that of data model datasets, where the dataset name indicates where it lives in a greater hierarchy of data model datasets.
When you extend a dataset, you can also update the description of the original dataset to indicate that it has been extended. Identify the knowledge objects that have been directly extended from it, not the full extension chain, if one exists. Add a sentence like this to the dataset description: "This dataset has been extended as a table dataset named <dataset_name> and a report named <report_name>."
The from command
Dataset extension is facilitated by the from command, whether you extend a dataset by opening it in the Search view or through the Table Editor.
When you open a dataset in the Search view, you see a search string that uses the from command to retrieve data from that dataset. For example, say you have a dataset named Buttercup_Games_Purchases. If, while on the Datasets listing page, you click Explore in Search for that dataset, the Splunk platform takes you to the Search view, where you see this search string:
| from datamodel:"Buttercup_Games_Purchases"
If you work with Splunk Cloud, or work with Splunk Enterprise and have installed the Splunk Datasets Add-on, you can extend any dataset as a table dataset. When you do this, the Table Editor uses the from command in the background. Click the SPL toggle in the command history sidebar to see how the Table Editor uses the from command.
[Figure: closeup of the command history sidebar in the Table Editor, showing that the initial data for the table dataset is provided by a from command extension of the Buttercup Games Purchases dataset.]
For more information, see from in the Search Reference.
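The from command can also reference a saved report. As a rough sketch, assuming a report named Beta like the one in the earlier example and a hypothetical status field, a search might reference the report and then refine its results with additional SPL:

```
| from savedsearch:"Beta"
| stats count by status
```

Because Beta itself references Alpha in that example, a search like this would sit one step further down the same extension chain.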
Extension and table acceleration
If you want to accelerate a table that extends other tables, it needs to be shared, and the tables it extends must be shared as well.
You will not see acceleration benefits when you use from to extend an accelerated table.
You cannot accelerate a table that is extended from a lookup table file or lookup definition. Acceleration can only be applied to datasets that use purely streaming commands. Lookup dataset extension is not a streaming operation.
See Accelerate tables.
This documentation applies to the following versions of Splunk® Enterprise: 6.6.0, 6.6.1, 6.6.2