About data models and data model datasets

The topics in this chapter show you how to use the Data Model Builder to design and build data models for the tutorial data.

What is a data model?

A data model is a type of knowledge object that applies an information structure to raw data, making it easier to use. Each data model represents a category of event data. Data models are composed of data model datasets. More specifically, a data model is a hierarchical search-time mapping of knowledge about one or more datasets. A data model encodes the domain knowledge necessary to build a variety of specialized searches of those datasets. Briefly put, data models generate searches. These specialized searches are in turn used to generate reports for Pivot users.

To create an effective data model, you must understand your data sources. You need to know whether your data comes from a log file, a TCP/UDP network input, a scripted input that calls an API, and so on. You also need to understand your data semantics: how the various fields in your data are extracted, related, and organized. This information can affect your data model architecture.

Data models can get their fields from extractions that are defined on the Splunk Web Settings > Fields > Field extractions page or, for Splunk Enterprise, by editing the props.conf and transforms.conf files. But when you define your data model, you can also arrange to have it get additional fields at search time by using regex-based field extractions, lookups, or eval expressions.
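The three ways of adding fields at search time can be pictured with a small Python sketch. This is only a conceptual analogy, not how the Splunk platform implements field extraction. The raw event, the lookup table, and the field names (productId, product_name, kilobytes) are hypothetical values chosen for illustration.

```python
import re

# Hypothetical lookup table; in a real deployment this would be a lookup file.
PRODUCT_LOOKUP = {"A-100": "Example Widget"}

# Hypothetical pattern for the illustrative event format used below.
PATTERN = re.compile(
    r'productId=(?P<productId>[\w-]+)" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def enrich(raw_event):
    """Return the fields derived from one raw event, or {} if nothing matches."""
    match = PATTERN.search(raw_event)
    if match is None:
        return {}

    # Regex-based extraction: pull named fields out of the raw text.
    fields = dict(match.groupdict())

    # Lookup: add a field by matching an existing field against a table.
    fields["product_name"] = PRODUCT_LOOKUP.get(fields["productId"], "unknown")

    # Eval-style calculation: derive a new field from an existing one.
    fields["kilobytes"] = int(fields["bytes"]) / 1024

    return fields


raw = '10.0.0.5 - - "GET /cart.do?action=purchase&productId=A-100" 200 1024'
print(enrich(raw))
```

The raw event itself is never changed. Each step only adds fields on top of it, which is roughly what it means for a field to be defined at search time.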

In this tutorial, your data sources are web access and secure log files. Most of the fields are automatically extracted. Other fields are added using lookup files and calculated with eval expressions.

About data model datasets

Data models are composed of one or more datasets. Each dataset corresponds in some manner to a set of data in your index. Datasets break down into four types:

Events datasets
Search datasets
Transaction datasets
Child datasets

Datasets in data models can be arranged in parent/child relationships. Each top-level or root dataset can have child datasets which inherit the constraints and fields of the parent and have additional constraints and fields of their own.

Note: Data model datasets are a category of knowledge object. However, data model datasets often use other knowledge objects such as extracted fields, calculated fields, and lookups to define the specific sets of data that the data model dataset represents.

Here is an example of a data model as viewed through the Data Model Builder.

In this example, the dataset hierarchy is in the left-hand sidebar. The Splunk Server root event dataset is selected. The Splunk Server dataset contains all of the data in the data model. The child datasets that branch off of the Splunk Server dataset, such as Scheduler, Acceleration, and Licenser, each contain a different subset of that data.

On the right side of the Data Model Builder are the dataset constraints that define the dataset and the list of fields associated with the dataset. The other topics in this chapter show you how to create a data model. You will learn how to use the Data Model Builder to define the dataset hierarchies and dataset fields for the data model.

Dataset constraints

All data model datasets are defined by sets of constraints that filter out events that are not relevant to the dataset. These constraints help to define the data that the dataset represents. A typical constraint looks like the first part of a search, before pipes and additional search commands are added.

Constraints are inherited by child datasets to ensure that each child dataset represents a subset of the data from the parent datasets. Pivot users can then use these child datasets to design reports with datasets that already have extraneous data prefiltered out.
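Constraint inheritance can be sketched in a few lines of Python. This is a conceptual model only, not a Splunk API. The Dataset class, the dataset names, and the constraint strings are hypothetical, and the way the constraints are joined is an assumption made to show that each child narrows the data of its parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    """Conceptual stand-in for a data model dataset (not a Splunk API)."""
    name: str
    constraint: str                      # this dataset's own constraint
    parent: Optional["Dataset"] = None   # None for a root dataset

    def effective_constraints(self) -> List[str]:
        """Return the inherited constraints first, then this dataset's own."""
        inherited = self.parent.effective_constraints() if self.parent else []
        return inherited + [self.constraint]

    def base_search(self) -> str:
        """Join all constraints so that an event must satisfy every one."""
        return " ".join(f"({c})" for c in self.effective_constraints())


# Hypothetical hierarchy; names and constraint strings are illustrative only.
web = Dataset("Web requests", "sourcetype=access_*")
ok = Dataset("Successful requests", "status=200", parent=web)
purchases = Dataset("Successful purchases", "action=purchase", parent=ok)

print(purchases.base_search())
```

Running the sketch prints (sourcetype=access_*) (status=200) (action=purchase). The grandchild dataset carries every constraint above it in the hierarchy, so it can only ever match a subset of the events that its parent matches.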

Dataset fields

Dataset fields come in five flavors: Auto-extracted, Eval expression, Lookup, Regular Expression, and Geo IP.

Dataset fields are inherited. A child dataset will automatically have all of the fields that belong to its parent. You can design a relatively simple data model where all of the necessary fields for a specific dataset tree are defined in its root dataset, and the child datasets would be differentiated from the root dataset and from each other only by their constraints.
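Field inheritance can be pictured the same way. The following Python sketch is a conceptual illustration under stated assumptions, not Splunk's implementation. The Dataset class and the field names are hypothetical; only the field type labels come from the list above.

```python
from typing import Dict, Optional


class Dataset:
    """Conceptual stand-in for a data model dataset (not a Splunk API)."""

    def __init__(self, name: str, own_fields: Dict[str, str],
                 parent: Optional["Dataset"] = None) -> None:
        self.name = name
        self.own_fields = own_fields   # field name -> field type
        self.parent = parent

    def all_fields(self) -> Dict[str, str]:
        """Return the parent's fields followed by this dataset's own fields."""
        merged = dict(self.parent.all_fields()) if self.parent else {}
        merged.update(self.own_fields)
        return merged


# Hypothetical datasets and field names, for illustration only.
root = Dataset("Web requests", {
    "clientip": "Auto-extracted",
    "status": "Auto-extracted",
})
child = Dataset("Purchases", {
    "product_name": "Lookup",
    "revenue": "Eval expression",
}, parent=root)

for field_name, field_type in child.all_fields().items():
    print(f"{field_name}: {field_type}")
```

In this sketch the child dataset exposes four fields: the two it inherits from the root dataset and the two it defines itself. A child dataset with an empty own_fields mapping would expose exactly the fields of its root, which matches the simple design described above where child datasets differ only by their constraints.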

Dataset fields are what Pivot users work with to define and generate a pivot report. The set of fields you have access to is determined by the dataset you choose when you enter Pivot. You might add fields to a child dataset to give Pivot users fields that are specific to the data covered by that dataset.

Learn more about data models

The information discussed in this topic is limited to what you need to know to build the data models for the tutorial data. For more information, see About data models and Design data models in the Knowledge Manager Manual.

Next steps

Proceed to the next topic, where you will create a new data model.
