About data models
Data models drive the Pivot tool. They enable users of Pivot to create compelling reports and dashboards without designing the searches that generate them. Data models can have other uses, especially for Splunk app developers.
Splunk knowledge managers design and maintain data models. These knowledge managers understand the format and semantics of their indexed data and are familiar with the Splunk search language. In building a typical data model, knowledge managers use knowledge object types such as lookups, transactions, search-time field extractions, and calculated fields.
What is a data model?
A data model is a hierarchically structured search-time mapping of semantic knowledge about one or more datasets. It encodes the domain knowledge necessary to build a variety of specialized searches of those datasets. These specialized searches are used by Splunk software to generate reports for Pivot users.
When a Pivot user designs a pivot report, they select the data model that represents the category of event data that they want to work with, such as Web Intelligence or Email Logs. Then they select an object within that data model that represents the specific dataset on which they want to report. Data models are composed of objects, which can be arranged in hierarchical structures of parent and child objects. Each child object represents a subset of the dataset covered by its parent object.
If you are familiar with relational database design, think of data models as analogs to database schemas. When you plug them into the Pivot Editor, they let you generate statistical tables, charts, and visualizations based on column and row configurations that you select.
To create an effective data model, you must understand your data sources and your data semantics. This information can affect your data model architecture, that is, the manner in which the objects that make up the data model are organized.
For example, if your dataset is based on the contents of a table-based data format, such as a .csv file, the resulting data model is flat, with a single top-level root object that encapsulates the fields represented by the columns of the table. The root object may have child objects beneath it. But these child objects do not contain additional fields beyond the set of attributes that the child objects inherit from the root object.
Meanwhile, a data model derived from a heterogeneous system log might have several root objects (event, search, and transaction objects). You can associate each of these root objects with a hierarchy of objects in nested parent and child relationships. Each of those child objects can have new fields in addition to the fields they inherit from ancestor objects.
Data models can get their fields from extractions that you set up in the Field Extractions section of Manager or by editing transforms.conf. When you define your data model, you can arrange to have it get additional fields at search time through regular-expression-based field extractions, lookups, and eval expressions.
In data model terminology, the fields that data models use are called attributes. They break down into the categories described above (auto-extracted, eval expression, regular expression) and more (lookup, geo IP). See "Object attributes", below.
Note: Data models are a category of knowledge object and are fully permissionable. A data model's permissions cover all of its data model objects. For information about setting data model permissions, see "Manage data models," in this manual.
Data models generate searches
When you consider what data models are and how they work, it can also be helpful to think of them as a collection of structured information that generates different kinds of searches. Each object within a data model can be used to generate a search that returns a particular dataset. When we say that a data model object "represents a dataset," we mean the dataset returned by the search that the object generates.
We go into more detail about this relationship between data models, data model objects, and searches in the following subsections.
- Object constraints determine the first part of the search through:
  - Simple search filters (Root event objects and all child objects).
  - Transaction definitions (Root transaction objects).
  - More complex search strings that may use transforming commands, among others (Root search objects).
- Object attributes are essentially fields. When you select an object for Pivot, the unhidden attributes you define for that object comprise the list of fields that you'll choose from in Pivot when you decide what you want to report on. The fields you select are added to the search that the object generates, and they can include calculated fields, user-defined field extractions, and fields added to your data by lookups.
- The last parts of the object-generated search are determined by your Pivot Editor selections. They determine the transforming search commands that format the results as a statistical table, which Pivot can in turn use as the basis for a chart visualization.
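To make the division of labor concrete, here is a minimal Python sketch that strings the three parts together: the object's constraint, its attributes, and a transforming command chosen in Pivot. The `build_pivot_search` helper is hypothetical and the search it produces is only illustrative; Splunk generates pivot searches internally, and their actual text differs.

```python
def build_pivot_search(constraint: str, attributes: list[str], split_by: str) -> str:
    """Assemble an illustrative search from the three parts described above.

    Hypothetical helper: real pivot searches are generated by Splunk and
    look different.
    """
    if split_by not in attributes:
        raise ValueError(f"{split_by!r} is not an attribute of this object")
    # 1. The object's constraint forms the first part of the search.
    # 2. The object's attributes limit which fields are available.
    # 3. The Pivot selection becomes the transforming command at the end.
    fields = " ".join(attributes)
    return f"{constraint} | fields {fields} | stats count BY {split_by}"


print(build_pivot_search("sourcetype=access_*", ["status", "action"], "status"))
# sourcetype=access_* | fields status action | stats count BY status
```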
For more information about how you use the Pivot Editor to create pivot tables, charts, and visualizations that are based on data model objects, see "Introduction to Pivot" in the Pivot Manual.
Data models are composed of one or more objects. Here are some basic facts about data model objects:
- An object is a specification for a dataset. Each data model object corresponds in some manner to a set of data in an index. You can apply data models to different indexes and get different datasets.
- Objects break down into four types: event objects, search objects, transaction objects, and child objects.
- Objects are hierarchical. Objects in data models can be arranged hierarchically in parent/child relationships. The top-level event, search, and transaction objects in data models are collectively referred to as "root objects."
- Child objects have inheritance. Data model objects are defined by characteristics that mostly break down into constraints and attributes. Child objects inherit constraints and attributes from their parent objects and have additional constraints and attributes of their own.
- Child objects provide a way of filtering events from parent objects. Because a child object always provides an additional constraint on top of the constraints it has inherited from its parent object, the dataset it represents is always a subset of the dataset that its parent represents.
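The subset guarantee follows directly from how constraints combine. This Python sketch models an object as its own filter plus a link to its parent, so an event must pass every ancestor's filter before the child's own filter is checked. The `DataModelObject` class and the field names in the sample events are hypothetical, and the constraints given to All Calls and Data (from the Call Detail Records example later in this topic) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Event = dict  # an event is modeled as a plain field -> value mapping


@dataclass
class DataModelObject:
    name: str
    constraint: Callable[[Event], bool]  # this object's own filter only
    parent: Optional["DataModelObject"] = None

    def matches(self, event: Event) -> bool:
        # An event belongs to this object only if it passes every ancestor's
        # constraint as well as this object's own constraint.
        if self.parent is not None and not self.parent.matches(event):
            return False
        return self.constraint(event)

    def dataset(self, events: list) -> list:
        return [e for e in events if self.matches(e)]


# Hypothetical constraints for two objects from the Call Detail Records example.
all_calls = DataModelObject("All Calls", lambda e: e.get("record") == "call")
data = DataModelObject("Data", lambda e: e.get("call_type") == "data", parent=all_calls)

events = [
    {"record": "call", "call_type": "voice"},
    {"record": "call", "call_type": "data"},
    {"record": "switch", "call_type": "data"},
]
parent_set = all_calls.dataset(events)
child_set = data.dataset(events)
assert all(e in parent_set for e in child_set)  # the child is always a subset
print(len(parent_set), len(child_set))  # 2 1
```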
We'll dive into more detail about these and other aspects of data model objects in the following subsections.
Root objects and object types
The top-level objects in data models are referred to as "root objects." Data models can contain multiple root objects of various types, and each of these root objects can be a parent to more child objects. This association of root and child objects is an "object tree." The overall set of data represented by an object tree is selected first by its root object and then refined and extended by its child objects.
Root objects can be defined by a search constraint, a search, or a transaction:
- Root event objects are the most commonly used type of root data model object. Each root event object broadly represents a type of event. For example, an HTTP Access root event object could correspond to access log events, while an Error root event object could correspond to events with error messages.
- Root event objects are typically defined by a simple constraint (see "Object Constraints," below). This is what an experienced Splunk user might think of as the first portion of a search, before the pipe character, commands, and arguments are applied. For example, "status > 600" and "sourcetype=access_* OR sourcetype=iis*" are possible event object definitions.
- Note: Child objects of all three types (event, transaction, and search) are defined with simple constraints that narrow down the set of data that they inherit from their ancestor objects.
- Root transaction objects enable you to create data models that represent transactions: groups of related events that span time. Transaction object definitions use fields that have already been added to the model via event or search objects, which means that you can't create data models that are composed only of transaction objects and their child objects. Before you create a transaction object, you must already have some event or search object trees in your model.
- Root search objects use an arbitrary Splunk search that includes transforming commands to define the dataset that they represent. If you want to define a root dataset that includes one or more fields that aggregate over the entire dataset, you might need to use a root search object. For example: a system security dataset that has various system intrusion events broken out by category over time.
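The dependency of transaction objects on event or search object trees can be expressed as a simple validity check. This Python sketch is hypothetical: `RootType` and `check_root_objects` are illustrative names, not part of any Splunk API, and the Data Model Editor enforces this rule in its own way.

```python
from enum import Enum


class RootType(Enum):
    EVENT = "event"
    SEARCH = "search"
    TRANSACTION = "transaction"


def check_root_objects(roots: list) -> None:
    """Raise if transaction roots exist without any event or search root."""
    # Transaction definitions reuse fields supplied by event or search trees.
    has_field_source = any(r in (RootType.EVENT, RootType.SEARCH) for r in roots)
    if RootType.TRANSACTION in roots and not has_field_source:
        raise ValueError("add an event or search object tree before a transaction object")


check_root_objects([RootType.EVENT, RootType.TRANSACTION])  # passes silently
try:
    check_root_objects([RootType.TRANSACTION])
except ValueError as err:
    print(err)
```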
Object types and data model acceleration:
You can optionally use data model acceleration to speed up generation of pivot tables and charts. However, there are a few restrictions to this functionality related to data model objects that may have some bearing on how you construct your data model if you think your users would benefit from data model acceleration.
In a data model, only root event objects and their children can be accelerated. Root search objects, root transaction objects, and the children of those objects cannot be accelerated. If you want a dataset to be accelerated and a root event object can cover the same dataset, use a root event object instead of a root search object.
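The acceleration rule depends only on what kind of root object an object tree starts from. As a sketch, assuming each object records its parent and each root records its type, eligibility comes down to walking up to the root. `ModelObject` and `can_accelerate` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelObject:
    name: str
    root_type: Optional[str] = None  # "event", "search", or "transaction"; set on roots only
    parent: Optional["ModelObject"] = None

    def root(self) -> "ModelObject":
        # Follow parent links until reaching the top of the object tree.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def can_accelerate(obj: ModelObject) -> bool:
    # Only root event objects and their descendants are eligible.
    return obj.root().root_type == "event"


all_calls = ModelObject("All Calls", root_type="event")
voice = ModelObject("Voice", parent=all_calls)
conversations = ModelObject("Conversations", root_type="transaction")
print(can_accelerate(voice), can_accelerate(conversations))  # True False
```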
For more information on enabling acceleration for your data models see "Manage data models," in this manual.
The following example shows the first several objects in a "Call Detail Records" data model. Four top-level root objects are displayed: All Calls, All Switch Records, Conversations, and Outgoing Calls.
All Calls and All Switch Records are root event objects that represent all of the calling records and all of the carrier switch records, respectively. Both of these root event objects have child objects that deal with subsets of the data owned by their parents. The All Calls root event object has child objects that break down into different call classifications: Voice, SMS, Data, and Roaming. If you were a Pivot user who only wanted to report on aspects of cellphone data usage, you'd select the Data object. But if you wanted to create reports that compare the four call types, you'd choose the All Calls root event object instead.
Conversations and Outgoing Calls are root transaction objects. They both represent transactions: groupings of related events that span a range of time. The "Conversations" object only contains call records of conversations between two or more people where the maximum pause between conversation call record events is less than two hours and the total length of the conversation is less than one day.
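To show what the Conversations limits mean in practice, here is a simplified Python sketch of grouping events into transactions with a maximum pause and a maximum span. It is an approximation for illustration, not how Splunk's transaction processing is implemented; the `conversation_id` group-by field and the sample timestamps are hypothetical, and `_time` is assumed to hold epoch seconds.

```python
from collections import defaultdict

MAX_PAUSE = 2 * 60 * 60  # seconds; consecutive events must be less than two hours apart
MAX_SPAN = 24 * 60 * 60  # seconds; a whole transaction must last less than one day


def group_transactions(events, group_by, max_pause=MAX_PAUSE, max_span=MAX_SPAN):
    """Group events into transactions per value of the group_by field.

    Each event is a dict with a numeric "_time" in epoch seconds. A new
    transaction starts whenever adding the next event would break either limit.
    """
    by_key = defaultdict(list)
    for event in sorted(events, key=lambda e: e["_time"]):
        by_key[event[group_by]].append(event)

    transactions = []
    for key_events in by_key.values():
        current = [key_events[0]]
        for event in key_events[1:]:
            pause = event["_time"] - current[-1]["_time"]
            span = event["_time"] - current[0]["_time"]
            if pause < max_pause and span < max_span:
                current.append(event)
            else:
                transactions.append(current)
                current = [event]
        transactions.append(current)
    return transactions


calls = [
    {"_time": 0, "conversation_id": "a"},
    {"_time": 3600, "conversation_id": "a"},   # 1 hour later: same transaction
    {"_time": 14400, "conversation_id": "a"},  # 3 hours later: new transaction
    {"_time": 100, "conversation_id": "b"},
]
print([len(t) for t in group_transactions(calls, "conversation_id")])  # [2, 1, 1]
```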
For details about defining the different types of objects, see the topic "Design data model objects," in this manual.
All data model objects are defined by sets of constraints. Object constraints filter out events that aren't relevant to the object; they help to define the dataset that the object represents.
- For a root event object or a child object of any type, the constraint looks like a simple search, without additional pipes and search commands. For example, the constraint for HTTP Request, one of the root event objects of the Web Intelligence data model, is
- For a root search object, the constraint is the object's base search string.
- For a root transaction object, the constraint is the transaction definition. Transaction object definitions must identify Group Objects (either one or more event objects, a search object, or a transaction object) and one or more Group By fields. They can also optionally include Max Pause and Max Span values.
Constraints are inherited by child objects. Constraint inheritance ensures that each child object represents a subset of the data represented by its parent objects. Your Pivot users can then use these child objects to design reports with datasets that already have extraneous data prefiltered out.
Say you have a data model called Buttercup Games. Its Successful Purchases object descends from the root event object HTTP Requests and is designed to contain only those events that represent successful customer purchase actions. Successful Purchases inherits constraints from HTTP Requests and from its direct parent object, Purchases.
1. HTTP Requests starts by setting up a search that only finds webserver access events.
2. The Purchases object further narrows the focus down to webserver access events that involve purchase actions.
3. And finally, Successful Purchases adds a constraint that reduces the object event set to web access events that represent successful purchase events.
When all the constraints are added together, the base search for the Successful Purchases object looks like this:
sourcetype=access_* action=purchase status=200
A Pivot user might use this object for reporting if they know that they only want to report on successful purchase actions.
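The constraint chain above can be sketched in Python as a walk from the child object up to its root, joining the constraints in root-to-leaf order. The split of the combined search into one constraint per object is an assumption (the topic only gives the combined result), and `ObjectNode` is a hypothetical class. Space-separated terms in a Splunk search are implicitly ANDed, which is why plain concatenation works here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectNode:
    name: str
    constraint: str
    parent: Optional["ObjectNode"] = None

    def base_search(self) -> str:
        # Collect constraints from this object up to the root, then
        # reverse so the root's constraint comes first.
        chain = []
        node = self
        while node is not None:
            chain.append(node.constraint)
            node = node.parent
        return " ".join(reversed(chain))


# Assumed per-object split of the Buttercup Games constraints.
http_requests = ObjectNode("HTTP Requests", "sourcetype=access_*")
purchases = ObjectNode("Purchases", "action=purchase", parent=http_requests)
successful = ObjectNode("Successful Purchases", "status=200", parent=purchases)

print(successful.base_search())  # sourcetype=access_* action=purchase status=200
```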
For details about objects and object constraints, see the topic "Design data model objects," in this manual.
An object's attributes are essentially a set of fields associated with the dataset that the object represents. There are five types of object attributes:
- Auto-extracted: A field derived at search time. You can only add auto-extracted attributes to root objects. Child objects can only inherit them, and they cannot add new auto-extracted attributes of their own. Auto-extracted attributes can be:
  - Fields that are extracted automatically, like version. This includes fields indexed through structured data inputs, such as fields extracted from the headers of indexed CSV files.
  - Field extractions, lookups, or calculated fields that you have defined in Settings or configured in
  - Fields that you have manually added to the attribute because they aren't currently in the object dataset, but should be in the future. Can include fields that are added to the object dataset by generating commands such as
- Eval Expression: A field derived from an eval expression that you enter in the attribute definition. Eval expressions often involve one or more extracted fields.
- Lookup: A field that is added to the events in the object dataset with the help of a lookup that you configure in the attribute definition. Lookups add fields from external data sources such as CSV files and scripts. When you define a lookup attribute you can use any lookup that you have defined in Settings and associate it with any other attribute that has already been associated with that same object.
- Regular Expression: This attribute type represents a field that is extracted from the object event data using a regular expression that you provide in the attribute definition. A regular expression attribute definition can use a regular expression that extracts multiple fields; each field will appear in the object attribute list as a separate regular expression attribute.
- Geo IP: A specific type of lookup that adds geographical attributes, such as latitude, longitude, country, and city to events in the object dataset that have valid IP address fields. Useful for map-related visualizations.
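A regular expression attribute that extracts more than one field is easiest to picture with named capture groups: each group becomes its own attribute. This Python sketch uses the `re` module's `(?P<name>...)` syntax; the pattern, the `method` and `uri_path` names, and the sample access-log line are hypothetical, and Splunk's own field-extraction regex syntax may differ in detail.

```python
import re

# Hypothetical regular expression attribute definition: one pattern with two
# named groups yields two separate attributes, "method" and "uri_path".
PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<uri_path>\S+) HTTP/[\d.]+"')


def extract_regex_attributes(raw: str) -> dict[str, str]:
    """Return one entry per named group that matched; empty dict if no match."""
    match = PATTERN.search(raw)
    if match is None:
        return {}
    return {name: value for name, value in match.groupdict().items() if value is not None}


line = '10.0.0.1 - - [01/Jan/2016:00:00:00] "GET /cart.do HTTP/1.1" 200 1234'
print(extract_regex_attributes(line))  # {'method': 'GET', 'uri_path': '/cart.do'}
```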
For more information about defining each of the five attribute types, see the topic "Design data model objects," in this manual.
The Data Model Editor groups attributes into three categories:
- Inherited - All objects have at least a few inherited attributes. Child objects inherit attributes from their parent object, and these inherited attributes always appear in the Inherited category. Root event, search, and transaction objects also have default attributes that are categorized as inherited.
- Extracted - Any auto-extracted attribute that you add to an object will be listed in the "Extracted" attribute category.
- Calculated - Calculated attributes are attributes that are derived through a calculation or lookup of some sort. When you add Eval Expression, Regular Expression, Lookup, and Geo IP attribute types to an object, they all appear in this attribute category.
Note: The Data Model Editor lets you rearrange the order of calculated attributes. This is useful when you have a set of attributes that must be processed in a specific order, because attributes are processed in order from the top of the list to the bottom. See "Attributes serve several purposes," below, for more information.
Attributes are inherited
All objects have inherited attributes.
A child object will automatically have all of the attributes that belong to its parent. All of these inherited attributes will appear in the child object's "Inherited" category, even if the attributes were categorized otherwise in the parent object.
You can add additional attributes to a child object. The Data Model Editor will categorize these attributes either as extracted attributes or calculated attributes depending on their attribute type.
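The categorization rules above can be summarized in a short Python sketch. Everything a child receives from its parent lands in Inherited regardless of how the parent categorized it, and the check on auto-extracted attributes follows the rule, stated in the Auto-extracted description, that only root objects can add them. The type labels and the `categorize` function are hypothetical.

```python
EXTRACTED_TYPES = {"auto-extracted"}
CALCULATED_TYPES = {"eval", "regex", "lookup", "geoip"}


def categorize(inherited, own_attributes, is_root):
    """Sort an object's attributes into Inherited, Extracted, and Calculated.

    inherited: attribute names the object gets from its parent (or, for a
    root object, its default fields). own_attributes: name -> type for the
    attributes added directly to this object.
    """
    categories = {"Inherited": list(inherited), "Extracted": [], "Calculated": []}
    for name, attr_type in own_attributes.items():
        if attr_type in EXTRACTED_TYPES:
            if not is_root:
                raise ValueError("child objects cannot add auto-extracted attributes")
            categories["Extracted"].append(name)
        elif attr_type in CALCULATED_TYPES:
            categories["Calculated"].append(name)
        else:
            raise ValueError(f"unknown attribute type: {attr_type}")
    return categories


root = categorize(["_time", "host"], {"status": "auto-extracted"}, is_root=True)
# The child inherits every attribute of its parent, whatever its category there.
parent_attrs = root["Inherited"] + root["Extracted"] + root["Calculated"]
child = categorize(parent_attrs, {"is_error": "eval"}, is_root=False)
print(child)
# {'Inherited': ['_time', 'host', 'status'], 'Extracted': [], 'Calculated': ['is_error']}
```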
You can design a relatively simple data model where all of the necessary attributes for an object tree are defined in its root object, meaning that all of the child objects in the tree have the exact same set of attributes as that root object. In such a data model, the child objects would be differentiated from the root object and from each other only by their constraints.
Root event, search, and transaction objects also have inherited attributes. These inherited attributes are default fields that are extracted from every event, such as
You cannot delete inherited attributes, and you cannot edit their definitions. The only way to edit or remove an inherited attribute belonging to a child object is to edit or delete it in the ancestor object where it originates as an extracted or calculated attribute. If the attribute originates in a root object as an inherited attribute, you cannot edit or delete it at all.
You can hide attributes from Pivot users as an alternative to attribute deletion. See "Attributes can be visible or hidden to Pivot users," below.
You can also determine whether inherited attributes are optional for an object dataset or required. See "Attributes can be required or optional for an object dataset," below.
Attributes serve several purposes
Their most obvious function is to provide the set of fields that Pivot users use to define and generate a pivot report. The set of fields that a Pivot user has access to is determined by the object the user chooses when they enter the Pivot Editor. You might add attributes to a child object to provide fields to Pivot users that are specific to the dataset covered by that object.
On the other hand, you can also design calculated attributes whose only function is to set up the definition of other attributes or constraints. Because attributes are processed in the order that they are listed in the Data Model Editor, listing order matters, and the Data Model Editor lets you rearrange the listing order of calculated attributes.
For example, you could design a chained set of three Eval Expression attributes. The first two Eval Expression attributes would create what are essentially calculated fields. The third Eval Expression attribute would use those two calculated fields in its eval expression.
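The chained example can be sketched in Python, with lambdas standing in for eval expressions and a list standing in for the Data Model Editor's attribute order. The field names (`bytes_in`, `bytes_out`, `kb_in`, `kb_out`, `kb_total`) are hypothetical. Running the list in reverse fails because the third attribute would try to read fields that do not exist yet, which is why listing order matters.

```python
from typing import Callable

Event = dict
# Each calculated attribute is (name, function of the event built so far).
CalculatedAttribute = tuple[str, Callable[[Event], float]]


def apply_calculated(event: Event, attributes: list[CalculatedAttribute]) -> Event:
    result = dict(event)
    for name, compute in attributes:  # processed top to bottom
        result[name] = compute(result)
    return result


chain = [
    ("kb_in", lambda e: e["bytes_in"] / 1024),
    ("kb_out", lambda e: e["bytes_out"] / 1024),
    ("kb_total", lambda e: e["kb_in"] + e["kb_out"]),  # depends on the first two
]

print(apply_calculated({"bytes_in": 2048, "bytes_out": 1024}, chain)["kb_total"])  # 3.0
```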
Attributes can be visible or hidden to Pivot users
When you define an attribute, you can determine whether it is visible or hidden for Pivot users. This can come in handy if each object in your data model has lots of attributes but only a few attributes per object are actually useful for Pivot users.
Note: An attribute can be visible in some objects and hidden in others. Hiding an attribute in a parent object does not cause it to be hidden in the child objects that descend from it.
Attributes are visible by default. Attributes that have been hidden for an object are marked as such in the object's attribute list.
You decide which attributes to include in your model, and which attributes to expose for a particular object, to make your objects easier to use in Pivot. It's often helpful to your Pivot users if each object exposes only the data that is relevant to that object, to make it easier to build meaningful reports. This means, for example, that you can add attributes to a root object that are hidden throughout the model except for a specific object elsewhere in the hierarchy, where their visibility makes sense in the context of that object and its particular dataset.
Consider the example mentioned in the previous subsection, where you have a set of three "chained" Eval Expression attributes. You may want to hide the first two Eval Expression attributes because they're just there as "inputs" to the third attribute. You'd leave the third attribute visible because it's the final "output": the attribute that matters for Pivot purposes.
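Because visibility is set per object, one way to picture it is a table of hidden attributes keyed by object name. In this hypothetical Python sketch, the parent object hides the two "input" attributes from the chained example while its child, which does not inherit that setting, still shows all three. The object names and the `pivot_fields` helper are invented for illustration.

```python
# Hypothetical per-object visibility settings. Each object lists only the
# attributes hidden on that object; nothing is inherited from the parent.
ATTRIBUTES = ["kb_in", "kb_out", "kb_total"]
HIDDEN = {
    "Network Traffic": {"kb_in", "kb_out"},  # parent object: hide the two "inputs"
    "Large Transfers": set(),                # child object: hiding is not inherited
}


def pivot_fields(object_name):
    """Return the attributes a Pivot user would see for the given object."""
    hidden = HIDDEN.get(object_name, set())
    return [name for name in ATTRIBUTES if name not in hidden]


print(pivot_fields("Network Traffic"))  # ['kb_total']
print(pivot_fields("Large Transfers"))  # ['kb_in', 'kb_out', 'kb_total']
```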
Attributes can be required or optional for an object dataset
During the attribute design process you can also determine whether an attribute is required or optional. This can act as a filter for the event set represented by the object. If you say an attribute is required, you're saying that every event represented by the object must have that attribute. If you define an attribute as optional, the object may have events that do not have that attribute at all.
Note: As with attribute visibility (see above), an attribute can be required in some objects and optional in others. Marking an attribute as required in a parent object does not automatically make that attribute required in the child objects that descend from that parent object.
Attributes are optional by default. Attributes that have had their status changed to required for an object are marked as such in the object's attribute list.
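A required attribute behaves like an extra filter on the object's events. This minimal Python sketch, which treats a missing or null value as "does not have the attribute", keeps only the events that carry every required attribute; the `object_dataset` function and the sample events are hypothetical.

```python
def object_dataset(events, required):
    """Keep only events that carry every required attribute.

    Optional attributes are not checked, so an event that lacks one is kept.
    """
    return [e for e in events if all(e.get(name) is not None for name in required)]


events = [
    {"status": 200, "action": "purchase"},
    {"status": 404},
]
print(len(object_dataset(events, required=set())))       # 2
print(len(object_dataset(events, required={"action"})))  # 1
```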
This documentation applies to the following versions of Splunk® Enterprise: 6.3.0, 6.3.1, 6.3.2, 6.3.3, 6.3.4, 6.3.5, 6.3.6, 6.3.7, 6.3.8, 6.3.9, 6.3.10, 6.3.11, 6.3.12, 6.3.13, 6.3.14, 6.4.0, 6.4.1, 6.4.2, 6.4.3, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11