About data models
The topics in this chapter show you how to design and build data models using the Data Model Editor. Data models drive the Pivot tool. They enable users of Pivot to realize compelling reports and dashboards without first going through the sometimes complex step of designing the searches that generate them. Data models can have other uses as well, especially for Splunk Enterprise app developers.
Data models are designed by Splunk Enterprise knowledge managers: people who understand the format and semantics of their indexed data, and who are familiar with the Splunk Enterprise search language. In the course of building a typical data model, knowledge managers will make use of several knowledge object types previously discussed in this manual, including lookups, transactions, search-time field extractions, and calculated fields.
The other topics in this chapter include:
- Manage data models - Learn how to create data models, set up their permissions, enable data model acceleration, clone existing data models and more.
- Design data models and objects - Learn how to use the Data Model Editor to define objects and their attributes and set up object hierarchies for your data models.
What is a data model?
When a Pivot user sets out to design a report, she first selects the data model that represents the broad category of event data that she wants to work with, such as "Web Intelligence" or "Email Logs." Then she selects an "object" within that data model that represents the specific dataset that she would like to report on.
Briefly put, a data model is a hierarchically structured search-time mapping of semantic knowledge about one or more datasets. It encodes the domain knowledge necessary to build a variety of specialized searches of those datasets. These specialized searches are in turn used by Splunk Enterprise to generate reports for Pivot users.
If you're familiar with relational database design, it might help to think of data models as analogs to database schemas. When you plug them into the Pivot Editor, they enable you to generate statistical tables, charts, and visualizations on the fly based on column and row configurations that you select.
To create an effective data model, you must understand your data sources (whether the data is derived from a log file, a TCP/UDP network input, a scripted input for an API, and so on) and your data semantics (how the various fields in your data are extracted, related, and organized). This information can affect your data model architecture.
For example, if your dataset is based on the contents of a table-based data format, such as a .csv file, the resulting data model will likely be fairly flat, with a single top-level "root" object that encapsulates all of the fields represented by the columns of the table. The root object may have child objects beneath it, but they won't contain any additional fields beyond those that they inherit from the root object (though they will have constraints that narrow down the selection of events that they represent).
Meanwhile, a data model derived from a heterogeneous system log can potentially end up having several root objects of varying types (events, searches, and transactions). Each of these root objects can be associated with a complex hierarchy of objects in nested parent and child relationships, and each of those child objects may have new fields aside from the fields that they inherited from their ancestor objects.
Data models can get their fields from extractions that you have previously set up via Manager or direct edits to transforms.conf. But when you define your data model, you can also arrange to have it get additional fields at search time through regular-expression-based field extractions, lookups, and eval expressions.
In data model terminology, the fields that data models use are called attributes. They break down into the categories described above (auto-extracted, eval expression, regular expression) and more (lookup, geo IP). For more information, see the subsection below titled "Object attributes."
Note: Data models are a category of knowledge object and as such are fully permissionable. A data model's permissions cover all of its data model objects. For more information about setting data model permissions, see the topic "Manage data models," in this manual.
Data models generate searches
When you consider what data models are and how they work, it can also be helpful to think of them as a collection of structured information that generates different kinds of searches. Each object within a data model can be used to generate a search that returns a particular dataset. When we say that a data model object "represents a dataset" we're really talking about the dataset returned by the object you select.
We go into more detail about this relationship between data models, data model objects, and searches in the following subsections.
- Object constraints determine the first part of the search through:
- Simple search filters (Root event objects and all child objects)
- Transaction definitions (Root transaction objects).
- More complex search strings that may use transforming commands, among others (Root search objects).
- Object attributes are essentially fields. When you select an object for Pivot, the unhidden attributes you define for that object comprise the list of fields that you'll choose from in Pivot when you decide what you want to report on. The fields you select are added to the search that the object generates, and they can include calculated fields, user-defined field extractions, and fields added to your data by lookups.
The last parts of the object-generated search are determined by your Pivot Editor selections. They determine which transforming search commands Splunk Enterprise uses to format the results as a statistical table, which it can use in turn as the basis for a chart visualization.
For more information about how you use the Pivot Editor to create pivot tables, charts, and visualizations that are based on data model objects, see "Introduction to Pivot" in the Pivot Manual.
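As a rough illustration of how these pieces fit together, the following Python sketch joins an object constraint with a transforming command chosen from a Pivot-style selection. The function name `build_pivot_search` and its parameters are hypothetical, and the string it produces is only an approximation. It is not the search that Splunk Enterprise actually generates for a pivot.

```python
def build_pivot_search(constraint, row_field, aggregate="count"):
    """Join an object constraint with a transforming command.

    Hypothetical helper for illustration only; Splunk Enterprise
    builds its own search for each pivot.
    """
    # The object constraint forms the first part of the search.
    base = constraint.strip()
    # The Pivot Editor selections decide the transforming command
    # that turns the matching events into a statistical table.
    transform = "stats {0} by {1}".format(aggregate, row_field)
    return "{0} | {1}".format(base, transform)


search = build_pivot_search("sourcetype=access_* OR sourcetype=iis*", "status")
print(search)
```

The constraint comes from the object, while the `stats` clause stands in for whatever the Pivot user selects as rows, columns, and aggregations.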
Data models are composed of one or more objects. Here are some basic facts about data model objects:
- An object is a specification for a dataset. Each data model object corresponds in some manner to a set of data in an index. You can apply data models to different indexes and get different datasets.
- Objects break down into four types. These types are: Event objects, search objects, transaction objects, and child objects.
- Objects are hierarchical. Objects in data models can be arranged hierarchically in parent/child relationships. The top-level event, search, and transaction objects in data models are collectively referred to as "root objects."
- Child objects have inheritance. Data model objects are defined by characteristics that mostly break down into constraints and attributes. Child objects inherit constraints and attributes from their parent objects and have additional constraints and attributes of their own.
- Child objects provide a way of filtering events from parent objects. Because a child object always provides an additional constraint on top of the constraints it has inherited from its parent object, the dataset it represents is always a subset of the dataset that its parent represents.
We'll dive into more detail about these and other aspects of data model objects in the following subsections.
Root objects and object types
The top-level objects in data models are referred to as "root objects." Data models can contain multiple root objects of various types, and each of these root objects can be a parent to more child objects. This association of base and child objects is an "object tree." The overall set of data represented by an object tree is selected first by its root object and then refined and extended by its child objects.
Root objects can be defined by a search constraint, a search, or a transaction:
- Root event objects are the most commonly used type of root data model object. Each root event object broadly represents a type of event. For example, an HTTP Access root event object could correspond to access log events, while an Error root event object could correspond to events with error messages.
- Root event objects are typically defined by a simple constraint (see "Object Constraints," below)--it's what an experienced Splunk Enterprise user might think of as the first portion of a search, before the pipe character, commands, and arguments are applied. For example, status > 600 is a possible event object definition, and so is sourcetype=access_* OR sourcetype=iis*.
- Note: Child objects of all three types--event, transaction, and search--are defined with simple constraints that narrow down the set of data that they inherit from their ancestor objects.
- Root transaction objects enable you to create data models that represent transactions: groups of related events that span time. Transaction object definitions utilize fields that have already been added to the model via event or search objects, which means that you can't create data models that are composed only of transaction objects and their child objects. Before you create a transaction object you must already have some event or search object trees in your model.
- Root search objects use an arbitrary Splunk search that includes transforming commands to define the dataset that they represent. If you want to define a base dataset that includes one or more fields that aggregate over the entire dataset, you might need to use a root search object. For example: a system security dataset that has various system intrusion events broken out by category over time.
Object types and data model acceleration: You can optionally use data model acceleration to speed up generation of pivot tables and charts. However, this functionality has a few restrictions related to data model objects. If you think your users would benefit from data model acceleration, these restrictions may have some bearing on how you construct your data model.
- In a data model, only the first root event object and its children can be accelerated. This means that in a data model with two root event object hierarchies, the second root event object hierarchy won't be accelerated. You may want to consider merging the two hierarchies or splitting them into separate models.
- Root search objects, root transaction objects, and the children of those objects cannot be accelerated. You may want to avoid using root search objects when root event objects will do the job, if data model acceleration is your goal.
For more information on enabling acceleration for your data models see "Manage data models," in this manual.
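The acceleration restrictions above can be expressed as a small selection rule. The following Python sketch is only an illustration under stated assumptions: the dictionary layout, the function name `accelerated_objects`, and the object names are all hypothetical, and child objects are shown as a flat list for brevity.

```python
def accelerated_objects(root_objects):
    """Return the names of the objects that are eligible for acceleration.

    root_objects: list of dicts in model order, each with "name",
    "type" ("event", "search", or "transaction"), and "children"
    (a flat list of child object names). Hypothetical structure.
    """
    for root in root_objects:
        if root["type"] == "event":
            # Only the first root event object and its children qualify.
            return [root["name"]] + list(root["children"])
    # A model with no root event object has nothing to accelerate.
    return []


model = [
    {"name": "Summary", "type": "search", "children": ["Daily"]},
    {"name": "Web Events", "type": "event", "children": ["Success", "Failure"]},
    {"name": "App Events", "type": "event", "children": ["Errors"]},
]
print(accelerated_objects(model))
```

In this hypothetical model, the second root event object hierarchy (App Events) and the root search object hierarchy (Summary) are left out, which is why merging hierarchies or splitting the model can be worth considering.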
The following example shows the first several objects in a "Call Detail Records" data model. Four top-level root objects are displayed: All Calls, All Switch Records, Conversations, and Outgoing Calls.
All Calls and All Switch Records are root event objects that represent all of the calling records and all of the carrier switch records, respectively. Both of these root event objects have child objects that deal with subsets of the data owned by their parents. The All Calls root event object has child objects that break down into different call classifications: Voice, SMS, Data, and Roaming. If you were a Pivot user who only wanted to report on aspects of cellphone data usage, you'd select the Data object. But if you wanted to create reports that compare the four call types, you'd choose the All Calls root event object instead.
Conversations and Outgoing Calls are root transaction objects. They both represent transactions--groupings of related events that span a range of time. The "Conversations" object only contains call records of conversations between two or more people where the maximum pause between conversation call record events is less than two hours and the total length of the conversation is less than one day.
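To make the Conversations definition more concrete, here is a minimal Python sketch of grouping events into transactions with a maximum pause and a maximum span. It is not how Splunk Enterprise implements transactions. The function name, the `caller` group-by field, and the sample events are hypothetical, and times are plain epoch seconds.

```python
from collections import defaultdict

MAX_PAUSE = 2 * 60 * 60    # two hours, in seconds
MAX_SPAN = 24 * 60 * 60    # one day, in seconds


def group_transactions(events, group_by, max_pause=MAX_PAUSE, max_span=MAX_SPAN):
    """Group events into transactions for each group_by value.

    events: list of dicts with a numeric "time" key (epoch seconds).
    An event joins the current transaction only if the pause since the
    previous event and the span since the first event stay under the limits.
    """
    by_key = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        by_key[event[group_by]].append(event)

    transactions = []
    for key_events in by_key.values():
        current = []
        for event in key_events:
            if current:
                pause = event["time"] - current[-1]["time"]
                span = event["time"] - current[0]["time"]
                if pause >= max_pause or span >= max_span:
                    # Limit reached: close this transaction, start a new one.
                    transactions.append(current)
                    current = []
            current.append(event)
        if current:
            transactions.append(current)
    return transactions


calls = [
    {"time": 0, "caller": "alice"},
    {"time": 3600, "caller": "alice"},      # one hour later: same conversation
    {"time": 4 * 3600, "caller": "alice"},  # three-hour pause: new conversation
    {"time": 100, "caller": "bob"},
]
conversations = group_transactions(calls, "caller")
print(len(conversations))
```

The two limits play different roles: the pause limit splits a group wherever activity goes quiet, while the span limit caps the total length of any one transaction.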
For details about defining the different types of objects, see the topic "Design data model objects," in this manual.
All data model objects are defined by sets of constraints. Object constraints filter out events that aren't relevant to the object; they help to define the dataset that the object represents.
- For a root event object or a child object of any type, the constraint looks like a simple search, without additional pipes and search commands. For example, the constraint for HTTP Request, one of the root event objects of the Web Intelligence data model, is sourcetype=access_* OR sourcetype=iis*.
- For a root search object, the constraint is the object's base search string.
- For a root transaction object, the constraint is the transaction definition. Transaction object definitions must identify Group Objects (either one or more event objects, a search object, or a transaction object) and one or more Group By fields. They can also optionally include Max Pause and Max Span values.
Constraints are inherited by child objects. Constraint inheritance ensures that each child object represents a subset of the data represented by its parent objects. Your Pivot users can then use these child objects to design reports with datasets that already have extraneous data prefiltered out.
For example, the Web Intelligence data model's HTTP Success object is a child of the root event object HTTP Request. It inherits sourcetype=access_* OR sourcetype=iis* from HTTP Request and then adds the additional constraint of status = 2*, which narrows the set of events represented by the object down to HTTP request events that result in success. A Pivot user might use this object for reporting if they already know that they only want to report on successful HTTP request events.
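Constraint inheritance can be pictured as a walk from the child object up to its root, collecting one constraint per level. The following Python sketch is only an illustration: the `DataModelObject` class and its `base_search` method are hypothetical, and wrapping each constraint in parentheses is an assumption made here to keep every level's constraint grouped.

```python
class DataModelObject:
    """Hypothetical stand-in for a data model object."""

    def __init__(self, name, constraint, parent=None):
        self.name = name
        self.constraint = constraint
        self.parent = parent

    def base_search(self):
        """Return the inherited constraints followed by this object's own."""
        parts = []
        node = self
        while node is not None:
            parts.append("({0})".format(node.constraint))
            node = node.parent
        # The walk goes child -> root; reverse so the root constraint is first.
        return " ".join(reversed(parts))


http_request = DataModelObject("HTTP Request", "sourcetype=access_* OR sourcetype=iis*")
http_success = DataModelObject("HTTP Success", "status=2*", parent=http_request)

print(http_success.base_search())
```

Because every level only adds terms that must also match, each child search can return no more events than its parent's search.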
The above example shows the constraints for the Doc Access object, which is two more levels down the Web Intelligence data model hierarchy from the HTTP Success object discussed in the previous paragraph. It includes constraints that were inherited from its parent, grandparent, and great-grandparent objects (Asset Access, HTTP Success, and HTTP Request, respectively) and adds a new set of constraints. The end result is a base search that is continually narrowed down by each set of constraints:
1. HTTP Request starts by setting up a search that only finds webserver access events.
sourcetype=access_* OR sourcetype=iis*
2. HTTP Success further narrows the focus down to successful webserver access events.
status=2*
3. Asset Access includes a constraint that cuts out all events that involve website pageviews (which leaves only asset access events). The terms are implicitly joined with AND, so an event is kept only if its path matches none of the page extensions.
uri_path!=*.php uri_path!=*.html uri_path!=*.shtml uri_path!=*.rhtml uri_path!=*.asp
4. And finally, Doc Access adds a constraint that reduces the set of asset access events returned by the search down to events that only involve access of documents.
uri_path=*.doc OR uri_path=*.pdf
When all the constraints are added together, the base search for the object Doc Access looks something like this:
sourcetype=access_* OR sourcetype=iis* status=2* uri_path!=*.php uri_path!=*.html uri_path!=*.shtml uri_path!=*.rhtml uri_path!=*.asp uri_path=*.doc OR uri_path=*.pdf
For details about objects and object constraints, see the topic "Design data model objects," in this manual.
An object's attributes are essentially a set of fields associated with the dataset that the object represents. Object attributes come in five flavors:
- Auto-extracted: This attribute type is a field that Splunk Enterprise extracts at search time. It can be a field that Splunk auto-extracts out of the box (such as a default field) or an extraction that you have defined in Manager or configured in transforms.conf. Auto-extracted attributes can only be added to root objects. Child objects can only inherit them; they cannot add new auto-extracted attributes of their own.
- Eval Expression: A field derived from an eval expression that you enter in the attribute definition. Eval expressions often involve one or more extracted fields.
- Lookup: A field that is added to the events in the object dataset with the help of a lookup. Lookups add fields from external data sources such as CSV files and scripts. When you define a lookup attribute for an object you can use any lookup that you have defined in Settings and associate it with any other attribute that has already been associated with that same object.
- Regular Expression: This attribute type represents a field that is extracted from the object event data using a regular expression that you provide. A regular expression attribute definition can use a regular expression that extracts multiple fields; each field will appear in the object attribute list as a separate regular expression attribute.
- Geo IP: A specific type of lookup that adds geographical fields, such as latitude, longitude, country, and city, to events in the object dataset that have valid IP address fields. Useful for map-related visualizations.
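The Regular Expression attribute type in the list above notes that one regular expression can yield several fields. The Python sketch below shows the general idea with named capture groups. The pattern, field names, and sample event are hypothetical, and Python writes named groups as `(?P<name>...)`, which differs from the syntax used in Splunk Enterprise field extractions.

```python
import re

# Hypothetical pattern: each named group stands in for one field that
# a single regular expression attribute definition would extract.
PATTERN = re.compile(r"user=(?P<user>\w+)\s+action=(?P<action>\w+)")


def extract_fields(raw_event):
    """Return one field per named group, or an empty dict if no match."""
    match = PATTERN.search(raw_event)
    return match.groupdict() if match else {}


fields = extract_fields("2014-01-01 12:00:00 user=alice action=login")
print(sorted(fields))
```

Each key of the returned dictionary corresponds to what would appear as a separate regular expression attribute in the object attribute list.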
Object attributes are inherited. A child object will automatically have all of the attributes that belong to its parent. You can design a relatively simple data model where all of the necessary attributes for a specific object tree are defined in its root object, meaning that all of the child objects in the tree have the exact same set of attributes as their root object. In such a data model, the child objects would be differentiated from the root object and from each other only by their constraints.
Attributes serve several purposes. Their most obvious function is to provide the set of fields that Pivot users work with to define and generate a pivot report; the set of fields they have access to is determined by the object they choose when they enter the Pivot Editor. You might add attributes to a child object to provide fields to Pivot users that are specific to the dataset covered by that object.
On the other hand, you can also design attributes whose function is to set up the definition of other attributes or constraints. This is why attribute listing order matters: Splunk Enterprise processes each attribute for an object in the order that it is listed.
For example, you could set up a set of three Eval Expression attributes that are in effect a chained set of eval functions. The first two Eval Expression attributes would create what are essentially calculated fields. The third Eval Expression attribute would use those two calculated fields in an eval expression that defines a third attribute. Only that third, final attribute would be used by Pivot users as a field.
When you define an attribute you can determine whether it is visible or hidden for Pivot users. In this example it's likely that you would want to hide the first two Eval Expression attributes but leave the third attribute visible so Pivot users can access it.
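The chained Eval Expression example can be sketched in Python as an ordered list of attributes, each evaluated against the fields produced so far. This is an illustration only: the attribute names, formulas, and helper functions are hypothetical and do not reflect how Splunk Enterprise stores or evaluates attributes.

```python
# Each attribute is (name, function of the event dict, hidden flag).
# Names and formulas are hypothetical.
ATTRIBUTES = [
    ("kilobytes", lambda e: e["bytes"] / 1024.0, True),
    ("seconds", lambda e: e["duration_ms"] / 1000.0, True),
    ("kb_per_second", lambda e: e["kilobytes"] / e["seconds"], False),
]


def apply_attributes(event, attributes):
    """Evaluate attributes in listing order; later ones may use earlier ones."""
    enriched = dict(event)
    for name, func, _hidden in attributes:
        enriched[name] = func(enriched)
    return enriched


def visible_fields(attributes):
    """Names of the attributes that Pivot users would see."""
    return [name for name, _func, hidden in attributes if not hidden]


event = apply_attributes({"bytes": 2048, "duration_ms": 500}, ATTRIBUTES)
print(event["kb_per_second"])
print(visible_fields(ATTRIBUTES))
```

If `kb_per_second` were listed first, its expression would run before `kilobytes` and `seconds` existed and would fail with a `KeyError`, which mirrors why listing order matters.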
The determination of what attributes to include in your model and which attributes to expose for a particular object is something you do to make your objects easier to use in Pivot. It's often more helpful to your Pivot users if each object exposes only the data that is relevant to that object, to make it easier to build meaningful reports. This means that you may add fields to a root object that are hidden throughout the model except for a specific object elsewhere in the hierarchy.
Note: During the attribute design process you can also determine whether an attribute is required or optional. This can act as a filter for the event set represented by the object. If you say an attribute is required, you're saying that every event represented by the object must have that attribute. If you define an attribute as optional, the object may have events that do not have that attribute at all.
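The filtering effect of a required attribute can be sketched as follows. The function name and the sample events are hypothetical; the sketch only illustrates the rule that every event represented by the object must carry each required attribute.

```python
def filter_by_required(events, required_fields):
    """Keep only the events that carry every required field."""
    return [
        event for event in events
        if all(field in event for field in required_fields)
    ]


events = [
    {"status": "200", "uri_path": "/index.html"},
    {"status": "404"},  # no uri_path field
]
kept = filter_by_required(events, ["uri_path"])
print(len(kept))
```

With an empty list of required fields, which corresponds to treating every attribute as optional, all events pass through unchanged.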
For details about defining each of the five attribute types, see the topic "Design data model objects," in this manual.
This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15