Splunk® Enterprise

Knowledge Manager Manual


This documentation does not apply to the most recent version of Splunk.

Accelerate data models

Data model acceleration is a tool that you can use to speed up data models that represent extremely large datasets. After acceleration, pivots based on accelerated data model objects complete more quickly than they did before, as do reports and dashboard panels that are based on those pivots.

Data model acceleration performs this magic with the help of Splunk Enterprise's High Performance Analytics Store functionality, which builds data summaries behind the scenes in a manner similar to that of report acceleration. Like report acceleration, data model summaries are easy to enable and disable, and are stored on your indexers parallel to the index buckets that contain the events that are being summarized.

This topic covers:

  • The differences between data model acceleration, report acceleration, and summary indexing
  • How to enable acceleration for a data model
  • How Splunk Enterprise builds data model acceleration summaries
  • How to query accelerated data model summaries with the tstats command

This topic also explains ad hoc data model acceleration. Splunk Enterprise applies ad hoc data model acceleration whenever you build a pivot with an unaccelerated object, even search- and transaction-based objects, which can't be accelerated in a persistent fashion. However, any acceleration benefits you obtain are lost the moment you leave the Pivot Editor or switch objects during a session with the Pivot Editor. These disadvantages do not apply to "persistently" accelerated objects, which will always load with acceleration whenever they're accessed via Pivot. In addition, unlike "persistent" data model acceleration, ad hoc acceleration is not applied to reports or dashboard panels built with Pivot.

Data model acceleration vs report acceleration and summary indexing

From a "big picture" perspective, the primary difference between report acceleration and data model acceleration is this:

  • Report acceleration and summary indexing speed up individual searches on a report-by-report basis. They do this by building collections of precomputed search result aggregates.
  • Data model acceleration speeds up reporting for the specific set of attributes (fields) that you define in a data model.

In other words, data model acceleration creates summaries for the specific set of fields you and your Pivot users want to report on, accelerating the dataset represented by that collection of fields rather than a particular search.

What is a high-performance analytics store?

Data model acceleration summaries take the form of time-series index files, which have the .tsidx file extension. Each .tsidx summary contains records of the indexed field::value combinations in the selected dataset and all of the index locations of those combinations. These .tsidx summaries make up the high-performance analytics store. Collectively, they are optimized to accelerate a range of analytical searches involving a specific set of fields: the set of fields defined as attributes in the accelerated data model.
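As a rough mental model of what a .tsidx summary records, the following Python sketch builds a lookup table from each field::value combination to the positions of the events that contain it. It is a simplified illustration under stated assumptions, not the real .tsidx format: events are assumed to be dicts of field values, an event's position in the list stands in for its index location, and the function name `build_summary` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_summary(events, fields):
    """Map each "field::value" combination to the positions of the events
    that contain it, considering only the fields listed in `fields`
    (these model the attributes defined in an accelerated data model)."""
    summary = defaultdict(list)
    for position, event in enumerate(events):
        for field in fields:
            if field in event:
                summary[f"{field}::{event[field]}"].append(position)
    return dict(summary)


# Hypothetical sample events; only "status" and "host" act as attributes.
events = [
    {"status": "200", "host": "web1", "bytes": "512"},
    {"status": "404", "host": "web1", "bytes": "0"},
    {"status": "200", "host": "web2", "bytes": "1024"},
]
summary = build_summary(events, ["status", "host"])
print(summary["status::200"])  # [0, 2]
```

Only the fields passed in `fields` end up in the table, which is why a summary can answer questions about those attributes without rereading the raw events.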

An accelerated data model's high-performance analytics store spans a "summary range". This is a range of time that you select when you enable acceleration for the data model. When you run a pivot on an accelerated dataset, the pivot's time range must fall at least partly within this summary range in order to get an acceleration benefit. For example, if you have a data model that accelerates the last month of data but you create a pivot using one of this data model's objects that runs over the past year, the pivot will initially only get acceleration benefits for the portion of the search that runs over the past month.

The .tsidx files that make up the high-performance analytics store for a single data model always reside on your indexers, and may be spread across several of them. This is because Splunk Enterprise creates .tsidx files on the indexers, parallel to the buckets that contain the events referenced in the files and that cover the range of time the summary spans.

Note: The high-performance analytics store created through persistent data model acceleration is different from the summaries created through ad hoc data model acceleration. Ad hoc summaries are always created in a dispatch directory at the search head. For more information about ad hoc data model acceleration, see the subtopic "About ad hoc data model acceleration," below.

Enable acceleration for a data model

You use the Edit Acceleration dialog to enable acceleration for a data model. There are three ways to get to this dialog:

  • Navigate to the Data Models management page, find the model you want to accelerate, and click Edit and select Edit Acceleration.
  • Navigate to the Data Models management page, expand the row of the data model you want to accelerate, and click Add for ACCELERATION.
  • Open the Data Model Editor for a data model, click Edit and select Edit Acceleration.

Once you open the Edit Acceleration dialog, select Accelerate to enable acceleration for the data model. Then choose a Summary Range. The Summary Range can span 1 Day, 7 Days, 1 Month, 3 Months, 1 Year, or All Time; pick the one that matches the range of time over which you plan to run pivots against the accelerated objects in the data model. For example, if you only plan to run pivots over periods of time within the last seven days, choose 7 Days.

If you require a different summary range than the ones supplied by the Summary Range field, you can configure it for your data model in datamodels.conf.

Note: Smaller time ranges mean smaller .tsidx files that take less time to build and less space on disk, so you may want to choose shorter ranges when you can.

Data model acceleration caveats

There are a number of restrictions on the kinds of data model objects that can be accelerated.

  • Data model acceleration only affects the first event object hierarchy in a data model. Additional event object hierarchies and object hierarchies based on root search and root transaction objects are not accelerated.
    • For example, you could have a data model that has a search object hierarchy at the top, then two separate event object hierarchies, and a transaction object hierarchy at the bottom. Only the first of the two event object hierarchies can benefit from full data model acceleration.
    • Pivots that use unaccelerated objects fall back to _raw data, which means that they will initially run slower. However, they can receive some acceleration benefit from ad hoc acceleration. See "About ad hoc data model acceleration" at the end of this topic for more information.
  • Data model acceleration is most efficient if the root event object being accelerated includes in its initial constraint search the index(es) that Splunk Enterprise should search over. A single high-performance analytics store can span across several indexes in multiple indexers. If you know that all of the data that you want to pivot on resides in a particular index or set of indexes, you can speed things up by telling Splunk Enterprise where to look. Otherwise Splunk Enterprise may end up wasting time unnecessarily accelerating data that is not of use to you.
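The first caveat amounts to a simple selection rule. The sketch below is a hypothetical model, assuming a data model can be described as an ordered list of its root object types; it returns the position of the only hierarchy that persistent acceleration covers.

```python
def accelerated_hierarchy(root_types):
    """Return the index of the root object hierarchy that persistent
    acceleration covers: the first one whose root is an event object.
    Return None if the data model has no root event object."""
    for index, root_type in enumerate(root_types):
        if root_type == "event":
            return index
    return None


# The example from the first caveat: a search hierarchy at the top,
# then two event hierarchies, then a transaction hierarchy.
model = ["search", "event", "event", "transaction"]
print(accelerated_hierarchy(model))  # 1, the first event hierarchy
```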

For the full list of restrictions and caveats on data model usage see the list in "Managing Data Models," in this manual.

After you enable acceleration for a data model

After you enable acceleration for your data model, Splunk Enterprise begins building data model acceleration summaries that span the summary range that you've specified. It builds them in indexes with events that contain the fields specified in the data model. These .tsidx file summaries are stored parallel to their corresponding index buckets in a manner identical to that of report acceleration summaries.

Splunk Enterprise runs a search every 5 minutes to update existing data model summaries. It runs a maintenance process every 30 minutes to remove old, outdated summaries. You can adjust these intervals in datamodels.conf and limits.conf, respectively.

A few facts about data model summaries:

  • Each bucket in each index in a Splunk Enterprise instance can have one or more data model summaries, one for each accelerated data model for which it has relevant data. These summaries are created by Splunk Enterprise as it collects data.
  • Splunk Enterprise then scopes summaries by search head (or search head pool id) to account for different extractions that may produce different results for the same search string.
  • Each accelerated data model has its own summary directory, named for the data model and the app it comes from. Data model accelerations are scoped either to an app or globally; private data models cannot be accelerated. This prevents individual users from taking up disk space with private data model acceleration summaries.

Note: If necessary you can configure the location of data model summaries via indexes.conf.

About the summary range

For the most part, accelerated data model summary ranges behave in a manner similar to accelerated report summary ranges.

Data model summary ranges span an approximate range of time. If you select a Summary Range of 7 days, Splunk Enterprise will build summaries for the data model that each approximately span the past 7 days. Once Splunk Enterprise finishes building the summary, going forward the data model summarization process ensures that the summary always covers the selected range, removing older summary data that passes out of the range.

Note: The Summary Range indicates the approximate range of time that a summary spans. At times a summary holds data that slightly exceeds its summary range, but it never falls short of that range, except while the summary is being built for the first time.
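One way to picture why a summary can exceed its range but not fall short of it is to assume that summaries are retained per index bucket, as described earlier, and that a bucket overlapping the range boundary is kept whole. The Python sketch below models that assumption only; `BucketSummary` and `prune` are hypothetical names, and the actual retention logic in Splunk Enterprise may differ.

```python
from dataclasses import dataclass


@dataclass
class BucketSummary:
    earliest: int  # earliest event time in the bucket (epoch seconds)
    latest: int    # latest event time in the bucket (epoch seconds)


def prune(summaries, now, summary_range):
    """Keep summaries for buckets that overlap [now - summary_range, now].
    A bucket that straddles the boundary is kept whole, so the retained
    data can slightly exceed the range but never falls short of it."""
    cutoff = now - summary_range
    return [s for s in summaries if s.latest >= cutoff]


DAY = 86_400
now = 30 * DAY
buckets = [
    BucketSummary(earliest=10 * DAY, latest=20 * DAY),  # entirely too old
    BucketSummary(earliest=21 * DAY, latest=25 * DAY),  # straddles the cutoff
    BucketSummary(earliest=25 * DAY, latest=30 * DAY),  # entirely in range
]
kept = prune(buckets, now, 7 * DAY)
print(min(s.earliest for s in kept) / DAY)  # 21.0, i.e. 9 days of coverage
```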

Afterward, when you run a pivot using an accelerated object from the data model over a time range that falls within the summary range (the past 7 days, in this example), Splunk Enterprise runs the pivot against the data model's summary rather than against the source index _raw data. In most cases the summary has far less data than the source index, so the pivot should complete faster than it did on its initial run.

If you run a pivot over a period of time that is only partially covered by the summary range, the pivot won't complete quite as fast. This is because Splunk Enterprise has to run at least part of the pivot search over raw data in the source index. For example, if the Summary Range setting for a data model is 7 Days and you run a pivot using an accelerated object from that data model over the last 9 days, the pivot only gets acceleration benefits for the portion of the report that covers the past 7 days. The portion that runs over days 8 and 9 runs at normal speed. In cases like this, Splunk Enterprise returns the accelerated results from summaries first, and then fills in the gaps at a slower speed.
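The 9-day example above can be modeled as splitting the pivot's time range at the point where the summary range begins. This Python sketch assumes the summary is complete and extends up to the present; `split_pivot_range` is a hypothetical helper, and times are plain epoch seconds.

```python
def split_pivot_range(pivot_earliest, pivot_latest, summary_earliest):
    """Split a pivot's time range [pivot_earliest, pivot_latest) into the
    part served from acceleration summaries and the part that must be
    searched over raw events. `summary_earliest` is where the summary
    range begins. Empty parts are returned as None."""
    accelerated = (max(pivot_earliest, summary_earliest), pivot_latest)
    raw = (pivot_earliest, min(pivot_latest, summary_earliest))
    accelerated = accelerated if accelerated[0] < accelerated[1] else None
    raw = raw if raw[0] < raw[1] else None
    return accelerated, raw


DAY = 86_400
now = 100 * DAY
# Summary range of 7 days; pivot over the last 9 days.
fast, slow = split_pivot_range(now - 9 * DAY, now, now - 7 * DAY)
print((fast[1] - fast[0]) // DAY, (slow[1] - slow[0]) // DAY)  # 7 2
```

A pivot that lies entirely inside the summary range gets `None` for the raw part, meaning the whole search can be answered from summaries.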

Keep this in mind when you set the Summary Range value. If you always plan to run pivots over time ranges that exceed the past 7 days but don't extend further back than 30 days, select a Summary Range of 1 Month when you enable acceleration for the data model.
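The guidance above boils down to choosing the smallest Summary Range preset that covers the longest time range you expect to pivot over. In this hypothetical Python helper, a month is approximated as 30 days and a year as 365 days; the exact boundaries Splunk Enterprise uses for these presets may differ slightly.

```python
# Summary Range presets from the Edit Acceleration dialog, in days.
# Approximation: 1 Month = 30 days, 3 Months = 90 days, 1 Year = 365 days.
PRESETS = [
    ("1 Day", 1),
    ("7 Days", 7),
    ("1 Month", 30),
    ("3 Months", 90),
    ("1 Year", 365),
]


def choose_summary_range(max_pivot_days):
    """Return the smallest preset that fully covers the longest time range
    (in days) that you plan to run pivots over."""
    for label, days in PRESETS:
        if max_pivot_days <= days:
            return label
    return "All Time"  # no shorter preset covers the requested range


print(choose_summary_range(30))  # 1 Month
```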

How Splunk Enterprise builds data model acceleration summaries

As mentioned earlier, when you enable acceleration for a data model, Splunk Enterprise builds the initial set of .tsidx file summaries for the data model and then runs scheduled searches in the background every 5 minutes to keep those summaries up to date. Each update ensures that the entire configured time range is covered without a significant gap in data. This method of summary building also ensures that late-arriving data is summarized without complication.

To verify that Splunk Enterprise is scheduling searches to update your data models, in log.cfg you can set category.SavedSplunker=DEBUG and then watch scheduler.log for events like:

04-24-2013 11:12:02.357 -0700 DEBUG SavedSplunker - Added 1 scheduled searches for accelerated datamodels to the end of ready-to-run list
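If you want to check scheduler.log for these messages programmatically rather than by eye, a small script can count them. This Python sketch assumes the log lines follow the format of the sample above exactly; the regular expression and the `count_queued_dm_searches` function are illustrative, and reading the file itself is left to the caller (for example, by passing an open file object).

```python
import re

# Matches the DEBUG message shown above, in which SavedSplunker reports
# queuing searches for accelerated data models (format from the sample).
PATTERN = re.compile(
    r"DEBUG SavedSplunker - Added (\d+) scheduled searches "
    r"for accelerated datamodels"
)


def count_queued_dm_searches(lines):
    """Sum the number of data model acceleration searches queued across
    the given scheduler.log lines."""
    total = 0
    for line in lines:
        match = PATTERN.search(line)
        if match:
            total += int(match.group(1))
    return total


sample = [
    "04-24-2013 11:12:02.357 -0700 DEBUG SavedSplunker - Added 1 scheduled "
    "searches for accelerated datamodels to the end of ready-to-run list",
]
print(count_queued_dm_searches(sample))  # 1
```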

The speed of summary creation depends on the number of events involved and the size of the summary range. You can track progress toward summary completion on the Data Models management page. Find the accelerated data model that you want to inspect, expand its row, and review the information that appears under ACCELERATION.

(Image: data model acceleration metrics in an expanded row of the Data Models management page)

Status tells you whether the acceleration summary for the data model is complete. If it is in Building status it will tell you what percentage of the summary is complete. Keep in mind that data model summaries are constantly updating with new data; just because a summary is "complete" now doesn't mean it won't be "building" later.

Note: The data model acceleration status is updated periodically and cached, so it may not represent the most up-to-date status.

Data model summaries are stored in the following directory:

 $SPLUNK_DB/<index>/datamodel_summary/<bucket_id>/<search_head_or_pool_id>/DM_<datamodel_app>_<datamodel_name>
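To locate the summary for a particular bucket and data model on disk, you can fill in the directory template above. In this Python sketch, every argument value (the $SPLUNK_DB path, index, bucket ID, search head ID, app, and data model name) is a hypothetical placeholder; substitute the values from your own deployment.

```python
from pathlib import PurePosixPath


def summary_dir(splunk_db, index, bucket_id, search_head_id, app, datamodel):
    """Build the summary directory for one bucket and one accelerated data
    model, following the layout
    $SPLUNK_DB/<index>/datamodel_summary/<bucket_id>/<search_head_or_pool_id>/DM_<app>_<datamodel>."""
    return PurePosixPath(
        splunk_db, index, "datamodel_summary", bucket_id,
        search_head_id, f"DM_{app}_{datamodel}",
    )


# All values below are hypothetical placeholders.
path = summary_dir("/opt/splunk/var/lib/splunk", "web", "12_ABCD",
                   "SH1", "search", "buttercup_games")
print(path)
```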

Data model summary size on disk

You can use the data model metrics on the Data Models management page to track the total size of a data model's summary on disk. Summaries do take up space, sometimes a significant amount, so it's important to avoid overusing data model acceleration. For example, you may want to reserve data model acceleration for data models whose pivots are heavily used in dashboard panels.

The amount of space that a data model summary takes up is related to the number of events that you are collecting for the summary range you've chosen. The summary also grows larger if the data model includes attributes with high cardinality (that is, attributes with a large set of unique values), such as a Name attribute.
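As a rough illustration of why high-cardinality attributes inflate summaries, the Python sketch below counts the distinct field::value combinations a summary would need to record for a given attribute. Treating that count as a proxy for summary size is a simplifying assumption; the events and field names are hypothetical.

```python
def distinct_combinations(events, field):
    """Count the distinct field::value combinations that a summary would
    have to record for one attribute."""
    return len({event[field] for event in events if field in event})


# Hypothetical events: 1,000 distinct users, but only 3 status values.
events = [
    {"name": f"user{i}", "status": str((200, 404, 500)[i % 3])}
    for i in range(1000)
]
print(distinct_combinations(events, "status"))  # 3
print(distinct_combinations(events, "name"))    # 1000
```

Both attributes appear in the same 1,000 events, but the high-cardinality Name attribute contributes over 300 times as many distinct entries.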

If you are particularly size constrained you may want to test the amount of space a data model acceleration summary will take up by enabling acceleration for a small Summary Range first, and then moving to a larger range if you think you can afford it.

Query data model acceleration summaries

You can query the high-performance analytics store for a specific accelerated data model in Search with the tstats command.

tstats can sort through the full set of .tsidx file summaries that belong to your accelerated data model even when they are distributed among multiple indexes.

This can be a way to quickly run a stats-based search against a particular data model just to see if it's capturing the data you expect for the summary range you've selected.

To do this, you identify the data model using FROM datamodel=<datamodel-name>:

| tstats avg(foo) FROM datamodel=buttercup_games WHERE bar=value2 baz>5

The above query returns the average of the field foo in the "Buttercup Games" data model acceleration summaries, specifically where bar is value2 and the value of baz is greater than 5.
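To make the semantics of the example query concrete, the following Python sketch computes the same kind of result over a handful of in-memory records: the average of foo where bar is value2 and baz is greater than 5. It only models what the query returns, not how tstats reads .tsidx files; the records and the `tstats_avg` helper are hypothetical.

```python
def tstats_avg(records, field, where):
    """Average `field` over the records that satisfy the `where` predicate,
    returning None when no record matches."""
    values = [float(r[field]) for r in records if field in r and where(r)]
    return sum(values) / len(values) if values else None


records = [
    {"foo": "10", "bar": "value2", "baz": "7"},
    {"foo": "20", "bar": "value2", "baz": "3"},  # excluded: baz is not > 5
    {"foo": "30", "bar": "value1", "baz": "9"},  # excluded: bar is not value2
    {"foo": "50", "bar": "value2", "baz": "6"},
]
result = tstats_avg(
    records, "foo",
    lambda r: r.get("bar") == "value2" and float(r.get("baz", "nan")) > 5,
)
print(result)  # 30.0
```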

Note: You don't have to specify the app of the data model, as Splunk Enterprise takes this from the search context (the app you are in). However, you cannot query an accelerated data model in App B from App A unless the data model in App B is shared globally.
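The app-scoping rule in this note can be summarized as a small predicate. This Python sketch is a hypothetical model that considers only the three sharing levels mentioned in this topic (private, app, and global) and ignores role-based permissions.

```python
def can_query(datamodel_app, sharing, search_app):
    """Return True if a search running in `search_app` can use tstats
    against an accelerated data model defined in `datamodel_app` with the
    given `sharing` level ("private", "app", or "global")."""
    if sharing == "private":
        return False  # private data models cannot be accelerated at all
    if sharing == "global":
        return True   # globally shared models are visible from any app
    return datamodel_app == search_app  # app-scoped: same app only


print(can_query("app_b", "app", "app_a"))     # False
print(can_query("app_b", "global", "app_a"))  # True
```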

For more about the tstats command, including the usage of tstats to query normal indexed data, see the entry for tstats in the Search Reference.

About ad hoc data model acceleration

Even when you're building a pivot that is based on a data model object that is not accelerated in a persistent fashion, that pivot can benefit from what we call "ad hoc" data model acceleration. In these cases, Splunk Enterprise will build a summary over all time in a dispatch directory when you work with an object to build a pivot in the Pivot Editor.

Splunk Enterprise begins building the ad hoc summary after you select an object and enter the Pivot Editor. You can follow the progress of the ad hoc summary construction with the progress bar:

(Image: the Pivot Editor progress bar showing ad hoc summary construction)

Once the progress bar reads Complete, the ad hoc summary is built (over an all-time range), and Splunk Enterprise uses it to return pivot results faster going forward. But this summary only lasts while you work with the object in the Pivot Editor. If you leave the editor and return, or switch to another object and then return to the first one, the ad hoc summary must be rebuilt.
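The lifecycle described above can be modeled as a cache that survives only while the same object stays open. In this Python sketch, the `AdHocSummaryCache` class and the object names are hypothetical, and the model ignores the short grace period during which a recently left pivot job can still be reused from its URL.

```python
class AdHocSummaryCache:
    """Model of an ad hoc summary that lives only while one object is
    open in the Pivot Editor."""

    def __init__(self):
        self.current_object = None
        self.builds = 0  # number of times a summary was built from scratch

    def open_object(self, obj):
        # Entering the editor or switching objects discards any existing
        # summary and starts a new all-time build.
        if obj != self.current_object:
            self.current_object = obj
            self.builds += 1

    def leave_editor(self):
        self.current_object = None


cache = AdHocSummaryCache()
cache.open_object("HTTP Requests")  # build 1
cache.open_object("HTTP Requests")  # same object: summary reused
cache.open_object("Errors")         # switch objects: build 2
cache.open_object("HTTP Requests")  # switch back: build 3
cache.leave_editor()
cache.open_object("HTTP Requests")  # re-enter the editor: build 4
print(cache.builds)  # 4
```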

Ad hoc data model acceleration works for all object types, including root search objects and root transaction objects. Its main disadvantage compared to persistent data model acceleration is that you have to wait for the ad hoc summary to be rebuilt each time you enter the Pivot Editor. A persistent summary, by contrast, is always available and keeps pivot performance fast until acceleration is disabled for the data model.

Here's a summary of the ways in which ad hoc data model acceleration differs from persistent data model acceleration:

  • Ad hoc data model acceleration takes place on the search head rather than the indexer. This enables it to accelerate all three object types (event, search, and transaction).
  • Ad hoc data model acceleration summaries are created in dispatch directories at the search head. Persistent data model acceleration summaries are created and stored in your indexes alongside index buckets.
  • Splunk Enterprise deletes ad hoc data model acceleration summaries when you leave the Pivot Editor or change the object you are working on while you are in the Pivot Editor. When you return to the Pivot Editor for the same object, Splunk Enterprise must rebuild its ad hoc summary from scratch. You cannot preserve ad hoc data model summaries for later use.
    • Pivot job IDs are retained in the pivot URL, so if you quickly use the back button after leaving Pivot (or return to the pivot job with a permalink) you may be able to use the ad hoc summary for that job without waiting for a rebuild. Splunk Enterprise deletes ad hoc data model acceleration summaries from the dispatch directory a few minutes after you leave Pivot or switch to a different object within Pivot.
  • Ad hoc acceleration does not apply to reports or dashboard panels that are based on pivots. If you want pivot-based reports and dashboard panels to benefit from data model acceleration, base them on objects from a persistently accelerated event object hierarchy.
  • Ad hoc data model acceleration can create more load on the search head than persistent data model acceleration creates on your indexers. This is because each Pivot user creates their own pivot job and associated ad hoc summary, and ad hoc summaries are always built over all time. Persistent data model acceleration summaries, on the other hand, are shared by all users of the data model, and they can be scoped to shorter windows of time, such as a week or a month.

This documentation applies to the following versions of Splunk® Enterprise: 6.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.0.5, 6.0.6, 6.0.7, 6.0.8, 6.0.9, 6.0.10, 6.0.11, 6.0.12, 6.0.13, 6.0.14, 6.0.15

