Controlling data volumes

This topic discusses ways to control the type and quantity of data that you bring into Splunk for VMware. Collecting the correct type of data is important, and so is limiting the quantity. We continue to streamline the data collection process so that only the necessary data is collected. Controlling your data matters because the amount of data you collect can directly affect your licensing requirements.

Data volume produced by the FA VM

The amount of data produced by the FA VM is primarily determined by the number of hosts in the solution and by how the solution is configured. If you turn on all of the data sources in the solution, each ESX/i host produces approximately 400-800 MB of data each day.

These numbers are general estimates and can vary depending on how you configure your deployment. For example, data volumes are generally not driven by the number of VMs in the environment. However, this assumes a typical environment in which any given host runs 10-25 VMs. In environments outside that range, the volumes may be very different and may be driven by the number of VMs.

We have seen that, in a steady state, one vCenter server generates about the same amount of data as an ESX/i host: approximately 400-800 MB each day for each vCenter server in your environment. This estimate does not include any historical vCenter log data that the solution captures initially. As always, different setups can cause these numbers to change.
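As a rough illustration of the arithmetic above, the following Python sketch turns the quoted per-host and per-vCenter figures into a daily range. It assumes that all data sources are turned on and that the environment is typical. The function and constant names are hypothetical and are not part of the solution.

```python
# Per-day estimates quoted in this topic, in MB. These are rough figures,
# not guarantees; actual volumes depend on your configuration.
MB_PER_NODE_LOW = 400
MB_PER_NODE_HIGH = 800


def estimate_daily_volume_mb(num_hosts, num_vcenters):
    """Return a (low, high) estimate in MB per day (hypothetical helper)."""
    # In a steady state a vCenter server is counted like one ESX/i host.
    nodes = num_hosts + num_vcenters
    return nodes * MB_PER_NODE_LOW, nodes * MB_PER_NODE_HIGH


low, high = estimate_daily_volume_mb(num_hosts=20, num_vcenters=1)
print(f"Estimated daily volume: {low}-{high} MB")
```

Treat the result as a starting point for license planning only, and compare it against the volumes you actually measure.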

How to limit your data volume

As a systems administrator you can limit your data volume in a number of ways.

Remove duplicate fields in your data

You can remove duplicate fields from the VMware metrics data that we collect. VMware reports certain metrics as Min and Max fields when they are actually duplicates of the Average field. You can limit your data volume by removing these duplicate fields. This does not affect the built-in views and dashboards provided with the release, which remain backward compatible with previous versions. However, we recommend that you update your custom searches and custom views so that they no longer use the old fields (Min and Max), and that you base any new ones only on the Average field. Min and Max values are still calculated from the raw data and included in summary indexes.

Any customizations that you have made to the App (views, searches, dashboards) can be affected by this change, and you may still be collecting duplicate information if your searches use these older fields.

Use nullQueue to filter log data on the indexers

VMware API calls made by the FA VM to vCenter cause increased amounts of log data to be written to the vpxd logs. You can exclude the VC log data generated by the FA VM API calls by routing it to nullQueue, Splunk's /dev/null equivalent. In this case the data is dropped when our technology add-on, TA-vmware (which sits on the indexers), receives it from the VC forwarder. Remember that vCenter system logs are captured on the local VC machine itself and are not collected using the FA VM.

When you filter out data in this way, the indexer discards it: the filtered data is not added to the Splunk index and does not count toward your indexing volume or your license. The data itself is unchanged. It is still generated at the source and written to the logs on the local vCenter machine. The filter just prevents the solution from indexing it. All of this filtering is done at the indexer, not at the forwarder.
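The effect of this kind of filtering can be modeled in a few lines of Python. This is only a sketch of the idea, not how Splunk implements it: in Splunk the routing is configured in transforms.conf with a REGEX, DEST_KEY = queue, and FORMAT = nullQueue. The pattern and the sample events below are hypothetical.

```python
import re

# Hypothetical pattern standing in for the REGEX of a transforms.conf stanza.
DROP_PATTERN = re.compile(r"\[verbose\]")


def route(event):
    """Return the queue an event would be sent to in this simplified model."""
    return "nullQueue" if DROP_PATTERN.search(event) else "indexQueue"


# Made-up log lines for illustration only.
events = [
    "2013-01-01T00:00:01 [info] Task started",
    "2013-01-01T00:00:02 [verbose] RetrieveContents called",
    "2013-01-01T00:00:03 [verbose] RetrieveContents called",
]

# Only events routed to indexQueue are indexed and counted against the license.
indexed = [e for e in events if route(e) == "indexQueue"]
print(f"{len(indexed)} of {len(events)} events indexed")
```

All three events still exist in the source log. Only the one that does not match the pattern reaches the index.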

Exclude the VC log data

You can exclude this VC log data and reduce your data volume by editing the props.conf file in TA-vcenter. Uncomment the following TRANSFORMS attributes, which determine how the vpxd events are routed:

For sourcetype = vmware:vclog:vpxd, uncomment:

#TRANSFORMS-null1 = vmware_vpxd_level_null
#TRANSFORMS-null4 = vmware_vpxd_retrieveContents_null
#TRANSFORMS-null5 = vmware_vpxd_null

For sourcetype = vmware:vclog:vpxd-alert, uncomment:

#TRANSFORMS-null2 = vmware_vpxd_level_null,vmware_vpxd_level_null2

For sourcetype = vmware:vclog:vpxd-profiler, uncomment:

#TRANSFORMS-null3 = vmware_vpxd_level_null,vmware_vpxd_level_null2

When uncommented, props.conf works with transforms.conf to route the specified source types to nullQueue. The actual routing is done in transforms.conf.
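If you prefer to script the edit rather than make it by hand, the following Python sketch shows one way to uncomment the TRANSFORMS-null attributes. It works on the text of props.conf held in a string and assumes that the commented lines look exactly like the ones listed above. The function name is hypothetical. Back up the file before applying anything like this to a real deployment.

```python
import re

# Matches a commented-out TRANSFORMS-null<N> attribute, for example
# "#TRANSFORMS-null1 = vmware_vpxd_level_null".
_COMMENTED_NULL = re.compile(
    r"^#[ \t]*(TRANSFORMS-null\d+[ \t]*=.*)$", re.MULTILINE
)


def uncomment_null_transforms(props_text):
    """Return props_text with commented TRANSFORMS-null<N> lines enabled."""
    return _COMMENTED_NULL.sub(r"\1", props_text)


sample = (
    "[vmware:vclog:vpxd]\n"
    "#TRANSFORMS-null1 = vmware_vpxd_level_null\n"
    "# an unrelated comment stays commented\n"
)
print(uncomment_null_transforms(sample))
```

Other comments in the file are left untouched because the pattern matches only lines that begin with `#` followed by `TRANSFORMS-null` and a number.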

For more information on nullQueue, see Filter event data and send it to queues.

Use the setting perfInstanceDataPerfTypeBlacklist to filter your data

For Splunk events that have sourcetype=vmware:perf and instance=*, events are filtered by the perftype field as defined in the stanza. If perftype matches the perfInstanceDataPerfTypeBlacklist, those instance events are not sent. The aggregate events for that perftype are still sent. The perfInstanceData setting has higher priority than perfInstanceDataPerfTypeBlacklist.

This setting is a data type control. It works only in stanzas where the "action=PerfDiscovery" setting is specified and where you have set perfInstanceData=ON. It gives you more fine-grained control over the type of instance data that you want the engine to collect.
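The filtering rules described above can be summarized in a short Python model. It is a sketch of the decision logic only, not the engine's actual code. It assumes that an aggregate event is one without an instance value, and that perf types such as "cpu" and "disk" are merely example values. The function name and the event dictionaries are hypothetical.

```python
def keep_event(event, perf_instance_data, blacklist):
    """Decide whether a vmware:perf event is sent (simplified model)."""
    # Assumption: aggregate events carry no instance value.
    if event.get("instance") is None:
        return True  # aggregate events are always sent
    # perfInstanceData has priority: when it is off, no instance data is sent.
    if not perf_instance_data:
        return False
    # Otherwise drop instance events whose perftype is blacklisted.
    return event["perftype"] not in blacklist


# Example events; the perftype values are illustrative only.
events = [
    {"perftype": "cpu", "instance": "0"},
    {"perftype": "cpu", "instance": None},
    {"perftype": "disk", "instance": "naa.1"},
]

sent = [e for e in events if keep_event(e, True, {"cpu"})]
print(sent)
```

With perfInstanceData on and "cpu" in the blacklist, the per-instance cpu event is dropped while the aggregate cpu event and the disk instance event are kept.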

For more information, see perfInstanceDataPerfTypeBlacklist in this manual.
