Add, edit, and delete threshold settings

Manage threshold settings

The Configured Thresholds table lists the metrics that Splunk App for VMware collects and that have default threshold values. All metrics listed on this page can be found in the VMware PerformanceManager or VMware VirtualMachineQuickStats, except for the Splunk-defined metrics.

Select Settings > Threshold Configuration to view this dashboard.

How to use thresholds

On this dashboard you can:

Add a new metric. Select new to add a row to the existing metrics table. Set the field values to those of the new metric for which you want to collect data.
Enable or disable the metric, and select the entity type for the metric. Give it a name and a perf type, and set warning and critical threshold values for it. Add a description. Click Save to save the metric to the server.
Delete a metric or group of metrics. Select the check box associated with the metric(s), then click Delete Selected.
Edit settings for an existing metric. Click the row corresponding to the metric, then toggle the Enable/Disable setting or edit the warning threshold and critical threshold values. Click Save.
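
The fields described above can be pictured as one record per metric and entity type. The following is a minimal sketch in Python, not the app's implementation: the Threshold class, its field names, and the classify function are hypothetical, and the comparison operator is assumed to be one of those used in the reference tables on this page.

```python
import operator
from dataclasses import dataclass

# Comparison operators that appear in the Threshold Value column on this page.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class Threshold:
    """Hypothetical model of one row in the Configured Thresholds table."""
    perf_type: str
    metric: str
    entity_type: str
    comparator: str
    warning: float
    critical: float
    enabled: bool = True
    description: str = ""


def classify(threshold: Threshold, value: float) -> str:
    """Return 'critical', 'warning', 'normal', or 'disabled' for a value."""
    if not threshold.enabled:
        return "disabled"
    compare = COMPARATORS[threshold.comparator]
    # Check the critical level first, because it is the more severe state.
    if compare(value, threshold.critical):
        return "critical"
    if compare(value, threshold.warning):
        return "warning"
    return "normal"


# Default values for p_average_cpu_usage_percent on a vm, taken from this page.
cpu_usage = Threshold("cpu", "p_average_cpu_usage_percent", "vm", ">", 75, 90)
print(classify(cpu_usage, 82))  # warning
```

Storing the comparator with the row lets the same function handle metrics such as RemainingCapacity_GB, where a lower value is the worse state.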

Enabling or disabling a metric determines whether the app collects the data associated with that metric. You can:

Enable or disable a group of metrics: Select the check box associated with each metric you want to enable, then click Enable Selected.
Enable or disable a single metric: Select the Enabled/Disabled toggle for the metric, then click Save.

Performance metrics reference table

The following performance metrics have default thresholds defined in Splunk App for VMware.
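
The VMware-derived metric names in the tables on this page appear to follow a five-part pattern, p_<rollup>_<group>_<counter>_<unit>, for example p_average_cpu_usage_percent. This reading is inferred from the names listed here and is not a documented contract. As a hedged sketch, the hypothetical Python helper parse_metric_name splits a name on that assumption:

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    rollup: str   # for example "average", "summation", "latest"
    group: str    # for example "cpu", "mem", "net", "disk"
    counter: str  # for example "usage", "swapin"
    unit: str     # for example "percent", "kiloBytes"


def parse_metric_name(name: str) -> MetricName:
    """Split a name such as p_average_cpu_usage_percent into its parts.

    Assumes the five-part pattern p_<rollup>_<group>_<counter>_<unit>.
    Splunk-defined metrics such as PercentHighCPUVm do not follow it.
    """
    parts = name.split("_")
    if len(parts) != 5 or parts[0] != "p":
        raise ValueError(f"not a p_<rollup>_<group>_<counter>_<unit> name: {name}")
    return MetricName(*parts[1:])


print(parse_metric_name("p_summation_cpu_ready_millisecond"))
```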

cpu

PerfType	Metric name	Entity	Threshold Value	Description
cpu	p_average_cpu_usage_percent	vm	Critical > 90% Warning > 75%	Virtual machine's average CPU usage in percent.
		host	Critical > 90% Warning > 75%	Average usage of the host's CPU in percent.
	p_summation_cpu_ready_millisecond	vm	Critical > 2000 Warning > 1000	Time in milliseconds that the virtual machine spent waiting for CPU time.
		host	Critical > 2000 Warning > 1000	Amount of time in milliseconds the host waited for CPU cycles.
	p_average_cpu_demand_megaHertz	vm	Critical < 0 Warning < 0	The amount of CPU resources that a virtual machine would use if there were no CPU limit and no contention for CPU. Less than 0 indicates that the VM does not demand any CPU.
		host	Critical < 0 Warning < 0	The aggregate amount of CPU resources that all virtual machines would use if there were no CPU limit and no contention for CPU. Less than 0 indicates that none of the VMs on the host demand any CPU.
	p_average_cpu_usagemhz_megaHertz	vm	Critical < 0 Warning < 0	The CPU usage, measured in megahertz. This is the amount of actively used vCPU. This is the hypervisor's view of the CPU usage, not the guest OS's version of the same metric. Less than 0 indicates that the VM is not using any CPU.
		host	Critical < 0 Warning < 0	The CPU usage, measured in megahertz. This is the aggregate of CPU usage across all VMs on a host. Less than 0 indicates that none of the VMs on the host require CPU usage.

mem

PerfType	Metric name	Entity	Threshold Value	Description
mem	p_average_mem_usage_percent	vm	Critical >= 75% Warning >= 90%	Virtual machine's average memory usage in percent.
		host	Critical >= 75% Warning	Average usage of the host's memory in percent.
	p_average_mem_active_kiloBytes	vm	Critical > 95 Warning > 75	A virtual machine's memory that is actively in use.
		host	Critical > 95 Warning > 75	Average amount of memory in an active state for all virtual machines and the vpxd services.
	p_average_mem_consumed_kiloBytes	vm	Critical > 95 Warning > 75	Virtual machine's memory minus the memory saved by memory sharing.
		host	Critical > 95 Warning > 75	Average amount of memory being consumed by the host. This includes all virtual machines and the overhead of the VMkernel.
	p_average_mem_overhead_kiloBytes	vm	Critical > 95 Warning > 75	Memory used by VMware to power the virtual machine.
		host	Critical > 95 Warning > 75	The average overhead of all virtual machines and the overhead of vSphere.
	p_average_mem_granted_kiloBytes	vm	Critical > 95 Warning > 75	Physical memory that is mapped to the virtual machine. Does not include overhead memory.
		host	Critical > 95 Warning > 75	Average memory granted to all virtual machines and vSphere.
	p_average_mem_vmmemctl_kiloBytes	vm	Critical > 10 Warning > 2	Amount of physical memory that is being reclaimed by the host through VMware's ballooning driver. Frequent ballooning is a sign of a host in stress.
		host	Critical > 10 Warning > 2	The sum of all vmmemctl values for all powered-on virtual machines. This value may be greater than the balloon value of the host, which is a sign that the kernel is trying to get more virtual machines to release memory.
	p_average_mem_swapin_kiloBytes	vm	Critical > 10 Warning > 0	Memory that is being read by the virtual machine from the host's swap file. Any amount of swapping is a sign of a host in stress.
		host	Critical > 10 Warning > 0	Combined sum of all the swap-in values for all powered-on virtual machines.
	p_average_mem_swapout_kiloBytes	vm	Critical > 10 Warning > 0	The amount of memory the virtual machine has had to write to a swap file.
		host	Critical > 10 Warning > 0	Combined sum of all the swap-out values for all powered-on virtual machines.
	p_average_mem_llSwaped_kiloBytes	vm	Critical > 5000 Warning > 0	Amount of memory from a virtual machine that has been swapped by the host. This is host swapping and is always a sign of the host being in stress. Any time this threshold is triggered, the host has no memory and cannot reclaim it from the ballooning driver.
	p_average_mem_llSwapUsed_kiloBytes	host	Critical >= 5000 Warning >= 0	Amount of memory from all virtual machines that has been swapped by the host. This is host swapping and is always a sign of the host being in stress. Any time this threshold is triggered, the host has no memory and cannot reclaim it from the ballooning driver.

net

PerfType	Metric name	Entity	Threshold Value	Description
net	p_average_net_received_kiloBytesPerSecond	vm	Critical > 95% Warning > 75%	Average kilobytes received across the virtual machine's virtual NIC.
		host	Critical > 95% Warning > 75%	Average amount of data in kilobytes received across the host's physical adapter.
	p_average_net_transmitted_kiloBytesPerSecond	vm	Critical > 95% Warning > 75%	Average kilobytes transmitted across the virtual machine's virtual NIC.
		host	Critical > 95% Warning > 75%	Average amount of data in kilobytes transmitted across the host's physical adapter.
	p_average_net_usage_kiloBytesPerSecond	vm	Critical > 95% Warning > 75%	Combined transmitted and received rates across all virtual NIC instances.
		host	Critical > 95% Warning > 75%	Combined transmitted and received rates across all physical NIC instances.

disk

PerfType	Metric name	Entity	Threshold Value	Description
disk	p_average_disk_read_kiloBytesPerSecond	vm	Critical > 95% Warning > 75%	Average read rate in kilobytes per second to the virtual disks attached.
	p_average_disk_numberReadAveraged_number	host	Critical > 95% Warning > 75%	Average kilobytes read from each LUN on the host.
	p_average_disk_write_kiloBytesPerSecond	vm	Critical > 95% Warning > 75%	Average write rate in kilobytes per second to the virtual disks attached.
	p_average_disk_numberWriteAveraged_number	host	Critical > 95% Warning > 75%	Average kilobytes written to each LUN on the host.
	p_average_disk_usage_kiloBytesPerSecond	vm	Critical > 95% Warning > 75%	Average I/O rate to the virtual disk.
		host	Critical > 95% Warning > 75%	Average aggregated disk I/O for all virtual machines running on the host.
	p_summation_disk_numberWrite_number	vm	Critical > 95% Warning > 75%	Number of times the virtual machine wrote to its virtual disk.
		host	Critical > 95% Warning > 75%	Total number of writes to the target LUN.
	p_summation_disk_numberRead_number	vm	Critical > 95% Warning > 75%	Number of times the virtual machine read from its virtual disk.
		host	Critical > 95% Warning > 75%	Total number of reads from the target LUN.
	p_latest_disk_maxTotalLatency_millisecond	vm	Critical > 30% Warning > 15%	Time in milliseconds it took to process a SCSI command by the virtual machine.
		host	Critical > 30% Warning > 15%	The sum in milliseconds of the kernel requests to the device.
	p_average_disk_queueLatency_millisecond	vm	Critical > 5% Warning > 1%	Time in milliseconds that a virtual machine's request spent in a queue state.
		host	Critical > 5% Warning > 1%	The sum in milliseconds a request spent in a queue state.
	p_summation_disk_commandsAborted_number	vm	Critical > 2% Warning > 0%	Number of commands that were aborted on the virtual machine.
		host	Critical > 2% Warning > 0%	Number of commands that were aborted on the host.
	p_summation_disk_busResets_number	vm	Critical > 2% Warning > 0%	Number of SCSI-bus reset commands that were issued.
		host	Critical > 2% Warning > 0%	Number of SCSI-bus reset commands that were issued.

inv

PerfType	Metric name	Entity	Threshold Value	Description
inv	PercentHighCPUVm	vm	Critical > 75 Warning > 50	This is a Splunk metric. The threshold is implemented on top of VMInvCpuMaxUsg. Used on the home_proactive_monitoring dashboard to give a warning / critical level of VMs that are in a "critical" state. This allows you to color the gauges based on the % of VMs in critical state out of the total number of VMs.
	PercentHighMemVm	vm	Critical > 75 Warning > 50	This is a Splunk metric. The threshold is implemented on top of VMInvMemMaxUsg. Used on the home_proactive_monitoring dashboard to give a warning / critical level of VMs that are in a "critical" state. This allows you to color the gauges based on the % of VMs in critical state out of the total number of VMs.
	PercentHighSumRdyVm	vm	Critical > 75 Warning > 50	This is a Splunk metric. The threshold is implemented on top of SumRdy_ms. Used on the home_proactive_monitoring dashboard to give a warning / critical level of VMs that are in a "critical" state. This allows you to color the gauges based on the % of VMs in critical state out of the total number of VMs.
	VMinvCpuMaxUsg	vm	Critical > 90 Warning > 75	This is a Splunk metric. This threshold is based on the maximum CPU that the host can give a VM. It is not the maximum of the reservations. If the VM is >= 100%, the VM is requesting more CPU than the host can allocate.
	VMinvMemMaxUsg	vm	Critical > 90 Warning > 75	This is a Splunk metric. This threshold is based on the maximum memory that the host can give a VM. It is not the maximum of the reservations. If the VM is >= 100%, the VM is requesting more memory than the host can allocate.
	PercentHighBalloonHosts	Host	Critical > 75 Warning > 50	This is a Splunk metric. The threshold is implemented on top of BalloonedMemory_MB. Used on the home_proactive_monitoring dashboard to give a warning / critical level of hosts that are in a "critical" state. This allows you to color the gauges based on the % of hosts in critical state out of the total number of hosts.
	PercentHighSwapHosts	Host	Critical > 75 Warning > 50	This is a Splunk metric. The threshold is implemented on top of SwappedMemory_MB. Used on the home_proactive_monitoring dashboard to give a warning / critical level of hosts that are in a "critical" state. This allows you to color the gauges based on the % of hosts in critical state out of the total number of hosts.
	PercentHighCPUHosts	Host	Critical > 75 Warning > 50	This is a Splunk metric. The threshold is implemented on top of AvgUsg_pct. Used on the home_proactive_monitoring dashboard to give a warning / critical level of hosts that are in a "critical" state. This allows you to color the gauges based on the % of hosts in critical state out of the total number of hosts.
	BalloonedMemory_MB (balloonedMemory)	Host	Critical >= 10 Warning >= 2	This metric belongs to the VMware VirtualMachineQuickStats object type. Pulled from inventory data based on the reported vms that exist on the host at the time of collection. The threshold is based on the total amount of memory in MB that is reclaimed from all of the vms on that host.
	SwappedMemory_MB (swappedMemory)	Host	Critical > 5 Warning > 0	This metric belongs to the VMware VirtualMachineQuickStats object type. Pulled from inventory data based on the reported vms that exist on the host at the time of collection. The threshold is based on the total amount of memory in MB that is being swapped from all vms on that host.
	RemainingCapacity_GB	Datastore	Critical <= 50 Warning <= 100	This is a Splunk metric. Changes state based on the remaining disk space in gigabytes on a datastore.
	Overprovisioned_GB	Datastore	Critical > 95 Warning > 75	This is a Splunk metric. Changes state based on how much space is over-provisioned in gigabytes. Negative numbers are a representation of an under-provisioned datastore.
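
The Percent* metrics above are described as the percentage of entities in a critical state out of the total number of entities. As an illustration only, the following Python sketch computes such a percentage from per-VM values and compares it with the defaults listed for PercentHighCPUVm (Critical > 75, Warning > 50). The function names and sample values are hypothetical, and the per-VM critical level of 90 is the VMinvCpuMaxUsg default from the table. How the app itself computes these values is not shown on this page.

```python
def percent_in_critical(values, critical_level):
    """Percentage of entities whose value exceeds the critical level."""
    if not values:
        return 0.0
    critical_count = sum(1 for value in values if value > critical_level)
    return 100.0 * critical_count / len(values)


def gauge_state(percent, warning=50.0, critical=75.0):
    """Map a percentage to a gauge state using 'greater than' thresholds."""
    if percent > critical:
        return "critical"
    if percent > warning:
        return "warning"
    return "normal"


# Hypothetical per-VM CPU usage, in percent of the maximum the host can give.
vm_cpu_max_usage = [95, 40, 92, 97, 60]
percent = percent_in_critical(vm_cpu_max_usage, critical_level=90)
print(percent, gauge_state(percent))  # 60.0 warning
```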

Complete list of VMware performance metrics

By default, Splunk App for VMware collects the complete set of VMware performance metrics. You can view the complete set of metrics in Splunk App for VMware on the Proactive Monitoring dashboard and on the Performance of Hosts and VMs dashboard.

You can read about the performance metrics in the VMware Inc. documentation. Performance metrics are organized by VMware into the following categories.

Cluster Services metrics. See http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/cluster_services_counters.html
Cpu metrics. See http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/cpu_counters.html
Datastore metrics. See http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.wssdk.apiref.doc_50%2Fdatastore_counters.html
Hbr metrics. See http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.wssdk.apiref.doc_50%2Fhbr_counters.html
Management agent metrics. See http://pubs.vmware.com/vsp40_e/wwhelp/wwhimpl/js/html/wwhelp.htm#context=admin&file=r_mgmt_agent_counters.html
Memory metrics. See http://www.vmware.com/support/developer/vc-sdk/visdk400pubs/ReferenceGuide/memory_counters.html
Virtual machine operation (Vmop) metrics. See http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.bsa.doc_40/vc_admin_guide/performance_metrics/r_vmop_counters.html
Disk metrics
Power metrics
Rescpu metrics
storageAdapter metrics
storagePath metrics
Sys metrics
vcDebugInfo metrics
vcResources metrics
virtualDisk metrics
