Null is Good: VM Performance Metrics That Should Remain Value-Less

As we were composing our Top 20 VM Performance Metrics You Should Care About white paper with our engineering team, we noticed a not-so-subtle recurring theme with some key metrics: null is good. If any of the metrics listed below have a value greater than zero, stop what you’re doing and start investigating, because something in the virtual infrastructure is not right.

The data for evaluating these metrics is available from VMware’s vCenter and is the raw material that VKernel’s Capacity Management Suite analyzes to spot performance issues and recommend solutions:

 

CPU
cpu.ready.summation – A non-zero value in this metric means that CPU Ready exists in a VM. CPU Ready is the time that a virtual CPU needs to wait for a physical CPU for processing power. As processes wait their turn, VM performance goes downhill.
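
Because cpu.ready.summation is reported in milliseconds accumulated over a sampling interval, a raw number can be hard to judge on its own. Here is a minimal sketch of the usual conversion to a percentage of the interval. The function name is our own for illustration, and the 20-second default assumes vCenter's real-time sampling interval, so adjust it if you are reading historical rollups.

```python
def cpu_ready_percent(ready_summation_ms: float, interval_seconds: float = 20.0) -> float:
    """Convert a cpu.ready.summation sample to a percentage of the interval.

    ready_summation_ms: CPU Ready time accumulated during the interval, in ms.
    interval_seconds:   length of the sampling interval (assumed 20 s for
                        real-time statistics; an assumption, not a given).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    interval_ms = interval_seconds * 1000.0
    return ready_summation_ms * 100.0 / interval_ms


# Example: 1,000 ms of CPU Ready in a 20-second sample
print(cpu_ready_percent(1000))
```

A result of zero is what you want to see; anything above it means the VM spent that share of the interval waiting for a physical CPU.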

 

Disk
disk.busResets.summation – A non-zero value in this metric indicates that a bus reset has occurred on a disk. A bus reset wipes out all commands that have been queued up in an HBA or disk bus. When the VM doesn’t get back the responses it was expecting from the commands it sent out, VM performance problems ensue.

disk.commandsAborted.summation – This metric shows the number of times a request was sent to a disk and the command was aborted. As with any computer action that includes the word “abort”, this hurts performance.

disk.queueLatency.average – A non-zero value in this metric means that queue latency to a disk is occurring. Queue latency is the amount of time that a command has to wait in the queue before it reaches a disk. As commands wait, VM performance slows down, so this metric indicates that disk performance issues are present in an environment. If queue latency is present, disk.totalLatency.average, the metric that shows overall latency to a disk, should also be examined for values past 50,000ms.
 
Memory
mem.swapin.average, mem.swapout.average and mem.swapped.average – We saved the best for last: A non-zero value in any of these three memory metrics indicates that memory swapping is occurring in a VM. Memory swapping occurs when a VM does not have enough memory, and a chunk of what is currently in memory is sent to disk for storage so that memory is freed up for the VM to work with. If the VM then needs something from the data that was sent to disk, it requests the data back, and that chunk is read from the disk and placed back in memory while something else is swapped out. The amount of time it takes for:
  1. A command to make it to a disk,
  2. The data to be read or written by the disk, and
  3. A response to be sent back from the disk to the VM
...will dramatically slow down VM performance by 3 to 5 times! And then it can get worse: memory swapping can choke off disk throughput and overload disks, which causes performance issues in other areas.

 

These metrics can be easily scanned for non-zero values to quickly reveal if an environment is facing performance issues. To analyze all 20 metrics in your environment in 20 minutes, download and trial our Capacity Management Suite. What you find may surprise you.
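
To illustrate what such a scan might look like, here is a minimal sketch in Python. It assumes the samples have already been exported from vCenter into a plain dictionary keyed by VM name. That layout, the function name and the VM names are hypothetical choices for this example, not part of any VMware or VKernel API.

```python
# The "null is good" metrics discussed in this post.
NULL_IS_GOOD_METRICS = (
    "cpu.ready.summation",
    "disk.busResets.summation",
    "disk.commandsAborted.summation",
    "disk.queueLatency.average",
    "mem.swapin.average",
    "mem.swapout.average",
    "mem.swapped.average",
)


def find_nonzero_metrics(samples):
    """Return {vm_name: {metric: peak_value}} for every metric above zero.

    samples: {vm_name: {metric_name: [sample values]}} -- a hypothetical
             layout for data already pulled from vCenter.
    """
    flagged = {}
    for vm_name, metrics in samples.items():
        for metric in NULL_IS_GOOD_METRICS:
            peak = max(metrics.get(metric, []), default=0)
            if peak > 0:
                flagged.setdefault(vm_name, {})[metric] = peak
    return flagged


# Made-up sample data for two hypothetical VMs
example = {
    "vm-01": {"cpu.ready.summation": [0, 0, 0], "mem.swapin.average": [0, 0]},
    "vm-02": {"cpu.ready.summation": [0, 450, 120], "mem.swapin.average": [0, 128]},
}
for vm, problems in find_nonzero_metrics(example).items():
    print(vm, problems)
```

Only VMs with at least one non-zero sample appear in the result, so an empty result is the "null is good" outcome.
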
About the Author