As we were composing our Top 20 VM Performance Metrics You Should Care About white paper with our engineering team, we noticed a not-so-subtle recurring theme with some key metrics: zero is good. If any of the metrics listed below has a value greater than zero, stop what you’re doing and start investigating, because something in the virtual infrastructure is not right.
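Because the rule is simply "anything above zero deserves attention," it is easy to automate. The following is a minimal sketch, assuming the samples for each counter have already been pulled from vCenter into a plain dictionary of lists. The dictionary layout and the `nonzero_metrics` name are hypothetical. Only the counter names come from this post.

```python
from typing import Dict, List

# Counters discussed in this post that should normally stay at zero.
SHOULD_BE_ZERO = [
    "cpu.ready.summation",
    "disk.busResets.summation",
    "disk.commandsAborted.summation",
    "disk.queueLatency.average",
    "mem.swapin.average",
    "mem.swapout.average",
    "mem.swapped.average",
]


def nonzero_metrics(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Return the peak value of every watched counter that went above zero.

    Assumption (hypothetical layout): `samples` maps a counter name to the
    numeric samples already collected for one VM; collection is not shown.
    """
    flagged = {}
    for name in SHOULD_BE_ZERO:
        peak = max(samples.get(name, []), default=0)
        if peak > 0:
            flagged[name] = peak
    return flagged


if __name__ == "__main__":
    # Hypothetical samples for one VM.
    vm = {
        "cpu.ready.summation": [0, 0, 0],
        "disk.queueLatency.average": [0, 2, 0],
        "mem.swapin.average": [0, 0, 0],
    }
    print(nonzero_metrics(vm))  # {'disk.queueLatency.average': 2}
```

Reporting the peak rather than a simple yes/no gives a first hint of how serious each problem is.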
cpu.ready.summation – A non-zero value in this metric means that the VM is experiencing CPU Ready. CPU Ready is the time a virtual CPU is ready to run but has to wait for a physical CPU to become available. As vCPUs wait their turn, VM performance goes downhill.
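The raw counter is reported in milliseconds of ready time accumulated over each sampling interval, so the same number means very different things at different intervals. Below is a minimal sketch of the commonly used conversion to a percentage. It assumes the 20-second interval of vCenter real-time charts. The function name is hypothetical. Dividing by the vCPU count assumes the sample is the VM-wide total summed across all vCPUs.

```python
def cpu_ready_percent(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Convert a cpu.ready.summation sample (milliseconds) into percent ready time.

    interval_s: length of the sampling interval (20 s for real-time charts).
    vcpus: number of vCPUs the sample was summed over (assumption: use 1 for
           a per-vCPU sample, the VM's vCPU count for the VM-wide total).
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Multiply first so simple inputs divide exactly.
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


if __name__ == "__main__":
    # 1,000 ms of ready time in one 20 s real-time sample, one vCPU.
    print(cpu_ready_percent(1000))  # 5.0
```

For rolled-up data, pass the matching interval instead, for example 300 seconds for 5-minute samples.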
disk.busResets.summation – A non-zero value in this metric indicates that a bus reset has occurred on a disk. During a bus reset, all commands queued up in an HBA or on the disk bus are wiped out. When the VM doesn’t get back the responses it was expecting for the commands it sent, VM performance problems ensue.
disk.commandsAborted.summation – This metric shows the number of times a command sent to a disk was aborted. As with any computer action that includes the word “abort”, this issue will hurt performance.
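Both bus resets and aborted commands are summation counters. Each sample holds the number of events in its interval, so knowing when they happened can be as useful as knowing how many. The following is a minimal sketch that totals the events and lists the intervals in which any occurred. It assumes the samples arrive as (timestamp, count) pairs, which is a hypothetical layout. It is intended for correlating resets or aborts with other activity on the host.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical layout: (end of sampling interval, events counted in that interval).
Sample = Tuple[datetime, int]


def event_intervals(samples: List[Sample]) -> Tuple[int, List[datetime]]:
    """Total the events in a summation counter and list the intervals where any occurred."""
    hits = [ts for ts, count in samples if count > 0]
    total = sum(count for _, count in samples)
    return total, hits


if __name__ == "__main__":
    # Hypothetical disk.busResets.summation samples at a 20 s interval.
    resets = [
        (datetime(2024, 1, 1, 12, 0, 0), 0),
        (datetime(2024, 1, 1, 12, 0, 20), 2),
        (datetime(2024, 1, 1, 12, 0, 40), 0),
    ]
    total, when = event_intervals(resets)
    print(total, [ts.isoformat() for ts in when])
```

The same function works unchanged for disk.commandsAborted.summation samples.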
disk.queueLatency.average – A non-zero value in this metric means that commands are waiting in a queue before they reach a disk. Queue latency is the amount of time a command has to wait before it can access the disk. While commands wait, VM performance slows down, so any value here indicates that disk performance issues are present in the environment. If queue latency is present, also examine disk.totalLatency.average, the metric that shows the total latency of disk commands, for values past 50 ms.
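The two-step check described above can be sketched as follows. Any interval with non-zero queue latency is flagged, and the sketch records whether total latency in that same interval is over a threshold. It assumes both series are in milliseconds and cover the same sampling intervals. The function name is hypothetical, and the threshold is a required argument so that you supply the value you consider acceptable.

```python
from typing import List, Tuple


def latency_report(
    queue_ms: List[float],
    total_ms: List[float],
    total_threshold_ms: float,
) -> List[Tuple[int, float, float, bool]]:
    """Return (index, queue latency, total latency, over threshold?) for each
    interval where disk.queueLatency.average was non-zero."""
    if len(queue_ms) != len(total_ms):
        raise ValueError("both series must cover the same sampling intervals")
    report = []
    for i, (queue, total) in enumerate(zip(queue_ms, total_ms)):
        if queue > 0:
            # Queue latency present: record whether total latency is also too high.
            report.append((i, queue, total, total > total_threshold_ms))
    return report
```

Keeping the intervals that are under the threshold in the report is deliberate. Per the rule above, non-zero queue latency is worth a look on its own.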
mem.swapin.average, mem.swapout.average and mem.swapped.average – We saved the best for last: a non-zero value in any of these three memory metrics indicates that memory swapping is occurring in a VM. Swapping happens when a VM does not have enough memory: a chunk of what is currently in memory is written to disk, freeing up memory for the VM to work with. If the VM later needs some of the data that was sent to disk, that chunk has to be read back from disk into memory, while something else may be swapped out to make room. The amount of time it takes for:
- A command to make it to a disk
- The disk to read or write the data, and
- A response to be sent back from the disk to the VM