DIY Capacity Management in 5 Simple Steps

Many times when I talk to people about managing capacity, it turns out they're not actually doing it. Since capacity management and performance management are closely intertwined, this is somewhat of a tragedy. If you - or someone you know - happen to be in this category, here are five steps to get you started.

1. How Much Capacity Do You Have?

To keep this simple, we're going to focus on VM memory capacity, particularly because it's the #1 constraining resource for most environments. However, if you're feeling ambitious, you're welcome to total your CPU, storage, etc. as well. Also, if you're like most people, you'll want to figure this out on a per-cluster basis. Why? First, if you're running DRS, the need to evaluate capacity on a host-by-host basis becomes negligible. Second, if you have High Availability (HA) enabled on the cluster, calculating at the cluster level allows you to factor it in most effectively.

Running example: I have four hosts in a cluster, each with 64 GB of RAM. Total = 256 GB of memory. HA is enabled and configured with a single host failure tolerance.
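The running example above can be sketched in a few lines of Python. The host list is a hypothetical stand-in for whatever inventory data you pull from your own environment:

```python
# Hypothetical inventory from the running example: four hosts, 64 GB of RAM each.
host_memory_gb = [64, 64, 64, 64]

# Total cluster memory capacity is just the sum across hosts.
total_gb = sum(host_memory_gb)
print(total_gb)  # 256
```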

2. Calculating Usable Capacity

Wise administrators try not to fill their hosts, clusters, or datastores to 100% utilization. So, how much capacity should we consider "usable"? It depends.

If you have HA enabled, the usable capacity depends on the number of hosts as well as the desired host failure tolerance. In my running example, the cluster is configured to support a single host failure. That means only 192 GB (75%) of the cluster capacity is actually usable - i.e. total capacity minus one host.

Running example: 256 GB (total) - 64 GB (failover host) = 192 GB (usable capacity).

Now that we've determined how much capacity is required to accommodate HA, we're still not done. If I used 192 GB of memory across the cluster and unplugged one of my servers, the three remaining hosts would be at 100% utilization - not good. A host will begin ballooning memory at 94% utilization. If the ballooning doesn't end up being sufficient, the host will begin hypervisor swapping (much worse than OS swapping) at 96% utilization.

To retain performance levels in the event of a host failure, factor in an additional, small reservation. By default, VKernel shoots for an additional 15% buffer. This allows for some wiggle room without leaving too much unused hardware on the table.

Running example: 192 GB (usable capacity) * .85 (15% buffer) = 163.2 GB (safe usable capacity)
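Step 2 boils down to two multiplications. Here is a minimal sketch using the numbers from the running example; the 15% buffer is the VKernel default mentioned above, so swap in whatever headroom your environment calls for:

```python
total_gb = 256      # cluster total from step 1
host_gb = 64        # memory per host
ha_failures = 1     # number of host failures the cluster tolerates
buffer = 0.15       # extra headroom on top of the HA reservation

# Usable capacity: total minus the failover host(s).
usable_gb = total_gb - ha_failures * host_gb      # 192

# Safe usable capacity: leave a buffer so a failover doesn't
# push the surviving hosts into ballooning/swapping territory.
safe_usable_gb = usable_gb * (1 - buffer)         # 163.2
```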

3. Subtract What is Already Being Utilized

Out of the 163 GB (rounded) that I can safely use, I already have several VMs running. After briefly opening my vSphere Client and checking a performance graph for my cluster, I discovered that I'm already using 117 GB. This leaves me with 46 GB for new virtual machines.

Running example: 163 GB (safe usable capacity) - 117 GB (used capacity) = 46 GB (available capacity)
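Continuing the sketch, step 3 is a single subtraction. The used-capacity figure here comes from the cluster performance graph in the vSphere Client, as described above:

```python
safe_usable_gb = 163   # rounded result from step 2
used_gb = 117          # current cluster memory usage (from the vSphere Client)

# What's left over for new virtual machines.
available_gb = safe_usable_gb - used_gb
print(available_gb)  # 46
```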

4. Calculate Your Average Virtual Machine Size

This is really easy if you already have the Capacity Manager module in vOPS Server Standard running; it'll calculate this for you. However, if you only have your vSphere Client to work with, divide the amount of used capacity by the number of VMs that you have powered on. In this environment, I'm running 41 virtual machines.

Running example: 117 GB (used capacity) / 41 (powered-on VMs) = 2.85 GB (average VM size)

5. Calculate Additional VMs and Compare

We now have all the numbers we need to finish this little adventure. The last step is to divide the "available capacity" by the "average VM size" and see what you get. Now, keep in mind that this is only a rough calculation, but it should be in the same ballpark as what you see in vOPS Server Standard.

Running example: 46 GB (available capacity) / 2.85 GB (average VM size) = 16 (additional VM capacity)
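Putting all five steps together, here is one small function you could adapt to your own clusters. The function name and parameters are my own illustrative choices, not anything from vOPS; it just encodes the arithmetic walked through above:

```python
def additional_vm_capacity(total_gb, host_gb, ha_failures, buffer,
                           used_gb, powered_on_vms):
    """Rough estimate of how many more average-sized VMs a cluster can hold."""
    usable = total_gb - ha_failures * host_gb     # step 2: HA reservation
    safe_usable = usable * (1 - buffer)           # step 2: safety buffer
    available = safe_usable - used_gb             # step 3: subtract current use
    avg_vm = used_gb / powered_on_vms             # step 4: average VM size
    return int(available // avg_vm)               # step 5: whole VMs that fit

# Running example: four 64 GB hosts, one-host failure tolerance,
# 15% buffer, 117 GB in use across 41 powered-on VMs.
print(additional_vm_capacity(256, 64, 1, 0.15, 117, 41))  # 16
```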