In the process of helping others understand and manage the capacity of their virtual infrastructures, I’ve consistently noticed how high availability (HA) admission control policies can cause problems – or fail to prevent them – if not configured properly.
Many times it starts out with the default admission control policy, with the cluster configured to tolerate a single host failure. From there, if the calculated slot size is too large, we find ourselves unable to power on guests that we know we should have capacity for. To remedy this, some of us simply disable admission control. Then, months later, we discover that we've let our cluster(s) get committed past the point of failover viability. Not fun.
Even though I’ve seen that story many times, there’s a solution. While attending his VMworld 2012 breakout session, "Avoiding the 19 Biggest HA & DRS Configuration Mistakes", vExpert Greg Shields reminded me that the recommended admission control policy for HA is the “percentage of cluster resources reserved as failover spare capacity.”
First, this policy doesn’t use the sometimes excessively large slot sizes that many of us may be accustomed to with the default “host failures” policy. Second, it is more flexible, allowing us to set our own custom thresholds for capacity reservations.
If this is something you want to try out, here is a basic formula to get you started. Where p is the target percentage for the admission control policy, h is the number of hosts in the cluster, and f is the number of host failures you want the cluster to be able to tolerate:

p = (f / h) × 100
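The same calculation can be sketched in a few lines of Python. This is just a convenience helper for the rule of thumb above; the function name and the sample cluster sizes are my own illustration, not anything specific to vSphere:

```python
def failover_reservation_pct(hosts: int, failures: int) -> float:
    """Percentage of cluster resources to reserve so the cluster
    can tolerate the given number of host failures (p = f/h * 100)."""
    if failures < 1 or failures >= hosts:
        raise ValueError("failures must be between 1 and hosts - 1")
    return failures / hosts * 100

# e.g. an 8-host cluster that should survive one host failure:
print(failover_reservation_pct(8, 1))  # -> 12.5
```

For a 10-host cluster tolerating two failures, the same helper gives 20%, which matches the intuition that two hosts' worth of resources must stay free.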
While this formula is a good rule of thumb, there are a few additional things you should consider:
- If your hosts are not all evenly configured with memory and CPU resources, then the above formula will understate what you need to reserve. Adjust accordingly.
- If you have any abnormally or excessively large guests in your cluster, make sure that the reservation percentage leaves enough room for those kinds of guests to find a new host in the event of a host failure. In other words, if you have a virtual server that uses 32 GB of memory, make sure there’s always another 32 GB of memory available on another host.
- If you increase/decrease the number of hosts in your cluster or do a hardware refresh, you will probably want to revisit your percentage calculation to make sure HA will be able to work as intended – and that you’re not reserving too much capacity.
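Pulling those considerations together, here's a hypothetical sketch of how you might adjust the basic formula for an uneven cluster. The helper name, the idea of basing the reservation on the largest host(s), and the sample host sizes are all my own assumptions for illustration:

```python
def reservation_pct(host_memory_gb, failures=1, largest_guest_gb=0):
    """Hypothetical helper: reserve enough capacity to absorb the loss
    of the largest host(s), with at least enough headroom to restart
    the biggest guest on another host."""
    total = sum(host_memory_gb)
    # Losing the biggest hosts is the worst case in an uneven cluster,
    # so base the reservation on them rather than on a simple 1/h share.
    biggest = sorted(host_memory_gb, reverse=True)[:failures]
    pct = sum(biggest) / total * 100
    return max(pct, largest_guest_gb / total * 100)

# Uneven 4-host cluster (memory in GB) surviving one host failure:
print(round(reservation_pct([256, 128, 128, 128], failures=1), 1))  # -> 40.0
```

Note that the plain f/h formula would have suggested 25% here; because one host carries a disproportionate share of the memory, the reservation needs to be noticeably higher.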
Sharing is Caring
As always, I hope this has been useful. If you have other tips, tricks, or habits that haven’t been mentioned, please drop them in the comments below.