In the process of helping others understand and manage the capacity of their virtual infrastructures, I’ve consistently noticed how high availability (HA) admission control policies can cause problems – or fail to prevent them – if not configured properly.
Many times it starts out with the default admission control policy, the cluster being configured to tolerate a single host failure. From there, if the calculated slot size is too large, we find ourselves unable to power on guests that we know we should have capacity for. To remedy this, some of us simply disable admission control. Then, months later, we find out that we have let our cluster(s) get overcommitted past the point of failover viability. Not fun.
Even though I’ve seen that story many times, there’s a solution. While attending his VMworld 2012 breakout session, "Avoiding the 19 Biggest HA & DRS Configuration Mistakes", vExpert Greg Shields reminded me that the recommended admission control policy for HA is the “percentage of cluster resources reserved as failover spare capacity.”
This policy has two advantages. First, it doesn’t use the sometimes excessively large slot sizes that many of us might be accustomed to with the default “host failures” policy. Second, it is more flexible, allowing us to build our own custom thresholds for capacity reservations.
If this is something you want to try out, here is a basic formula to get you started. Where p is the target percentage for the admission control policy, h is the number of hosts in the cluster, and f is the desired number of host failures you want the cluster to be able to tolerate:

p = (f / h) × 100
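To make the rule of thumb concrete, here is a minimal sketch in Python. It assumes the formula p = (f / h) × 100, i.e. reserving one host's worth of resources per tolerated failure; the function name and the example cluster sizes are mine, not from any VMware tooling:

```python
def failover_capacity_percentage(hosts: int, failures: int) -> float:
    """Percentage of cluster resources to reserve as failover spare capacity.

    hosts:    number of hosts in the cluster (h)
    failures: number of host failures the cluster should tolerate (f)
    """
    if failures >= hosts:
        raise ValueError("cluster cannot tolerate losing all of its hosts")
    return failures / hosts * 100

# Example: an 8-host cluster tolerating 1 host failure
print(failover_capacity_percentage(8, 1))   # 12.5 percent reserved

# Example: a 10-host cluster tolerating 2 host failures
print(failover_capacity_percentage(10, 2))  # 20.0 percent reserved
```

In other words, an 8-host cluster sized to survive one host failure would have its admission control percentage set to roughly 13% (rounding up keeps you on the safe side).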
While this formula is a good rule of thumb, there are a few additional things you should consider:
As always, I hope this has been useful. If you have other tips, tricks, or habits that haven’t been mentioned, please drop them in the comments below.