The Capacity Management Challenge
Capacity management in a virtualized environment is a balancing act between performance and cost savings. The cost savings derive from the ability to run multiple virtual machines on each physical server. But because the VMs compete with one another for the server’s finite resources, application performance can degrade when the environment is not configured well. When capacity is not being managed and monitored appropriately, many administrators err on the side of caution, causing an unnecessary sacrifice of cost-efficiency.
Figure 1 shows the results of a study of half a million virtual machines across 2,500 virtualized environments. The more hosts an environment has, the fewer VMs it runs per host. VM density drops sharply at first and then levels off at roughly half of what is achievable at small scale.
Figure 1 – Benefits Diminish as Virtualization Scales
Consider a modest capacity management goal: increasing the number of VMs per host from 10 to 12. That may seem trivial, but it represents a 20% better return on the investment in hosts. And what if a large-scale environment could be managed as effectively as a small one? That would nearly double the number of VMs running on existing hosts.
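The arithmetic behind these figures is simple. The sketch below assumes host cost is fixed, so the return on the host investment scales directly with the number of VMs each host runs. The function name is hypothetical, chosen for illustration.

```python
def density_gain_pct(vms_per_host_before: float, vms_per_host_after: float) -> float:
    """Percentage increase in VMs per host.

    Assumes fixed host cost, so this percentage also tracks the
    improvement in return on the host investment.
    """
    return (vms_per_host_after - vms_per_host_before) * 100 / vms_per_host_before


print(density_gain_pct(10, 12))  # 20.0: the modest goal described above
print(density_gain_pct(10, 20))  # 100.0: doubling density, as when a large environment matches a small one
```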
By following the capacity management guidelines outlined below, VM administrators should be able to achieve the modest 20% improvement with little difficulty, and by applying these techniques diligently might realize a 50% improvement or more, all while maintaining satisfactory performance and honoring all service level agreements.
Capacity Management in Six Steps
The science of capacity management involves a sequential workflow of six steps, as shown in Figure 2. Although each step is relatively straightforward, the steps must be followed in the order shown, because each builds on the results of the previous ones. The entire process should also be repeated periodically, especially after changes to the virtualized infrastructure or the application workload.
Figure 2 – Capacity Management Workflow
What follows is an example of the capacity management workflow for a cluster of hosts with HA (high availability) enabled. The workflow operates in a similar fashion for other use cases, including for a single host. The HA use case is employed here because it is one of the more difficult ones in capacity management.
- Determine Total Capacity – The cluster in this example consists of four hosts in an HA configuration. To recover from the complete failure of any single host, total capacity should be based on only three of the hosts. In other words, only 75% of the cluster's resources count toward the total capacity calculation. This assumes all hosts have identical configurations. In clusters with hosts of varying sizes, reserve enough capacity to cover the failure of the largest host.
- Calculate Usable Capacity – A capacity buffer is prudent to avoid performance problems during peak workloads. In an HA cluster, the spare host already acts as a built-in buffer, so a 5-15% capacity buffer should be adequate. For individual hosts and non-HA clusters, a somewhat larger buffer of 15-20% is recommended.
- Find Peak Utilization – The critical consideration in this step is to measure over a period in which the peak workload actually occurs. If that peak falls beyond the planning horizon (e.g., at the end of a quarter or fiscal year), it may be necessary to estimate peak resource utilization. Peak utilization must be determined for every VM resource: memory, CPU, storage I/O, storage space and network. Of these five shared resources, memory is normally the most constrained, and the most difficult to assess.
- Determine Leftover Capacity – This simple calculation reveals the unused capacity available for additional VMs. For each resource, leftover capacity is the usable capacity calculated in step 2 minus peak utilization.
- Calculate Average VM Size – In most virtualized data centers, it may be necessary to use different averages for different types of applications, such as database, virtual desktop and email. For each application type, base the average VM size on existing configurations that deliver satisfactory performance. VM sizing is a subject in its own right and beyond the scope of this article, but it is a crucial part of capacity management and necessary for achieving ideal consolidation ratios.
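- Compute New VM Capacity – This final calculation reveals the number of additional VMs that fit in the leftover capacity: divide the leftover capacity by the average VM size. Because every VM consumes all five resources, perform the division for each resource. The lowest result, rounded down, is the number of new VMs that can be added.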
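The six steps above can be sketched as a chain of small Python functions. This is a minimal illustration under stated assumptions, not a production tool. The function names and the per-resource dictionary representation are hypothetical. Peak utilization is assumed to come from monitoring data you already collect. All host, utilization and VM figures in the usage example are invented for illustration, and only three of the five resources are shown for brevity.

```python
import math
from statistics import mean

# Hypothetical representation: a resource name mapped to an amount, e.g. {"memory_gb": 512.0}.
Resources = dict[str, float]


def total_capacity(hosts: list[Resources], ha: bool = True) -> Resources:
    """Step 1: sum host capacity; with HA, reserve the largest host (N+1)."""
    total: Resources = {}
    for r in hosts[0]:
        amounts = [h[r] for h in hosts]
        # Subtracting the per-resource maximum covers failure of the largest host
        # (conservative if different hosts are largest for different resources).
        total[r] = sum(amounts) - (max(amounts) if ha else 0.0)
    return total


def usable_capacity(total: Resources, buffer: float) -> Resources:
    """Step 2: hold back a fractional buffer (e.g. 0.05-0.15 for an HA cluster)."""
    return {r: v * (1.0 - buffer) for r, v in total.items()}


def peak_utilization(samples: list[Resources]) -> Resources:
    """Step 3: highest observed (or estimated) usage per resource."""
    return {r: max(s[r] for s in samples) for r in samples[0]}


def leftover_capacity(usable: Resources, peak: Resources) -> Resources:
    """Step 4: usable capacity minus peak utilization, floored at zero."""
    return {r: max(usable[r] - peak[r], 0.0) for r in usable}


def average_vm_size(vms: list[Resources]) -> Resources:
    """Step 5: mean size of existing, well-performing VMs of one application type."""
    return {r: mean(vm[r] for vm in vms) for r in vms[0]}


def new_vm_capacity(leftover: Resources, avg_vm: Resources) -> tuple[int, str]:
    """Step 6: VMs that fit in every resource; the smallest per-resource count wins."""
    per_resource = {r: math.floor(leftover[r] / avg_vm[r]) for r in avg_vm}
    limiting = min(per_resource, key=per_resource.get)
    return per_resource[limiting], limiting


# Hypothetical four-host HA cluster with identical hosts.
hosts = [{"cpu_ghz": 48.0, "memory_gb": 512.0, "storage_gb": 4000.0}] * 4
total = total_capacity(hosts, ha=True)           # three hosts' worth of capacity
usable = usable_capacity(total, buffer=0.10)     # 10% buffer, within the 5-15% HA range
peak = peak_utilization([
    {"cpu_ghz": 40.0, "memory_gb": 850.0, "storage_gb": 5900.0},
    {"cpu_ghz": 60.0, "memory_gb": 900.0, "storage_gb": 6000.0},
])
leftover = leftover_capacity(usable, peak)
avg_vm = average_vm_size([
    {"cpu_ghz": 2.0, "memory_gb": 16.0, "storage_gb": 100.0},
    {"cpu_ghz": 4.0, "memory_gb": 32.0, "storage_gb": 200.0},
])
count, limit = new_vm_capacity(leftover, avg_vm)
print(count, limit)  # 20 memory_gb
```

Taking the minimum across resources in the last step reflects that a new VM needs its share of every resource. With these invented figures, CPU alone would allow 23 more VMs and storage space 32, but memory limits the cluster to 20.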
In this simple four-host example, the six-step capacity management workflow is not at all arduous. But what about an environment with 50 or 100 hosts? Managing capacity at that scale makes one more factor important: how frequently the environment changes. In relatively static environments, capacity needs to be reassessed only occasionally, perhaps once a year during budgeting. In dynamic environments, it may be necessary to reassess capacity monthly or even weekly.
Sharing is Caring
Hopefully, these six steps will help you safely increase the consolidation ratios on your hosts. If there are additional capacity management tips and strategies that you use, share them in the comments below.