As I was reading through the latest VKernel white paper, Resolving VMware vSphere's Six Biggest Performance Issues, by Greg Shields, it occurred to me that several key factors make it difficult to gain visibility into virtual environments.
For starters, virtualization makes an IT infrastructure more complex. As Shields states:
“vSphere is full of moving parts, deep integrations, and elements that sometimes do and sometimes don’t impact each other, which makes keeping it running with best performance a never-ending exercise in capacity management”
We’ve found that when data center teams begin to troubleshoot VM performance issues, they often realize how little they know about what’s going on “under the hood” of the virtualized environment. Visibility into the environment is therefore necessary both to ensure VM performance and to confirm that sufficient capacity is on hand to sustain application growth and increased usage. Here are five key factors that obscure visibility into a virtualized environment:
1. Many Moving Parts In Each Layer of the Virtualization Stack
Launching an application from within a VM is a complicated task from a technical perspective. It takes a multi-tier “stack” of hardware, middleware, and software to make this possible (this stack is illustrated in the diagram below).
This means that when an end user interacts with the application, each command must be handled by the application itself and by the operating system it runs on. The operating system passes instructions to the VM, which is in turn supported by the hypervisor that corrals and distributes resources from the actual physical hardware. It’s this abstraction of resources that provides the flexibility and efficiency that data centers are trying to achieve with a virtualization initiative. In terms of visibility, however, these multiple layers make it very difficult to assemble a holistic picture of what’s going on inside the stack.
To further complicate matters, each layer operates independently, uses different tools, and requires a unique skillset to interpret infrastructure performance. Additionally, connecting events in one layer to their impact in another can be challenging, for example recognizing that a usage spike in an application caused Memory Active at the VM level to increase. Being able to quickly track down these interconnected events in different parts of the virtualization stack is a requirement for gaining full visibility into an environment.
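One simple way to start connecting events across layers is to pair events from different layers that occur close together in time. The sketch below is only an illustration: the event records, the layer names, and the five-minute window are all assumptions, and nothing here reflects how any particular monitoring product works. Pairs found this way are candidates for investigation, not proof of cause and effect.

```python
from datetime import datetime, timedelta

def correlate(events, window=timedelta(minutes=5)):
    """Pair each event with later events from other layers within `window`."""
    ordered = sorted(events, key=lambda e: e["time"])
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Events are sorted by time, so nothing later can be in the window.
            if second["time"] - first["time"] > window:
                break
            if second["layer"] != first["layer"]:
                pairs.append((first["what"], second["what"]))
    return pairs

# Made-up sample data (hypothetical layers, labels, and timestamps).
t0 = datetime(2024, 1, 1, 9, 0)
events = [
    {"time": t0, "layer": "application", "what": "request spike"},
    {"time": t0 + timedelta(minutes=2), "layer": "vm", "what": "Memory Active increase"},
    {"time": t0 + timedelta(minutes=30), "layer": "storage", "what": "datastore latency"},
]

pairs = correlate(events)
print(pairs)
```

With this sample data, the application spike is paired with the Memory Active increase two minutes later, while the storage event half an hour later is left unpaired.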
2. Interconnectedness of Resource Usage
Within the Virtual Machine layer of the virtualization stack, multiple connection points exist between VMs and the resources that are used to run applications. VMs share memory, CPU, and connections to the SAN within the host, and multiple hosts are connected to the SAN. That means that an issue in one VM can affect another VM or a datastore. This is quite different from a physical environment where, unless a SAN was employed, each server operated in a vacuum (the more connected nature of a virtual environment is illustrated in the diagram below).
Ultimately, this increased number of connections greatly increases the complexity of interpreting what is occurring inside an environment. For instance, when troubleshooting a VM performance issue, the root cause may be an event or breakdown in any one of these connections, triggered by activity in another part of the environment that appears normal from the outside. To gain visibility into the virtual environment, an understanding of all of these interconnections and how they affect the VMs and datastores they touch is necessary.
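To make these interconnections concrete, the sketch below takes a small, made-up inventory and lists which other VMs share a host or a datastore with a given VM. The inventory layout and every name in it are hypothetical; a real environment would pull this information from its management tooling.

```python
# Hypothetical inventory: the host and datastore each VM uses.
placement = {
    "vm-a": {"host": "host-1", "datastore": "ds-1"},
    "vm-b": {"host": "host-1", "datastore": "ds-2"},
    "vm-c": {"host": "host-2", "datastore": "ds-1"},
    "vm-d": {"host": "host-2", "datastore": "ds-3"},
}

def neighbors(vm, placement):
    """Return the VMs that share a host or datastore with `vm`, and what they share."""
    mine = placement[vm]
    shared = {}
    for other, theirs in placement.items():
        if other == vm:
            continue
        common = [kind for kind in ("host", "datastore") if mine[kind] == theirs[kind]]
        if common:
            shared[other] = common
    return shared

result = neighbors("vm-a", placement)
print(result)
```

Here vm-a shares a host with vm-b and a datastore with vm-c, so both are worth checking when vm-a misbehaves, even though vm-c runs on a different host.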
3. Constant Change and Automation In the Environment
Virtual environments are constantly rebalancing and changing, and automated actions such as DRS and vMotion accelerate this further. In short, the dynamism that makes a virtual environment so effective at resource allocation also makes it ever-changing: what an environment “looked like” last week will be very different this week, and it will have changed again by next week. To gain visibility into what an environment looks like today, the infrastructure must be assessed in real time, because data about the environment rapidly becomes obsolete.
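A quick way to see how fast a picture of the environment goes stale is to compare two placement snapshots. The sketch below assumes each snapshot is a plain mapping of VM name to host name; the snapshot format and the sample names are invented for illustration.

```python
def placement_changes(previous, current):
    """Compare two {vm: host} snapshots and report moves, additions, and removals."""
    moved = {
        vm: (previous[vm], current[vm])
        for vm in previous.keys() & current.keys()
        if previous[vm] != current[vm]
    }
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    return moved, added, removed

# Made-up snapshots taken a week apart.
last_week = {"vm-a": "host-1", "vm-b": "host-1", "vm-c": "host-2"}
this_week = {"vm-a": "host-2", "vm-b": "host-1", "vm-d": "host-2"}

moved, added, removed = placement_changes(last_week, this_week)
print(moved, added, removed)
```

In this example one VM has migrated, one has appeared, and one is gone, so any conclusion drawn from last week's snapshot would already be partly wrong.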
4. Intense, Repeatable Data Analysis Is Necessary to “See” What’s Going On
As noted earlier, to gain visibility within an environment, every layer of the virtualization stack must be assessed simultaneously on a real-time basis. This “picture” must also take into account the interconnected nature of resource usage in a virtual environment. Insights into an environment are gained by assessing how computing processes are occurring, as reported by system metrics at each level of the virtualization stack.
Looking at the VM layer alone, a system administrator quickly realizes that there is too much going on to explain through “eyeballing”, or even through basic data scrubbing such as sorting or filtering. A more intense analysis of the data is required before a system administrator can come to solid conclusions. Without a proven analytic method that is quickly repeatable, a system administrator will likely end up in “analysis paralysis”, or working from a processed “view” of the environment that took so long to assemble that it is already out of date. Because a real-time view is so important, we have designed our vOPS Performance Analyzer to evaluate an environment for any vCenter alerts every 2 minutes.
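As a small example of what a repeatable analysis can look like, the sketch below flags VMs whose latest metric sample deviates sharply from their own recent baseline. This is a generic z-score check chosen purely for illustration, not a description of how vOPS Performance Analyzer works; the metric values, the threshold of 3, and the data layout are assumptions. Each history is assumed to hold at least two samples.

```python
from statistics import mean, pstdev

def flag_outliers(history, threshold=3.0):
    """Flag VMs whose latest sample is far from the mean of their earlier samples."""
    flagged = []
    for vm, samples in history.items():
        baseline, latest = samples[:-1], samples[-1]
        spread = pstdev(baseline)
        if spread == 0:
            continue  # flat baseline: the z-score is undefined, so skip it here
        z = (latest - mean(baseline)) / spread
        if abs(z) > threshold:
            flagged.append((vm, round(z, 1)))
    # Largest deviation first.
    return sorted(flagged, key=lambda item: -abs(item[1]))

# Made-up CPU usage percentages per VM, oldest sample first.
history = {
    "vm-a": [20, 22, 21, 19, 20, 60],
    "vm-b": [50, 52, 49, 51, 50, 51],
}

flagged = flag_outliers(history)
print(flagged)
```

Because the check is a pure function of the collected samples, it can be rerun on every polling cycle and will give the same answer for the same data, which is the property that sorting and filtering by hand lacks.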
5. Aligning the Multiple Skills, Knowledge Sets, and IT Team Members
The last major challenge in gaining visibility into an environment revolves around aligning the knowledge and skillsets needed to collect data, assess it, and draw conclusions for each layer of the virtualization stack. Because issues can occur at any level, it is necessary to assemble a holistic “picture” of the environment to conclusively evaluate a system’s health. With so many moving parts, application, VM, storage, network, hardware, and database admins must all be involved in constantly assessing their charges and analyzing instrumentation data to gain insight into what’s really going on. These observations must then be combined to trace the flow of data and actions from one layer of the virtualization stack to the next.
Setting up the organizational structure and regular monitoring processes to assemble the right people and data can be challenging as well. The third part of Greg Shields’ white paper covers the daily, weekly, monthly, and yearly processes that should be put in place so that when environment insight is needed immediately, such as when combating VM performance problems, the right information can be quickly assembled and interpreted.
Given the complexity of the virtual infrastructure and the various skill sets needed to gain visibility, this is a task that can be vastly aided and automated with the right tools, such as VKernel's vOperations Suite. Greg Shields provides a thorough description of what information needs to be assessed and what processes need to be put in place to kick off this initiative. That understanding of what is actually going on remains valuable even when collecting data, analyzing it, and reaching conclusions are automated with a specialized tool. Stay tuned for a deeper dive into the particular areas to examine when VM performance problems strike.