VMware Platform is Missing a Critical Lock Mechanism for Working with Images

Much has been made this year of the problems with VMware's VSS implementation. The "story" seemed to break in January when a detailed description of the problem appeared at this blog site. As organizations deploy Microsoft Exchange and other applications in Windows 2008 guests, this problem becomes more critical - which is why Vizioncore is releasing our own VSS driver to fix this problem later this month. Stay tuned for that exciting announcement!

However, there is a missing capability in VMware's implementation which has gone unnoticed - or at least unreported, as far as I have been able to discern. VMware lacks an API for locking a VMDK BEFORE the VSS snapshot is taken.

Why is this necessary? Doesn't VSS lock the VMDK and make it available for management operations that include backup, replication, optimization and so forth?

The answer is that VSS locking is sufficient -- as long as you only need to perform a single management operation on the image. But that's not realistic. The reality is that most organizations need to perform MORE THAN ONE management operation on the image. The most common combination is backup and replication.

To overcome this gap, Vizioncore has implemented its image-based management solutions to be mutually aware. This prevents vRanger Pro, for example, from attempting backup of a particular VM while it is being replicated by vReplicator. This prevents vControl from attempting to modify or move the image while vRanger Pro is performing a backup. In fact, Vizioncore has implemented vControl, vOptimizer, vRanger Pro and vReplicator to all be mutually aware so that collisions in managing VM images do not occur.

The Mechanism that Vizioncore uses to Prevent Image Management Collisions is a Simple Lock

Here is a picture of how the Vizioncore lock mechanism works:

As the first step, the Vizioncore solution queries to see if another Vizioncore product has locked the VM. If no lock is present, then it writes a lock into the VMFS. The lock includes metadata about the process, including which Vizioncore product locked the VM and when. Then, the process continues by triggering VSS to quiesce any databases before the VMware snapshot releases the VMDK file. [If you want more detail on how the VSS snapshot process works to release the VM image, you can check out my previous blog entry on the topic here.]

If a lock is already present on the VM then the process waits for the lock to expire before continuing. The lock is removed by the process which put it in place after the job has completed.
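To make the flow above concrete, here is a minimal sketch in Python of a query-then-write lock with metadata. It is an illustration only, not Vizioncore's actual code: the lock file name `vm.lock`, the JSON layout, the function names, and the polling and timeout behavior are all assumptions, and an ordinary directory stands in for the VM's folder on the VMFS.

```python
import json
import os
import time
from pathlib import Path

LOCK_NAME = "vm.lock"  # hypothetical file name, not a real Vizioncore artifact


def try_acquire(vm_dir, product):
    """Try to write the lock; return True if this product now holds it."""
    lock_path = Path(vm_dir) / LOCK_NAME
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so the
        # "is it locked?" check and the write happen in one atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Metadata: which product took the lock, and when.
        json.dump({"product": product, "locked_at": time.time()}, f)
    return True


def read_lock(vm_dir):
    """Return the lock metadata, or None if the VM is not locked."""
    try:
        return json.loads((Path(vm_dir) / LOCK_NAME).read_text())
    except FileNotFoundError:
        return None


def release(vm_dir, product):
    """Remove the lock, but only if this product is the one that wrote it."""
    info = read_lock(vm_dir)
    if info is not None and info["product"] == product:
        (Path(vm_dir) / LOCK_NAME).unlink()
        return True
    return False


def run_job(vm_dir, product, job, poll_seconds=1.0, timeout=60.0):
    """Wait for the lock, run the job, and always release afterwards."""
    deadline = time.monotonic() + timeout
    while not try_acquire(vm_dir, product):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{vm_dir} is still locked by another product")
        time.sleep(poll_seconds)
    try:
        return job()  # e.g. trigger VSS, snapshot, then back up or replicate
    finally:
        release(vm_dir, product)
```

A backup job would then be wrapped as `run_job(vm_dir, "vRanger Pro", do_backup)`, where `do_backup` is whatever callable performs the work. The release happens in a `finally` block so that a failed job does not leave the lock behind.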

Vizioncore has had this lock mechanism in place since almost the beginning. Vizioncore recognized early on that different operations attempting to work with the same VM image would potentially collide and fail. This simple lock mechanism has been remarkably effective at preventing these types of collisions.

Vizioncore has also evolved error-handling mechanisms for things like cleaning up stale locks. Our solutions now include subtleties such as ensuring that a lock on an image being replicated extends to the replica on the target virtual server.
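As a rough illustration of stale-lock cleanup, the sketch below removes a lock file whose timestamp is older than a chosen threshold. The age-based rule, the JSON `locked_at` field, and the function names are assumptions made for this example; this post does not describe how Vizioncore actually decides that a lock is stale.

```python
import json
import time
from pathlib import Path


def is_stale(lock_info, now, max_age_seconds):
    """A lock counts as stale once it is older than max_age_seconds."""
    return now - lock_info["locked_at"] > max_age_seconds


def clear_if_stale(lock_path, max_age_seconds, now=None):
    """Delete the lock file if it is stale; return True if it was removed."""
    path = Path(lock_path)
    now = time.time() if now is None else now
    try:
        info = json.loads(path.read_text())
    except FileNotFoundError:
        return False  # nothing to clean up
    if is_stale(info, now, max_age_seconds):
        # Another process may have removed it in the meantime.
        path.unlink(missing_ok=True)
        return True
    return False
```

An age threshold is the simplest possible rule and must be longer than the longest legitimate job. A production implementation would more likely also check whether the process that wrote the lock is still alive.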

The net benefit to organizations using Vizioncore image-based data management is the flexibility to use these solutions in all types of combinations depending on their requirements. Some popular examples include:

  • Using vRanger Pro to back up the same image which is being replicated by vReplicator to another system or site
  • Using vReplicator to centralize VM image replicas from many systems and sites for backup with vRanger Pro performed on the replicas at the central location
  • Using vOptimizer Pro to right-size VMs with over-allocated storage before performing vRanger Pro backup, which makes backup more efficient
  • Using vControl to manage VMs before performing vRanger Pro backup and vReplicator replication to protect newly provisioned VMs

A Lock Mechanism is Key for Enabling Multiple Types of Image Management to Work Together

Vizioncore offers a lock mechanism because VMware does not. Vizioncore will continue to use this lock mechanism in our upcoming vRanger Pro 4.5, and the coming integrated solution which will combine vRanger Pro and vReplicator later this summer. As the solutions are integrated, the internal APIs will be used to make this process more seamless. But, we also will support the new versions of vRanger Pro and vReplicator working in combination with older versions. So maintaining the lock mechanism is a key requirement.

Vizioncore also welcomes the addition of a VMware platform lock mechanism open to all vendors, so that organizations could have better options for combining the solutions from multiple vendors in the same environment. Until then, however, administrators should beware: selecting one product from an image-based data management vendor's portfolio is effectively the same as selecting ALL products from that vendor.