How a Poorly-Designed Architecture for Data Backup will Undermine a Virtual Environment -- A Close Look at Veeam by Jason Mattox

At Vizioncore, we do not often cite our competitors by name in public. Our philosophy is that it is our job to provide expertise on virtual management requirements and the capabilities offered by Vizioncore for addressing those requirements.

However, members of our team do occasionally look in depth at competitive products. When the result is a fact-based assessment of how a competitor's approach contrasts with Vizioncore's, publishing it seems to serve the larger community. The purpose is to help members of the community better understand the real differences in approach and better appreciate the value built into the Vizioncore product portfolio.

In this case, the competitor we looked at is a small company called Veeam. Veeam offers an all-in-one product for backup, replication and recovery of VM images. The company is privately held, based in Russia, and reports having about 6,000 customers today. By comparison, Vizioncore announced more than 20,000 customers in March 2010 and operates as a wholly-owned subsidiary of Quest Software. Quest is a public company, obligated to provide audited reporting of its financials.

Below is a comparison of Veeam's implementation of image-based backup and restore with vRanger Pro 4.5. It is written by Jason Mattox, one of the co-inventors of the original vRanger Pro product. Jason continues to provide guidance and direction for new versions of Vizioncore technology products, including vRanger Pro 4.5.

We hope that you find Jason's comments and insights educational. In Part 2 of this posting, we will offer more detail on how vRanger Pro 4.5, and the Data Protection Platform (DPP) foundation on which it is built, contrast with Veeam's approach.

**************************************************

I have had the opportunity to take a deep, first-hand look at the Veeam 4.1.1 product over the last few weeks. My personal opinion, admittedly biased, is that they have built their product on a poor foundation. The problems with their architecture, which can cause data protection to operate poorly and can actually undermine an organization's virtual environment, include the following:

Pseudo service-based architecture: You install the product, it installs itself as a service, and you think, "Okay, good, it's a service-based architecture." But it's not. Here is a simple test you can do on your own to show that the product is not a full service-based architecture. Start a restore job, then log off Windows; the product will ask you, "Are you sure?" This is because logging off Windows cancels your running restores, since they are not running through the service. Another test is to attempt a backup and a restore at the same time; you cannot. If the product were a true service-based architecture, your backup and restore jobs would simply be submitted to the service, and the product wouldn't care that the two functions were running at the same time.
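
To illustrate what we mean by a service-based design, here is a minimal Python sketch. It is a toy model, not code from vRanger Pro or Veeam, and the class name `JobService` and the job tuples are hypothetical. Clients only enqueue jobs; worker threads owned by the service run them, so a backup and a restore can run side by side. In a real product the workers would live in the Windows service process rather than in threads of the submitting program, which is what lets jobs survive a user logging off.

```python
import queue
import threading


class JobService:
    """Toy model of a service-based job engine (hypothetical names).

    Clients call submit() and return immediately; worker threads owned by
    the service execute jobs, so backups and restores can run concurrently.
    """

    def __init__(self, workers=2):
        self.jobs = queue.Queue()
        self.results = []
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, kind, vm):
        # Submission only records the request; it does not run the job.
        self.jobs.put((kind, vm))

    def wait(self):
        # Block until every submitted job has been processed.
        self.jobs.join()

    def _worker(self):
        while True:
            kind, vm = self.jobs.get()
            try:
                # Placeholder for the real backup or restore work.
                with self._lock:
                    self.results.append(f"{kind} {vm} done")
            finally:
                self.jobs.task_done()


service = JobService()
service.submit("backup", "vm1")
service.submit("restore", "vm2")
service.wait()
print(sorted(service.results))  # ['backup vm1 done', 'restore vm2 done']
```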

Lack of data integrity in the backup archive: Create a job in their product that contains, for example, 20 VMs. When you back up all the VMDKs, Veeam puts all of the backup data into a single file. When you then run incremental backups, it updates this single large file directly.

When a single large file must be updated in place, the chance of corruption is high: a failure partway through an update can leave the file half-written and inconsistent. Database manufacturers know this; products like SQL Server and Exchange write all their changes to log files first and then, at a controlled point, post the changes to the single large file, the database. Veeam does not implement this best practice. Instead, it updates a single 30, 40 or even 500 GB file directly, rather than staging the data next to the file and posting it to the file once the staging has succeeded.
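
The staging pattern described above can be sketched in a few lines of Python. This is a minimal redo-journal sketch, not code from any product, assuming the archive is a plain file and each change is a (byte offset, bytes) pair; the function names and the `.journal` suffix are hypothetical, and a production implementation would checksum each journal record rather than rely on JSON parsing to detect a torn write. The point is the ordering: the large file is only touched after the complete set of changes is durably staged next to it, so a crash leaves either an untouched file or a file plus a complete journal that can be replayed.

```python
import json
import os


def _journal_path(target_path):
    # Hypothetical convention: the journal sits next to the archive file.
    return target_path + ".journal"


def _load_journal(journal):
    # json.load raises ValueError (JSONDecodeError) if the journal is torn.
    with open(journal) as j:
        record = json.load(j)
    # Decode everything before touching the target.
    return [(int(offset), bytes.fromhex(data)) for offset, data in record.items()]


def _post_journal(target_path, journal, entries):
    # Writes go to absolute offsets, so replaying them twice is harmless.
    with open(target_path, "r+b") as f:
        for offset, data in entries:
            f.seek(offset)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.remove(journal)


def staged_update(target_path, changes):
    """Apply {offset: bytes} changes to target_path via a redo journal."""
    journal = _journal_path(target_path)
    # Step 1: stage all changed blocks next to the target and make them durable.
    with open(journal, "w") as j:
        json.dump({str(off): data.hex() for off, data in changes.items()}, j)
        j.flush()
        os.fsync(j.fileno())
    # Step 2: post the staged changes to the target, then drop the journal.
    _post_journal(target_path, journal, _load_journal(journal))


def recover(target_path):
    """At startup: replay a complete journal, discard a torn one."""
    journal = _journal_path(target_path)
    if not os.path.exists(journal):
        return
    try:
        entries = _load_journal(journal)
    except ValueError:
        # The crash happened while staging; the target was never modified.
        os.remove(journal)
        return
    _post_journal(target_path, journal, entries)
```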

This direct-update approach is their Synthetic Full implementation, the entire basis for their product, and it is why we object so strenuously to the risk it introduces into customer environments.

Their argument in favor of Synthetic Full appears to have been that it makes backups faster. We believe there are other, better methods for speeding up backup that do not risk the integrity of the backup repository. These methods include Active Block Mapping (ABM), now shipping in vRanger Pro 4.5. In beta test environments, our testers have reported that vRanger Pro backup is far faster than Veeam. However, your mileage may vary, and we welcome more test reports from organizations testing both products.

Veeam has also argued that Synthetic Full helps speed up restore. Again, we agree with the goal but not with the method used to get there.

In vRanger Pro, we offer a Synthetic Restore process that has been in the product for some time. Our restore has been faster than Veeam's for as long as we've been aware of Veeam, and restore performance improved further in the 4.5 release.

Problems with updating a single file in the backup repository: Those of you familiar with database implementations, and with the very good reasons for staging updates rather than writing them directly, will understand some of these problems immediately. This approach is especially problematic for image-based backup, for the following reasons:

  • Tape space requirements - Because the original file is updated with every backup pass, the entire file must be written to tape every time. There is no way to move just the new data to tape. This makes the 'sweep-to-tape' process lengthy and significantly increases the number of tape cartridges required. Tape management is also more difficult, and locating tapes, scanning for data and performing a restore all become more difficult and take longer.

  • Problems working with Data Domain de-duplication storage and similar storage appliances - Because the original file is amended with every backup pass, the appliance cannot efficiently de-duplicate and replicate the backup data.

  • Finding and restoring individual VMs from the backup job - Because the backup file contains more than one VM, it cannot be named intuitively, which makes it harder for the admin to browse to and restore the VMs they need.
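
The tape problem comes down to simple arithmetic. The Python sketch below is a toy model, not code from either product: it fingerprints each file in an in-memory repository and counts only new or changed files as needing a sweep to tape. The file names and sizes are made up for illustration. When an incremental pass rewrites part of one large archive, the whole archive counts as changed; when each pass lands in its own file, only the increment does.

```python
import hashlib


def manifest(repo):
    # Fingerprint every file in the (hypothetical, in-memory) repository.
    return {name: hashlib.sha256(data).hexdigest() for name, data in repo.items()}


def bytes_to_sweep(last_swept, repo):
    # Only files that are new or changed since the last sweep go to tape.
    current = manifest(repo)
    return sum(len(repo[name]) for name, digest in current.items()
               if last_swept.get(name) != digest)


full = b"F" * 1000   # stand-in for a full VM image backup
delta = b"D" * 50    # stand-in for one incremental pass

# Layout A: one archive file updated in place by each pass.
repo_a = {"job.bak": full}
swept_a = manifest(repo_a)
repo_a["job.bak"] = full[:-50] + delta
print(bytes_to_sweep(swept_a, repo_a))  # 1000: the whole file goes to tape

# Layout B: each pass writes a new file; earlier files are never modified.
repo_b = {"vm1-full.bak": full}
swept_b = manifest(repo_b)
repo_b["vm1-inc1.bak"] = delta
print(bytes_to_sweep(swept_b, repo_b))  # 50: only the new increment
```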

Overhead in creating and managing simultaneous backup and recovery jobs (it's just harder to do): In their product, if you create a single backup job of, say, 30 VMs, it backs up one VM at a time. To run more backups at once, you must create more jobs, and for each job you must step through the entire backup wizard, which is time-consuming. The same holds true for restore jobs: for each VM, you must step through the entire restore wizard to create and submit a restore job. This isn't too bad most of the time, but in disaster recovery scenarios, or when entire ESX servers must be rebuilt, it simply isn't practical.
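
For comparison, here is what processing several VMs from a single job definition can look like. This is a minimal Python sketch using a bounded thread pool; `VmBackupJob` and `max_parallel` are hypothetical names, not vRanger Pro or Veeam settings, and the sleep stands in for real backup work. The job is defined once, and the engine keeps up to `max_parallel` VMs in flight.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class VmBackupJob:
    """One job covering many VMs, processed max_parallel at a time (hypothetical)."""

    def __init__(self, vms, max_parallel=4):
        self.vms = vms
        self.max_parallel = max_parallel
        self.peak = 0          # highest number of VMs seen in flight at once
        self._running = 0
        self._lock = threading.Lock()

    def _back_up_vm(self, vm):
        with self._lock:
            self._running += 1
            self.peak = max(self.peak, self._running)
        try:
            time.sleep(0.05)   # placeholder for the real per-VM backup work
            return f"{vm}: ok"
        finally:
            with self._lock:
                self._running -= 1

    def run(self):
        # A bounded worker pool: one job definition, several VMs in flight.
        with ThreadPoolExecutor(max_workers=self.max_parallel) as pool:
            return list(pool.map(self._back_up_vm, self.vms))


job = VmBackupJob([f"vm{i:02d}" for i in range(30)], max_parallel=4)
results = job.run()
print(len(results), job.peak)
```

Bounding the pool matters: parallelism helps only until the datastores, hosts or network become the bottleneck, so the limit is best treated as a tunable rather than fixed at one.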

The feature called de-dupe is really something else: De-dupe in their product is not true de-dupe; it is perhaps better described as template-based backup. Here's what they do: they define a base VM, a set of files or blocks typically found in a VMDK, and use it as the point of comparison for the full backup of each VM. For example, if you have two Windows guests, they do not have to back up the Windows configuration because it is already in the base VM template.

However, there are some important limitations of their approach which include:

  • Their de-dupe only works within a single backup job. The more jobs you create, the less beneficial the de-dupe becomes, because blocks are duplicated across jobs. If you need to create more backup jobs to get better backup performance across multiple VMs, the de-duplication benefit goes down.

  • Their de-dupe is defined by a fixed base VM and does not adapt to the configuration of the guests. If you have two Exchange servers protected in the same job, all of the blocks for the Exchange configuration will be included twice, even if they are identical.
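
Both limitations can be seen in a toy model, with each one modeled separately. The Python sketch below counts how many blocks get stored under three schemes: skipping blocks that match a fixed base template (a simplified reading of the approach described above), content-hash de-duplication with a per-job index, and content-hash de-duplication with one global index. Block contents, the tiny block size and the function names are hypothetical; real products add compression, larger blocks and persistent indexes.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real products use much larger blocks


def blocks(image):
    return [image[i:i + BLOCK] for i in range(0, len(image), BLOCK)]


def digest(block):
    return hashlib.sha256(block).digest()


def template_dedup(jobs, template):
    """Skip blocks found in a fixed base template; store everything else."""
    base = {digest(b) for b in blocks(template)}
    return sum(1 for job in jobs for vm in job for b in blocks(vm)
               if digest(b) not in base)


def hash_dedup(jobs, global_index=True):
    """Store each unique block once, either per job or across all jobs."""
    seen = set()
    stored = 0
    for job in jobs:
        if not global_index:
            seen = set()  # per-job index: forget blocks from earlier jobs
        for vm in job:
            for b in blocks(vm):
                h = digest(b)
                if h not in seen:
                    seen.add(h)
                    stored += 1
    return stored


# Two Exchange guests: shared OS block, identical Exchange blocks, unique data.
exch_a = b"WIND" + b"EXC1" + b"EXC2" + b"DATA"
exch_b = b"WIND" + b"EXC1" + b"EXC2" + b"MAIL"

print(template_dedup([[exch_a, exch_b]], template=b"WIND"))  # 6
print(hash_dedup([[exch_a], [exch_b]], global_index=False))  # 8
print(hash_dedup([[exch_a], [exch_b]], global_index=True))   # 5
```

The template scheme stores the identical Exchange blocks twice, the per-job index loses its benefit once the guests are split into two jobs, and the global index stores each unique block exactly once.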

Our own implementation of de-duplication is scheduled for delivery later this year. We have developed true global, inline de-duplication designed to offer maximum de-duplication benefits, and it is in test now. Our architecture, which keeps backup files intact and untouched once they are written to the repository, has been key to making true de-duplication possible.

Lack of platform scalability: To scale out their product in virtual appliance mode, LAN-free mode, ESXi network mode or ESXi replication, you have to install their product many times. To make all of these deployments manageable, they offer an API layer and an ASP.NET web page where customers can check job status across their many installs. They call this their Enterprise console. The console does not allow you to create or edit jobs; it is only a monitor.

ESXi replication is network-intensive: In their implementation for ESXi, their product reads data through the vStorage API and sends it uncompressed over the network to their console, then writes the data, again uncompressed, to the target ESXi host through the vStorage API.

What's wrong with this? First, the vStorage API was not designed for WAN links; it was designed for backups over the LAN. Second, the traffic is uncompressed; WAN links are not cheap, so compression is an essential feature. Third, a single replication job can consume 50-80% of the CPU of a 2-vCPU VM. When you consider how you would scale this out in terms of both bandwidth and the number of installations, it doesn't seem practical.
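
Compressing replication traffic on the sending side is not exotic. As a minimal sketch, using Python's standard `zlib` module purely for illustration (we are not describing how any particular product compresses its traffic), a stream of disk chunks can be compressed before it crosses the WAN and decompressed at the target. The sample data is synthetic; real ratios depend entirely on the contents of the guest.

```python
import zlib


def compress_stream(chunks, level=6):
    """Compress an iterable of byte chunks into a stream for the WAN link."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()


def decompress_stream(chunks):
    """Reverse compress_stream on the receiving side."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    yield decomp.flush()


# Stand-in for VM disk data: zeroed (unallocated) regions compress very well.
source = [b"\x00" * 65536 for _ in range(16)] + [b"guest file data " * 4096]
wire = list(compress_stream(source))

raw_bytes = sum(len(c) for c in source)
wire_bytes = sum(len(c) for c in wire)
print(raw_bytes, wire_bytes)
```

Compression trades some CPU at each end for bandwidth, which is usually the right trade on a WAN link.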

Use of unpublished VMware API calls: If you have ever used the Datastore Browser from the vCenter client, you have used an internal API, called NFC, that is not exposed to third parties. Here is what they have done: they impersonate the vCenter client and use the internal NFC API to work with VMs.

So, here's the risk: VMware may trace a reported problem with a VM back to a third-party product that uses an unpublished vCenter API by impersonating the vCenter client. Will VMware be okay with this? Might VMware become reluctant to support you and your environment?

If you want to verify this for yourself, search their product's logs for "NFC" and look in the target datastore for files that are not VMware-related. Then ask them how they transfer and modify files in the datastore that are not normal VMware files.

Why the stakes are high in virtual environments: Virtual environments are some of the fastest-growing and most dynamic environments in the world. As adoption of virtual servers continues to accelerate, administrators face the big challenge of keeping ever-expanding virtual resources monitored and efficiently managed. At Vizioncore, we want this momentum to continue, so we offer data protection and management capabilities purpose-built for images, with foundational capabilities designed to ensure that protection methods are, and remain, affordable, resource-efficient, and easy to use and operate, no matter how large your virtual environment grows.

Check back tomorrow for Part 2 of this posting, with details on Vizioncore's foundational architecture and how it future-proofs a virtual environment.