How White Space & Deleted Data Affect Your Image Level Backups

Author: Jason Mattox, VP of Support & Product Management, Vizioncore


Today I’m writing about an age-old problem with compressed and uncompressed backups. We now have a two-phased approach to dramatically reducing this problem. If you’ve been doing image level backups for a while, you might be asking yourself one of the following questions:


1. Why is my compressed backup larger than the used space of the operating system? For example, a 20 GB VMDK has 10 GB of used space, yet the compressed backup archive is 15 GB.


2. Why does it take a long time to perform a full backup?


Let’s talk about the backup archive being larger than the used space of the VM. When Windows deletes a file, the file’s data is not erased from the partition; the space is only marked as available for reuse. If you have 10 GB of used space in a 20 GB VMDK, you have 10 GB of White Space that can fill up with Deleted data over time. Windows will keep leaving Deleted data behind until it fills the partition; once the partition is full, it will overwrite Deleted data with more Deleted data. And when you add more Actual data to the volume, the Deleted data will be overwritten to make room for it. When your disk looks like the graphic below, you end up with 5 GB of Deleted data that will be included in the backup compression, and possibly 2.5 GB of extra network traffic and storage on disk (assuming the Deleted data compresses to about half its size).

So how does this affect your backups?
When backup vendors read the full VMDK, they are forced to read Actual data, Deleted data, and White Space blocks. If your 20 GB VM has 10 GB of Actual data and 5 GB of Deleted data, the vendor will end up compressing the 10 GB of Actual data, the 5 GB of Deleted data, and the 5 GB of White Space. This inflates your backup archive with Deleted data on top of your Actual data. The vendor may also have to spend time compressing or even de-duping Deleted data and White Space.
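To put rough numbers on the example above, here is a minimal Python sketch. The function name and both ratios are assumptions for illustration only: non-zero data (Actual and Deleted) is assumed to compress to half its size, and White Space is assumed to compress to almost nothing. Real ratios depend on the data and the compressor.

```python
def estimate_archive_gb(actual_gb, deleted_gb, white_gb,
                        data_ratio=0.5, zero_ratio=0.0):
    """Rough archive size in GB.

    data_ratio and zero_ratio are assumed compression ratios
    (compressed size / original size), not measured values.
    """
    non_zero = (actual_gb + deleted_gb) * data_ratio
    zeros = white_gb * zero_ratio
    return non_zero + zeros


# 20 GB VMDK: 10 GB Actual, 5 GB Deleted, 5 GB White Space
with_deleted = estimate_archive_gb(10, 5, 5)

# Same VMDK if the Deleted data were zeros instead
without_deleted = estimate_archive_gb(10, 0, 10)

# Extra archive size caused only by the Deleted data
extra_gb = with_deleted - without_deleted
print(with_deleted, without_deleted, extra_gb)
```

Under these assumptions the Deleted data accounts for 2.5 GB of the archive, which lines up with the estimate given earlier.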


What can you do to remove the Deleted data from the disk and remove the inflation of Deleted data from the archive?
Right now you can use the VMware Tools “Shrink” feature or SDelete from Microsoft to write zeros over the top of the Deleted data, which leaves the vendor with only zeros and Actual data to compress. This will shorten your backup time and remove the inflation of Deleted data from your archive. However, the vendor will still have to spend time compressing the 10 GB of White Space or even de-duping White Space blocks. And why run manual tools on an ongoing basis when they add extra overhead to your VMs and cause a lot of writes to your disk? This is where phase two comes in.
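Why zeroing helps can be seen with a small Python sketch that uses the standard zlib module. This is only an illustration: random bytes stand in for Deleted data as a worst case (real deleted files usually compress better than random data), and the 1 MiB sample size is arbitrary.

```python
import os
import zlib

BLOCK = 1024 * 1024  # 1 MiB sample, arbitrary size for the demo


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data))


stale = os.urandom(BLOCK)  # stand-in for Deleted data (worst case)
zeroed = bytes(BLOCK)      # the same region after zeros are written over it

stale_size = compressed_size(stale)
zeroed_size = compressed_size(zeroed)

print(f"deleted-like data: {stale_size} bytes after compression")
print(f"zeroed data:       {zeroed_size} bytes after compression")
```

The zeroed region shrinks to a tiny fraction of its original size, while the random stand-in barely shrinks at all. The compressor still has to read and process every zero, though, which is the remaining cost described above.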


What is vRanger Pro 4.0 DPP doing differently to handle this issue?
During the first phase, when we find a White Space block it will not be included in the backup. Just think about how many of your VMs have gigs and gigs of White Space, and how much less time we will need to spend compressing and comparing White Space blocks. The more White Space, the better the backup and restore times versus a product that has to back up and restore White Space blocks. Keep in mind that from the outside of a VMDK, Actual data and Deleted data look the same, so we can only rule out the White Space blocks in this release of vRanger Pro 4.0 DPP.
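The idea behind phase one can be sketched in a few lines of Python. This is not vRanger code; the function name and the 4 KB block size are hypothetical, and an in-memory stream stands in for the VMDK. Every block is still read, but blocks that are entirely zeros are dropped before compression.

```python
import io

BLOCK_SIZE = 4096  # hypothetical block size for the sketch


def non_zero_blocks(stream, block_size=BLOCK_SIZE):
    """Yield (index, block) for every block that is not entirely zeros."""
    zero_block = bytes(block_size)
    index = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Compare against zeros of the same length (the last block may be short)
        if block != zero_block[:len(block)]:
            yield index, block
        index += 1


# Tiny fake disk: Actual data, White Space, Deleted data, White Space
disk = (b"A" * BLOCK_SIZE + bytes(BLOCK_SIZE)
        + b"D" * BLOCK_SIZE + bytes(BLOCK_SIZE))

kept = [index for index, _ in non_zero_blocks(io.BytesIO(disk))]
print(kept)
```

Blocks 0 and 2 are kept. The block filled with "D" stands in for Deleted data and is kept as well, because from the outside it is indistinguishable from Actual data.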


Now phase two is where we get really creative (this will be in a later version of the product scheduled to release later this year). In phase two we will be able to open a VMDK and its file system to understand which blocks of a VMDK make up Actual data, Deleted data, and White Space. What’s the difference from what I just outlined above? The process above of finding White Space blocks still requires us to read each block and check whether it’s a zero block or not. It also forces us to compress Deleted data when we back up. By using this VMDK and file system technique, we can remove the need to even scan the disk to find Actual data, Deleted data, or White Space. We will know in less than a second which blocks of a VMDK make up Actual data, Deleted data, and White Space, and we will just compress and send the Actual data, not Deleted data or White Space.
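As a rough illustration of the phase two idea, the Python sketch below assumes the file system exposes a per-block allocation map, where each entry says whether a block is in use. That mechanism is an assumption made for this example (the product’s actual method is not described here), and the function name and sample map are hypothetical.

```python
def blocks_to_back_up(allocation_map):
    """Return the indices of blocks the file system marks as in use."""
    return [index for index, in_use in enumerate(allocation_map) if in_use]


# Hypothetical 8-block volume:
#   True  = allocated (Actual data)
#   False = free (Deleted data or White Space, no need to tell them apart)
allocation_map = [True, True, False, False, True, False, False, False]

selected = blocks_to_back_up(allocation_map)
print(selected)
```

Only blocks 0, 1, and 4 are selected, and no block contents had to be read to decide that. This is the difference from phase one, where each block must be read and checked for zeros.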


When I test the phase one features of vRanger Pro 4.0 DPP, the overall backup and restore times look great! One of the biggest benefits is backing up to a de-dup device. Think about a 100 GB VMDK with 10 GB of Actual data and 10 GB of Deleted data. On the full backup to this device we would only send 20 GB of data, not 100 GB! Add uncompressed incrementals on top of this, with retention and instant file level restore, and the performance you should get from vRanger Pro 4.0 DPP and a de-dup target is going to be out of this world.




Jason Mattox