Why Variable Block Is Critical to Data Deduplication

When you think of data deduplication, the overall concept may seem simple, but there is a lot going on under the covers that ultimately saves you time and money. The beauty of using a purpose-built backup-to-disk appliance (such as the Quest DR6000) for deduplication is that it takes care of the details and, once set up, lets you “set it and forget it”.

Let’s review what happens when data deduplication is at work. Deduplication is so powerful because a significant percentage of the data a typical organization stores is a copy. For example, cloning virtual machines creates multiple copies of identical operating system images, and many other routine processes create identical copies of data as well. When it comes to backup and storage, why would you want to back up multiple copies of the same data?

Dedupe reduces the amount of stored data by removing each redundant copy (or block) of data and leaving in its place a pointer to the original, which is now stored only once. Deduplication can be applied to data in primary storage, backup storage, cloud storage, or data in flight for replication, such as LAN and WAN transfers.
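To make this concrete, here is a minimal sketch of block-level dedupe in Python. It illustrates the concept only and is not any vendor’s implementation: each block is identified by a hash of its contents and stored once, and every file is kept as a list of pointers to those stored blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed block size for this simple sketch

class DedupeStore:
    """Toy block store: each unique block is kept once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}   # hash -> block bytes (stored once)
        self.files = {}    # filename -> list of block hashes (the "pointers")

    def write(self, name: str, data: bytes):
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if it has never been seen before;
            # otherwise just record a pointer to the existing copy.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name: str) -> bytes:
        # Rehydrate the file by following its pointers.
        return b"".join(self.blocks[d] for d in self.files[name])

store = DedupeStore()
payload = b"A" * 8192 + b"B" * 4096
store.write("vm1.img", payload)
store.write("vm2.img", payload)          # an identical clone
assert store.read("vm2.img") == payload
print(len(store.blocks), "unique blocks stored for 2 files")  # -> 2
```

Two identical virtual machine images cost only one set of blocks; the second copy is just pointers.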

Important considerations

When considering deduplication solutions, a key factor is ingest rate: the rate at which the backup target can accept, process, and write data to storage without falling behind. Any step added to this workflow can slow the ingest rate and thus reduce overall backup performance.
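As a rough illustration of why ingest rate matters when sizing a solution (the numbers here are hypothetical), the sustained rate needed to finish a backup inside its window is simple arithmetic:

```python
# Hypothetical sizing example: sustained ingest rate needed to finish
# a full backup inside its window. The numbers are illustrative only.
data_tb = 20            # total data to protect, in TB
window_hours = 8        # nightly backup window

required_mb_s = data_tb * 1_000_000 / (window_hours * 3600)
print(f"Required sustained ingest: {required_mb_s:.0f} MB/s")
# -> Required sustained ingest: 694 MB/s
```

If deduplication processing drags the appliance below that rate, backups spill past the window.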

For purpose-built backup-to-disk appliances (PBBAs), factors that influence ingest rate include processor speed, network connectivity (1Gb or 10Gb networks), the number of clients backing up data, and available network bandwidth. Many PBBAs use in-line deduplication, which processes data in real time as it arrives from the backup server. Because this is processor intensive, other techniques are used to keep the ingest rate up, including variable blocks and built-in performance accelerators. Variable block adjusts the size of the data “chunks” read and written to disk based on factors such as data type, which yields predictable performance and helps to properly size a backup appliance solution.

Variable block dedupe

Unlike fixed block dedupe, which divides data into chunks of a preset size, variable block dedupe adjusts the chunk size as it goes. It accommodates changes more easily than fixed block because it automatically determines the optimum chunk size based on the type of data (e.g., files, images, databases) and the formatting used by the backup application (e.g., removing extra header information). This lets the process find natural boundaries within the data, compensate for shifts as new data is added, and still arrive at the optimal chunk size in a repeatable manner.
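One common way variable-block systems find those natural boundaries is content-defined chunking with a rolling hash. The sketch below illustrates the idea and is not the DR Series algorithm; all parameter values are made up. Because a boundary is declared wherever the windowed hash matches a pattern, boundaries depend on the bytes themselves rather than on absolute offsets, so an insertion near the start of a file disturbs only the chunks it touches:

```python
import hashlib
import os

WINDOW = 16          # bytes in the rolling-hash window (illustrative)
PRIME = 31           # base of the polynomial hash
MASK = 0x0FFF        # cut where the low 12 bits are zero (~4 KiB average)
MIN_CHUNK = 1024     # floor and ceiling keep chunk sizes predictable
MAX_CHUNK = 16384

def chunk(data: bytes):
    """Yield chunks whose boundaries are chosen by content, not offset."""
    top = PRIME ** (WINDOW - 1)   # weight of the byte leaving the window
    h, start = 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the oldest byte, bring in the new one.
            h = (h - data[i - WINDOW] * top) * PRIME + b
        else:
            h = h * PRIME + b
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def chunk_hashes(data: bytes):
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}

original = os.urandom(200_000)
edited = original[:50] + b"NEW DATA" + original[50:]  # insert near the front

shared = chunk_hashes(original) & chunk_hashes(edited)
print(f"{len(shared)} of {len(chunk_hashes(edited))} chunks unchanged after the insert")
```

With fixed blocks, the same eight-byte insertion would shift every subsequent block boundary, so nearly the whole file would dedupe as “new” data; content-defined boundaries resynchronize almost immediately.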

In short, we really like variable block dedupe for the following reasons:

  • It accommodates changes in data more easily than fixed block sizes do.
  • It adds intelligence to the process, consuming less storage than single-instance storage, where only complete files are compared during deduplication.


Variable block deduplication is used by Dell’s DR Series (DR4100, DR6000 and DR2000v).  
