The Backup Experience: In-line vs. Post-Process Deduplication

Purpose-built backup-to-disk appliances (PBBAs) are designed to reduce backup storage investments, improve backup performance, simplify backup environments, and offer improved resiliency technology to keep protected data pristine.  Many vendors, such as Quest, Dell EMC, HPE, ExaGrid, and Quantum, offer PBBAs with powerful backup technologies along with backup software integration to further improve the backup experience.  Each vendor offers product differentiators, including differences in how deduplication is implemented, that can affect backup capabilities.  This blog examines a highly debated topic: how in-line deduplication and post-process deduplication affect backup solutions.

 

Post-Process & In-Line Deduplication

A post-process deduplication PBBA ingests backup data and immediately stores it in an area of disk called a ‘landing zone’.  Once the data has been written to the landing zone, a deduplication process is applied, producing a deduplicated copy of the data that is stored outside the landing zone.  The purpose of the landing zone is to improve the performance of restoring recently backed-up data.  The post-process deduplication method is shown in Figure 1.

 

An in-line deduplication PBBA deduplicates data in memory immediately, before it is written to disk.  The in-line deduplication method is shown in Figure 2.  The two processes look similar but differ in when deduplication is applied.
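The difference in timing can be sketched in a few lines of Python.  This is a simplified illustration and not any vendor's implementation: it assumes tiny fixed-size chunks, SHA-256 fingerprints, and in-memory structures standing in for disk.  The names ChunkStore, inline_backup, post_process_ingest and post_process_dedupe are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the example readable


def chunks(data: bytes):
    """Split data into fixed-size chunks (an assumption made for simplicity)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class ChunkStore:
    """Stand-in for the deduplicated repository: one copy per unique chunk."""

    def __init__(self):
        self.unique = {}        # fingerprint -> chunk bytes
        self.bytes_written = 0  # bytes that actually reached the repository

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in self.unique:
            self.unique[fp] = chunk
            self.bytes_written += len(chunk)
        return fp


def inline_backup(data: bytes, store: ChunkStore):
    """In-line: deduplicate in memory, so only unique chunks are written."""
    return [store.put(c) for c in chunks(data)]


def post_process_ingest(data: bytes, landing_zone: list) -> int:
    """Post-process, step 1: write the full stream to the landing zone."""
    landing_zone.append(data)
    return len(data)  # bytes written to disk during the backup itself


def post_process_dedupe(landing_zone: list, store: ChunkStore):
    """Post-process, step 2 (later): deduplicate what sits in the landing zone."""
    return [[store.put(c) for c in chunks(d)] for d in landing_zone]


backup = b"AAAABBBBAAAAAAAA"  # four chunks, two of them unique

inline_store = ChunkStore()
inline_recipe = inline_backup(backup, inline_store)

landing_zone = []
written_during_backup = post_process_ingest(backup, landing_zone)
post_store = ChunkStore()
post_recipes = post_process_dedupe(landing_zone, post_store)

print(inline_store.bytes_written, written_during_backup, post_store.bytes_written)
```

Both approaches end up with the same deduplicated repository.  The difference is that the post-process path first writes the full, redundant stream to disk and deduplicates it afterwards.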

Backup Performance

One might expect post-process PBBAs to deliver superior backup speeds because backup data is written straight to disk (deduplication is deferred until later), whereas in-line deduplication is applied during the backup, before data is written to disk.  This reasoning is logical, as no deduplication or other processing sits in the data path to slow the backup down.  But in-line PBBAs offer a performance advantage as well: because deduplication occurs before data is written to disk, the load on disk is reduced by up to 90%.  When comparing the maximum performance specifications of in-line and post-process deduplication PBBAs, one finds little difference between comparable appliances.

 

 

Deduplication - Improving Backup Performance Bottlenecks

Backup performance is typically not bottlenecked at the PBBA.  Instead, the most common backup performance bottleneck is the production network, which in-line deduplication can help relieve.  For example, the throughput of a 1GbE production network (LAN) is limited to ~0.439 TB/hr.  At that rate, it takes roughly 11.4 hours to protect 5TB of data, and at most about 21TB of data can be backed up during a 48-hour weekend.  For many environments, this level of performance is unacceptable even when multiple links work in parallel.  A PBBA may tout fast backups, but the appliance is only as fast as the rate at which the network can deliver data to it.

 

In-line deduplication methods are popular because they directly address network backup performance issues by offloading redundant backup traffic (up to 90%) from the production network.
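As a rough sketch of the arithmetic behind these figures, the following Python computes backup time and window capacity for a given link speed, plus the effective logical backup rate when a share of redundant traffic never crosses the network.  The function names are illustrative, and the 90% reduction is the best case quoted above rather than a guaranteed result.

```python
LINK_TB_PER_HR = 0.439  # 1GbE throughput figure quoted in the text


def hours_to_back_up(data_tb: float, tb_per_hr: float) -> float:
    """Minimum time to move data_tb over a link running at tb_per_hr."""
    return data_tb / tb_per_hr


def max_tb_in_window(window_hours: float, tb_per_hr: float) -> float:
    """Most data that can cross the link within a backup window."""
    return window_hours * tb_per_hr


def effective_tb_per_hr(tb_per_hr: float, traffic_reduction: float) -> float:
    """Logical backup rate when a fraction of the data is never sent."""
    return tb_per_hr / (1.0 - traffic_reduction)


hours_for_5tb = hours_to_back_up(5, LINK_TB_PER_HR)
weekend_tb = max_tb_in_window(48, LINK_TB_PER_HR)
deduped_rate = effective_tb_per_hr(LINK_TB_PER_HR, 0.90)

print(f"5TB over 1GbE: {hours_for_5tb:.1f} hours")
print(f"48-hour window: {weekend_tb:.1f} TB")
print(f"With 90% of traffic removed: {deduped_rate:.2f} TB/hr")
```

On the same link, removing 90% of the traffic raises the logical backup rate about tenfold, which is why moving deduplication ahead of the network matters more than raw appliance speed.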

 

In-line deduplication PBBAs directly address network bottlenecks by implementing source side deduplication, which moves the deduplication process upstream to the protected server as shown in Figures 3 and 4.  This way, redundant backup data is eliminated before it is sent over the network to the PBBA, resulting in exceptional backup performance.  Quest Labs tests show that a DR6300 ingests up to 29TB/hr using 2 x 10GbE network interface cards when source side deduplication is used.  For Quest DR Series appliances, source side deduplication uses the same variable block sliding window deduplication process as the appliance itself.
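To make the idea concrete, here is a minimal sketch of source side deduplication with variable-size chunks.  It is not the Quest DR Series algorithm: the window checksum, the boundary rule, the chunk size limits, and the Appliance class are all assumptions made for illustration, and the small amount of fingerprint traffic is ignored.

```python
import hashlib

WINDOW = 4      # sliding-window width in bytes (illustrative)
DIVISOR = 8     # a boundary is declared when window_sum % DIVISOR == DIVISOR - 1
MIN_CHUNK = 4   # kept equal to WINDOW so the window never leaves the current chunk
MAX_CHUNK = 64


def variable_chunks(data: bytes):
    """Cut chunks where a checksum of the last WINDOW bytes hits a target value."""
    out, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        # Recomputed each step for clarity; a real implementation would roll it.
        window_sum = sum(data[i - WINDOW + 1:i + 1])
        if window_sum % DIVISOR == DIVISOR - 1 or length >= MAX_CHUNK:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])  # trailing partial chunk
    return out


class Appliance:
    """Hypothetical stand-in for the PBBA's chunk repository."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def missing(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.chunks}

    def receive(self, new_chunks: dict):
        self.chunks.update(new_chunks)


def source_side_backup(data: bytes, appliance: Appliance) -> int:
    """Runs on the protected server; returns chunk bytes sent over the network."""
    pieces = variable_chunks(data)
    fps = [hashlib.sha256(p).hexdigest() for p in pieces]
    needed = appliance.missing(fps)
    payload = {fp: p for fp, p in zip(fps, pieces) if fp in needed}
    appliance.receive(payload)
    return sum(len(p) for p in payload.values())


appliance = Appliance()
monday = bytes(range(256)) * 4
first_run = source_side_backup(monday, appliance)
second_run = source_side_backup(monday, appliance)  # unchanged data

print(len(monday), first_run, second_run)
```

Because the chunking and fingerprinting run on the protected server, the second backup of unchanged data sends no chunk data at all.  Boundaries are chosen from the content itself rather than from fixed offsets, which is the general motivation for variable block approaches.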

Post-process PBBAs are not designed for source side deduplication and generally do not support it because the landing zone is intended for non-deduplicated data, as shown in Figure 1.  As an alternative, source side deduplication from a backup software solution could be used, which may lead to additional licensing costs.  In other words, two deduplication technologies from two different vendors are needed, which adds complexity to the backup solution.

 

Cost - $/GB

In-line deduplication PBBAs offer the best backup capacity savings because only unique backup data is ever saved to disk.  As a result, in-line deduplication PBBAs can preserve large amounts of backup data with minimal investment in disk, power, cooling and management.

 

Post-process deduplication PBBAs configure part of their disk as a landing zone to quickly absorb the latest full backup with all of its redundant data included.  Because redundant data is saved to disk, the capacity savings of a post-process PBBA are significantly diminished.  Generally, 40-50% of disk resources can be consumed by the landing zone, which drives up disk costs.
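The capacity effect of a landing zone can be estimated with a simple calculation.  The sketch below is a simplification under stated assumptions: the 100TB of raw disk and the 10:1 deduplication ratio are hypothetical values chosen for illustration, and the function name is invented.  It counts only the data held in the deduplicated repository and ignores the single recent copy kept in the landing zone.

```python
def protected_capacity_tb(raw_tb: float, dedupe_ratio: float,
                          landing_zone_fraction: float = 0.0) -> float:
    """Logical backup data that fits in the deduplicated part of the disk."""
    repository_tb = raw_tb * (1.0 - landing_zone_fraction)
    return repository_tb * dedupe_ratio


RAW_TB = 100        # hypothetical raw disk capacity
DEDUPE_RATIO = 10   # hypothetical 10:1 deduplication ratio

inline_tb = protected_capacity_tb(RAW_TB, DEDUPE_RATIO)
post_process_tb = protected_capacity_tb(RAW_TB, DEDUPE_RATIO,
                                        landing_zone_fraction=0.5)

print(f"In-line: {inline_tb:.0f} TB of backups on {RAW_TB} TB of disk")
print(f"Post-process (50% landing zone): {post_process_tb:.0f} TB")
```

Under these assumptions, a landing zone that takes half the disk halves the amount of backup data the same hardware can retain, or equivalently doubles the disk needed for the same retention.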

 

Restore Performance

An in-line deduplication PBBA must rehydrate data for backup recovery requests as all data is stored in deduplicated form.   The rehydration process is resource intensive as data is reconstructed from its deduplicated form back into its native form.  The rehydration process lengthens the time required to recover data.
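Rehydration can be pictured as following a list of chunk references and reading each chunk back in order.  The sketch below is illustrative only and assumes fixed-size chunks and SHA-256 fingerprints; the function names dedupe and rehydrate are hypothetical.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4):
    """Store each unique chunk once and record the order of fingerprints."""
    store, recipe = {}, []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # keep only the first copy of each chunk
        recipe.append(fp)
    return store, recipe


def rehydrate(store: dict, recipe: list) -> bytes:
    """Rebuild the native data: one lookup per chunk reference, in order."""
    return b"".join(store[fp] for fp in recipe)


original = b"AAAABBBBAAAAAAAA"
store, recipe = dedupe(original)
restored = rehydrate(store, recipe)

print(len(store), len(recipe), restored == original)
```

Every chunk reference in the recipe costs a lookup at restore time, even when the same chunk appears many times.  On real hardware those lookups translate into scattered disk reads, which is one reason restores from deduplicated storage take longer than reading a native copy.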

 

Post-process PBBAs offer excellent performance for recovery requests of recently backed-up data.  But recall that not all data is kept in the landing zone; only the most recent data is held there.  Over time, as new data is ingested into the landing zone, it replaces the aged data.  When restoration of aged data is requested, the same intensive rehydration process used by in-line deduplication PBBAs is needed to reassemble the data into its native form.

 

Which is better?  A PBBA offering fast recovery of recent data at the cost of additional disk capacity, or one with slower recovery that maximizes backup capacity?  Each customer has different needs and will give a different answer.  To help determine the best answer, one can ask:

“Is restoring recently backed up data at fast speeds worth doubling the cost of disk?”

Or

“Is the value of rapid restoration of recently protected data worth the missed opportunity of a 400% increase in backup performance along with a 100% increase in backup capacities?”

If the ability to quickly restore recently protected data (not aged) is a requirement for your environment, the answer may be ‘Yes’.   If not, an in-line deduplication PBBA will typically be more cost effective.

 

Replication

Creating remote copies of backup data has become affordable and achievable for most backup environments, largely because of deduplication technologies.  Most PBBAs are able to replicate backup data in deduplicated and encrypted form under one management console, which makes them very attractive.

 

On a post-process deduplication PBBA, replication is delayed until the backup data in the landing zone has been deduplicated.  Additionally, since backup processing typically takes priority, the deduplication process may be throttled or even paused, further delaying replication.  Thus, replication depends directly on the timing and performance of the deduplication process.

 

The replication process for in-line deduplication PBBAs is predictable and easy to manage because deduplication is never delayed; it is executed immediately, before data is written to disk.

 

Scalability

Over time, most backup environments see growing demand for concurrent backups, additional replication policies, and greater amounts of data to protect, which may require more processing resources than the PBBA can provide.  If so, additional PBBAs may be required to meet the growth and demands of the backup environment.

 

In-line deduplication PBBAs, however, offer an alternative to additional PBBA investments: source side deduplication can scale processor and memory resources beyond those of the PBBA by using the processing and memory resources of the protected servers, as shown in Figure 4.  For example, if a Quest DR4300 Series appliance has reached its processing and memory limits, a new DR4300 appliance does not necessarily need to be purchased.  Instead, source side deduplication can be implemented so that the combined processing and memory of the clients, which can exceed that of the DR appliance, meets these demands.

 

Post-process deduplication PBBAs cannot scale processor and memory resources beyond those of the PBBA because source side deduplication is not supported.  To increase processing resources, additional PBBA investments are typically required.

 

For both in-line and post-process deduplication PBBAs, disk capacity is increased by adding PBBAs or disk shelves.

 

Summary

Both post-process and in-line deduplication PBBAs provide backup environments with enhanced backup performance, efficient replication of backup data, and significantly reduced investment in disk based backup storage, power, cooling and management.  Yet each offers advantages, disadvantages and variances that cater to different customer needs.

 

The Quest DR Series PBBA family offers in-line deduplication technologies to provide cost effective scaling of processing resources, removal of backup network bottlenecks and the best backup capacities with minimal disk investments.

 

For further questions about the Quest DR Series appliance family, contact a Quest backup and recovery expert for details.

 

 

Helpful Links:

DR Series PBBA’s

Backup Deduplication Explained

Quest DR Series Appliances with Veeam Instant Recovery

Anonymous