Deduplication: Where to Begin?

Deduplication can be one of the most significant cost-saving applications in your backup and recovery environment.  To quote the author, Dr. Srinidhi Varadarajan: “Reducing the cost of backup and storage starts with reducing the amount of data that is backed up and stored.”

Consider this: implementing a new deduplication solution can produce a 20x reduction in data over a six-month period.
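To put a ratio like 20x in perspective, it helps to convert it into the fraction of storage saved. The short sketch below does that arithmetic; the function name `space_savings` is made up for illustration and is not part of any product.

```python
def space_savings(ratio):
    """Fraction of storage saved for a deduplication ratio (20 means 20x)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Storing 1/ratio of the original data saves the remaining share.
    return 1 - 1 / ratio

for ratio in (2, 10, 20):
    print(f"{ratio}x reduction -> {space_savings(ratio):.0%} less data stored")
```

Note how the curve flattens: moving from 10x to 20x only takes savings from 90% to 95%.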

While the concept of deduplication is easy to understand – reducing the amount of data being backed up by eliminating duplicate data – some details are still common discussion fodder on our forums and among our customers. Over the next couple of weeks we will address some of those, but today we will start with WHERE deduplication can actually take place.
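As a rough illustration of "eliminating duplicate data," here is a minimal sketch that splits backups into fixed-size chunks and stores each unique chunk only once. The fixed 4 KiB chunk size, the SHA-256 fingerprint, and the sample data are all assumptions chosen for simplicity; real products may chunk and fingerprint data differently.

```python
import hashlib

def dedup_totals(backups, chunk_size=4096):
    """Return (logical_bytes, stored_bytes) after chunk-level deduplication."""
    seen = set()   # fingerprints of chunks already stored
    logical = 0    # bytes the backups contain in total
    stored = 0     # bytes that actually need to be kept
    for data in backups:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return logical, stored

# Hypothetical data: the same 8 KiB full backup taken on three nights.
nightly = b"A" * 4096 + b"B" * 4096
logical, stored = dedup_totals([nightly, nightly, nightly])
print(f"{logical} bytes backed up, {stored} bytes stored")
```

In this toy case only the first night's chunks are stored; the second and third nights add nothing new.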

Deduplication CAN take place in several places in your network, but whether it SHOULD take place there depends a lot on your individual environment. Every environment is different, which is why planning is so important, and each deployment option has its pros and cons.

Here are some rules of thumb:

Client-side deduplication

Client-side deduplication performs deduplication at the source and bypasses any media server, delivering data directly to a storage device. This is a good scenario for bandwidth-constrained environments or agentless backup in virtual environments because of the reduced network bandwidth requirements. It is also a good option for reducing backup windows, in some cases cutting job times from hours to minutes. However, because the work executes on the client, it puts considerable strain on the client machine, so be prepared.
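The bandwidth saving comes from the client fingerprinting its own data and sending only what the storage device does not already hold. The sketch below models that exchange in memory. `StorageDevice`, `client_side_backup`, and the sample data are hypothetical names for illustration, not a real product API, and the small cost of sending the fingerprints themselves is ignored.

```python
import hashlib

class StorageDevice:
    """Toy in-memory chunk store keyed by SHA-256 digest (hypothetical)."""
    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        # Report which fingerprints the device has not stored yet.
        return [d for d in digests if d not in self.chunks]

    def put(self, digest, chunk):
        self.chunks[digest] = chunk

def client_side_backup(data, device, chunk_size=4096):
    """Hash on the client, send only unknown chunks, return payload bytes sent."""
    local = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        local[hashlib.sha256(chunk).digest()] = chunk  # CPU cost lands on the client
    sent = 0
    for digest in device.missing(list(local)):
        device.put(digest, local[digest])
        sent += len(local[digest])
    return sent

device = StorageDevice()
monday = b"A" * 4096 + b"B" * 4096
tuesday = b"A" * 4096 + b"C" * 4096   # one chunk changed since Monday
sent_monday = client_side_backup(monday, device)
sent_tuesday = client_side_backup(tuesday, device)
print(sent_monday, sent_tuesday)
```

The second backup sends only the changed chunk, but the hashing for every chunk still runs on the client, which is the strain mentioned above.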

Backup server deduplication

Backup server deduplication puts the deduplication onto a dedicated server to maximize the performance of the target device. This is also a good choice for bandwidth-constrained environments because it reduces the bandwidth requirements between the backup server and the storage device. However, it does not reduce the bandwidth requirements between the client and the backup server, and it uses additional CPU resources on the backup server.

Target-side deduplication

Target-side deduplication offloads all the CPU-intensive processes to the deduplication appliance, instead of consuming CPU, memory, and other client-side resources. It also provides a common deduplication pool across the environment, allowing for high deduplication ratios and greater storage (and cost) savings. However, target-side deduplication does not significantly reduce backup windows, because the full data set still has to travel to the appliance before it is deduplicated.
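To see why a common pool raises the deduplication ratio, compare storing two clients' backups in separate pools with storing them in one shared pool. This is a minimal sketch under assumed conditions: fixed 4 KiB chunks, SHA-256 fingerprints, and two hypothetical clients that share one identical operating-system block.

```python
import hashlib

def stored_bytes(streams, chunk_size=4096):
    """Bytes kept on disk when all given streams share one deduplication pool."""
    pool = set()
    stored = 0
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in pool:
                pool.add(digest)
                stored += len(chunk)
    return stored

os_block = b"O" * 4096                 # hypothetical data common to both clients
client_a = os_block + b"A" * 4096
client_b = os_block + b"B" * 4096

separate = stored_bytes([client_a]) + stored_bytes([client_b])  # one pool per client
shared = stored_bytes([client_a, client_b])                     # one common pool
print(separate, shared)
```

With separate pools the shared block is stored twice; with the common pool it is stored once.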

Deduplication provides a cost-effective way for a business to maintain backups on disk for longer periods of time before staging them off to tape for extended retention. Deduplication also reduces expensive network and WAN traffic, increasing the speed at which backups can complete, a tremendous benefit for the administrator when only a few hours each night are allowed for a full backup.

 

Organisations that take advantage of the benefits of deduplication will find they are in a much better position to manage data growth and effectively optimise their data protection environment. Stay tuned to this space for more details on deduplication and purpose-built appliances next week.

About the Author
Efrain.Viscarolasaga
Former PR pro, former journalist, current Product Marketing Manager for Data Protection. Adequate guitarist, less-than-adequate hockey player, inadequate golfer.