I look after our DR Appliances from a pre-sales technical side. These devices provide some real time-saving benefits around backup and recovery. In this article, I will explain what they are, how they work and where they fit in, look at some of their advantages, and highlight some points you need to watch out for.
Data deduplication is a form of data reduction, sometimes called single-instance storage. It's software that searches for duplicate blocks of data in a stream or store. When it finds a block it has already stored, it writes a pointer to the existing copy instead of writing the data to the physical disk again, which saves space. It's been around for a number of years now and has become very popular for storing server backup data on disk. In fact, backup data is one of the best use cases for it.
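To make the pointer idea concrete, here is a minimal sketch in Python. It is a toy model under my own assumptions, not how any particular appliance is implemented: the DedupStore class, the fixed 4 KB block size and the use of SHA-256 fingerprints are all illustrative choices.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size (an assumption, not a product setting)


class DedupStore:
    """Toy single-instance store: each unique block is kept only once."""

    def __init__(self):
        # fingerprint -> block bytes; this dict stands in for the physical disk
        self.blocks = {}

    def write(self, data: bytes) -> list:
        """Store data and return the list of pointers (fingerprints) describing it."""
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Only write the block if it has not been stored before.
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block
            pointers.append(fingerprint)
        return pointers

    def read(self, pointers: list) -> bytes:
        """Rebuild the original data by following its pointers."""
        return b"".join(self.blocks[p] for p in pointers)

    def physical_bytes(self) -> int:
        """Bytes actually held on the simulated disk."""
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
backup = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # four blocks, only two unique
pointers = store.write(backup)
print("logical bytes:", len(backup))
print("physical bytes:", store.physical_bytes())
```

In this example four blocks are written but only two are unique, so the store holds half the data and four pointers. Reading the data back means following every pointer, which is why reads cost more than on a plain disk.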
When I was working at the coalface in IT support 15 to 20 years ago, backup and recovery was all about tape. Each server (typically a Windows NT4 file server in my case) had a dedicated tape drive installed in it. Every night a new tape was put in, the data was copied to the tape, and the tape was then swapped for the next night and sent off site. The main disadvantage of this method was that if I was asked to restore a file, I had to physically go and find the tape and put it into the tape drive, which was often a slow process. To get around this problem, Disk to Disk to Tape became popular: first you backed up your data to a disk array, then wrote it to tape later. The disk array would be big enough for a couple of days' worth of backups, and hopefully when you came to do a restore the data would still be on disk, which sped up the process.
Data deduplication and compression appliances take this to the next level. They are perfect for this interim backup tier, because the data stored in subsequent full backups will mostly be the same. The deduplication engine detects that the data is the same and just writes a pointer to it instead of physically writing the data again. So instead of storing just a couple of days of backup data on disk, you can suddenly store weeks, if not months or years, of data on the same amount of physical disk. This helps cut down on the use of tapes, on the amount of management, and on the time taken to run a restore. With more data on disk, chances are the restore can be serviced from that disk.
Some companies go one step further. Our data deduplication appliances have a replication function built in, so they will automatically replicate the backup data to another appliance at a remote site. This gets all backup data offsite, just in case there is a fire or a massive failure at the primary site.
So the main advantage of these data deduplication devices in your backup infrastructure is that they cut down on the management of tape, and in some cases remove tape altogether. This can free up the IT administrators' time to work on new projects and other tasks instead of managing tapes and backup processes, as well as cut down on delivering and storing tapes offsite. They can also speed up the backups. In many cases direct to disk will be faster, and these devices are generally designed to do exactly this, so you get higher throughput. They also speed up the restores by taking the physical management of tape out of the picture.
Another bonus is how easy these devices are to install and use. Our DR Appliances support all the major backup products in the industry, so they can simply slot into your existing environment with little effort and you can get benefits from them immediately. I've done many proofs of concept with these devices and, including the physical installation, I can generally get everything up and running in a couple of hours. So the IT admins see results almost immediately.
All sounds good so far, but what are some of the pitfalls of using this type of technology? They are mainly around use cases. In order to work effectively, these kinds of devices like data to be sent to them in one continuous stream (as happens when you run a backup). They are not very good at lots of little reads and writes, because they have to search for common blocks of data to de-duplicate them when writing, and they have to re-hydrate the data on reads.
One use case to avoid is using them as a standard file share and giving your user community access to them. Once the IT admins discover how well they compress and de-duplicate data, it can be very tempting to use them this way. Yes, you can create a Windows share on them, copy a bunch of directories to them and allow users to connect, but only do this as an 'archive' share, rather than one that gets accessed all the time with lots of reads and writes.
Some backup software also allows virtual machines to be started up and run directly from the backup storage. Again, test this functionality before committing to it in a live situation. This workload can be read and write intensive and might not perform well on these devices.
How much savings should you get? A number of different factors come into play here. Some backup software has deduplication technology built in. Depending on the math and mechanisms it uses, and on your own data, you will get different results. With our own deduplication appliances I've personally seen disk savings in the high 80s to low 90s percent for the majority of customers. This is with standard corporate data, for example operating system files, mail, spreadsheets, Word documents, databases and so on. We also enhance the deduplication in some backup software and fill some of the gaps, such as global-level deduplication, so even if your product has deduplication functionality, these dedicated hardware devices might still be worth looking at.
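Savings percentages are easy to misread, so here is a small Python sketch that converts between a savings percentage, a reduction ratio and the physical disk needed. The function names and the 10 TB figure are my own illustrative assumptions, and the arithmetic ignores metadata and other overheads.

```python
def savings_percent(logical_bytes: float, physical_bytes: float) -> float:
    """Percentage of disk saved: how much less is stored than was sent."""
    return (logical_bytes - physical_bytes) / logical_bytes * 100


def reduction_ratio(savings_pct: float) -> float:
    """Convert a savings percentage into an N:1 reduction ratio."""
    return 100 / (100 - savings_pct)


def physical_needed(logical_bytes: float, savings_pct: float) -> float:
    """Physical capacity needed to hold logical_bytes at a given savings rate."""
    return logical_bytes * (100 - savings_pct) / 100


# Hypothetical example: 10 TB of backup data at different savings rates.
for pct in (85, 88, 90, 92):
    ratio = reduction_ratio(pct)
    disk_tb = physical_needed(10, pct)
    print(f"{pct}% saved is about {ratio:.1f}:1, so 10 TB needs about {disk_tb:.1f} TB")
```

The point to notice is that a few percentage points matter more than they look: moving from 85% to 92% savings roughly halves the physical disk you need for the same backups.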
Some things to note: images and videos don't compress well, but they can be de-duplicated. This applies particularly where there are numerous edits of the same videos or images, in which case deduplication can deliver significant disk space savings. But stay away from data like CCTV video feeds or X-ray images, basically anything that is different every time. In those cases, standard disk is your best bet.
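The difference between compression and deduplication on this kind of data can be shown with a rough Python sketch. Everything here is an assumption for illustration: pseudo-random bytes stand in for already-compressed media, zlib stands in for the compression step, and the "edit" is an in-place change that keeps the fixed 4 KB blocks aligned, which real edits will not always do.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # illustrative fixed block size


def unique_block_bytes(data: bytes) -> int:
    """Bytes left after fixed-block deduplication (no compression applied)."""
    seen = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        seen[hashlib.sha256(block).digest()] = len(block)
    return sum(seen.values())


def random_bytes(rng: random.Random, size: int) -> bytes:
    """Stand-in for media data: bytes with no internal redundancy."""
    return bytes(rng.getrandbits(8) for _ in range(size))


rng = random.Random(0)
video = random_bytes(rng, 16 * BLOCK_SIZE)        # the original file
other_video = random_bytes(rng, 16 * BLOCK_SIZE)  # unrelated footage, e.g. a new CCTV feed
# An edited copy: identical to the original except for the last block.
edited = video[:-BLOCK_SIZE] + other_video[:BLOCK_SIZE]

print("one video, compressed:", len(zlib.compress(video)), "of", len(video), "bytes")
print("original plus edit, deduplicated:",
      unique_block_bytes(video + edited), "of", len(video + edited), "bytes")
print("two different videos, deduplicated:",
      unique_block_bytes(video + other_video), "of", len(video + other_video), "bytes")
```

Compression gains nothing on the single file, deduplication stores the edited copy at the cost of one extra block, and two unrelated files get no benefit from either technique.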
But everyone's data is different. You can never be sure what savings rates you will get, so be wary of claims without proof specific to your environment. One easy way to find out is to test. We provide a virtual appliance that runs on VMware or Hyper-V, which you can download and test for free for 30 days. It's straightforward to set up, pretty much just follow the bouncing ball. You can download it from here: Data Protection | DR2000v Disk Backup Virtual Appliance
If you decide to test, make sure you run multiple (10 or more) full backups over a number of days and weeks. Compression savings can be seen immediately, while deduplication rates get better over time as more of each new backup matches data already on the appliance.
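Why the number of full backups matters can be seen with a simple model, sketched here in Python. It assumes every full backup is the same size and that a fixed fraction of its blocks changed since the previous one. The 2% change rate is purely an assumption for illustration, and compression is ignored, so do not read the results as a prediction for your data.

```python
def dedup_savings_percent(full_backups: int, change_rate: float) -> float:
    """Savings after a number of equal-sized full backups.

    The first full is stored whole; each later full only adds the
    fraction of blocks that changed (change_rate, between 0 and 1).
    Sizes are expressed in units of one full backup.
    """
    logical = full_backups
    physical = 1 + (full_backups - 1) * change_rate
    return (1 - physical / logical) * 100


# Hypothetical 2% of blocks changing between consecutive full backups.
for n in (1, 2, 5, 10, 30):
    print(f"{n:>2} full backups: {dedup_savings_percent(n, 0.02):.1f}% saved")
```

Under these assumptions the first full backup shows no deduplication savings at all, the second shows just under half, and it takes around ten fulls before the figure reaches the high 80s. A test that stops after one or two backups will badly understate what the device can do.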
For more information on DR Appliances, check out our website here: Data Protection | DR4300 Disk Backup Appliance
I hope this has been useful for you. Feel free to reach out to me if you need any more information.