Rapid Recovery VM in Azure - To seed or not to seed!


I'm in the planning phase of migrating my off-site replication cores from our data center to Azure, and I'm curious whether anybody has real-world experience with seeding that they would like to share.

Whether we add new cores with initial base images or replicate existing cores with base images and lots of incrementals, there will be a lot of data to send to Azure, so we are wondering about replication speeds and times versus shipping drives.

There are obviously a ton of variables, and I know our experience will not exactly match anybody else's, but I'm curious what speeds people are seeing when replicating base images. For example, 3 TB of data behind a 100 Mbps fiber link: are we talking hours, days, or weeks for that to replicate to Azure?

And how has the experience been with sending drives? What kind of turnaround are you seeing once Microsoft receives the drive? I have one client in particular that can't seem to figure out their UPS, and their server shuts down dirty every couple of months. New base images are generated each time, so seed drives could be moving back and forth a lot if we can't let those replicate on their own.

As I said, I know that no two experiences are the same, but I would really appreciate any experiences anybody might share.

Thank you

  • Before I give my feedback, look into increasing your dedup cache on the core. If sized properly, its entire function is to stop cores from replicating huge backups (bases) when they happen. So even if the core is forced to take a base (you can't stop that), the core can still replicate only an INC (not exactly correct, but you get the idea).

    I would guess seed drives are going to be faster, but there are so many variables that it may be worth running a few small tests first. Compare a 200 GB seed to a 200 GB replication, and do this 3-5 times.

    We have sent seed drives to Azure several times and never had an issue that I am aware of.

  • I've worked with a lot of people on replication to Azure and have only ever seen two of them use a seed drive. For the most part, if you have a 100 Mbps line and are only moving 3 TB of raw data (pre-compression/dedupe), that should replicate fairly fast. The actual used repository space is a better gauge of what has to be sent than the total protected data for an agent, because total protected data is measured prior to compression and dedupe. If you aren't going to be replicating everything from a repository, you can make an educated guess using the compression ratio for your repository. For example, let's say you have 3 TB of backups for 1 agent and a 50% compression ratio. That means RR will only have to replicate 1.5 TB of actual data. At 100 Mbps, that equates to about 35 hours to replicate (if RR is using the entire 100 Mbps connection). So even if that's 2 or 3 times better than the speeds you actually get, it still won't take more than a week to replicate that data.
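The replication estimate above can be sketched in a few lines of Python. This is only a back-of-the-napkin calculator, not anything from Rapid Recovery itself: the function name `replication_hours` and its parameters are made up for illustration, it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s), and it ignores protocol overhead. With those assumptions the ideal case comes out a little under the roughly 35-hour figure quoted above.

```python
def replication_hours(protected_tb, compression_ratio, link_mbps, slowdown=1.0):
    """Estimate hours needed to replicate data over a network link.

    protected_tb      -- protected data before compression/dedupe, in decimal TB
    compression_ratio -- fraction of data removed by compression/dedupe (0.5 = 50%)
    link_mbps         -- link speed in megabits per second
    slowdown          -- how many times slower than line rate you really run
    """
    # Data that actually has to cross the wire after compression/dedupe.
    bytes_to_send = protected_tb * 1e12 * (1 - compression_ratio)
    # Effective throughput once the slowdown factor is applied.
    effective_bits_per_second = link_mbps * 1e6 / slowdown
    return bytes_to_send * 8 / effective_bits_per_second / 3600


# The example from the post: 3 TB protected, 50% compression, 100 Mbps link,
# at full line rate and at 2x and 3x slower than line rate.
for slowdown in (1, 2, 3):
    hours = replication_hours(3, 0.5, 100, slowdown)
    print(f"{slowdown}x slower than line rate: {hours:.0f} h ({hours / 24:.1f} days)")
```

Plugging in your own used repository space (with `compression_ratio=0`) instead of protected data and a guessed ratio should give a closer estimate.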

    When seeding there are a couple things to be aware of:

    1. Microsoft requires you to send an entire hard disk (or multiple disks) to them with the data.
    2. You have to archive the data to a location and then use the WAImportExport tool to format the disk you will send to Microsoft and copy the data to it. You cannot write directly from RR to the disk, so you need a place to park that data temporarily before copying it to the seed drive.
    Here is the Microsoft documentation for the process: https://docs.microsoft.com/en-us/azure/storage/common/storage-import-export-service
    So using the same example as above, if you seed, you first have to make an archive in a specific location. That archive should be about 1.5 TB. If you are able to write it at 100 MB/s (which depends entirely on your repository speed and is probably a best-case scenario), that would take about 5 hours. Then you have to copy that same data to the hard drive. If you get the same copy speed, you are now at 10 hours. Then you have to mail the drive. If you choose to pay for overnight shipping and the data center processes it quickly, that's another day. Then you have to import the seed drive, which will take a minimum of 5 hours. If everything goes perfectly, you're looking at 2 days minimum. If we add the same 2-3x factor of error, then you are looking at a week as well.
    What this says to me is that if you have a stable network connection, it's about as fast and WAY less work to replicate than to seed. That's just some back-of-the-napkin math, but it matches my experience with Azure and the people I've worked with who are using it. I hope that helps.
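The seed-drive timeline from this reply can be added up the same way. Again, this is a rough sketch under stated assumptions: the helper names `copy_hours` and `seed_hours` are hypothetical, units are decimal (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s), shipping plus data center handling is taken as a flat 24 hours, and the import is assumed to run at the same throughput as the local copies.

```python
def copy_hours(data_tb, mb_per_s):
    """Hours to move data_tb (decimal TB) at mb_per_s (decimal MB/s)."""
    return data_tb * 1e12 / (mb_per_s * 1e6) / 3600


def seed_hours(data_tb, disk_mb_per_s=100.0, shipping_hours=24.0):
    """Best-case hours for the seed workflow described in the post."""
    archive = copy_hours(data_tb, disk_mb_per_s)        # archive out of the repository
    copy_to_drive = copy_hours(data_tb, disk_mb_per_s)  # archive onto the seed drive
    # Assumption: the import at the other end runs at the same speed as the copies.
    import_on_arrival = copy_hours(data_tb, disk_mb_per_s)
    return archive + copy_to_drive + shipping_hours + import_on_arrival


best_case = seed_hours(1.5)
print(f"best case: {best_case:.1f} h ({best_case / 24:.1f} days)")
for factor in (2, 3):
    padded = best_case * factor
    print(f"with {factor}x margin: {padded:.0f} h ({padded / 24:.1f} days)")
```

Each copy pass works out to a bit over 4 hours here, which the post rounds up to 5, so the totals land slightly under the post's figures but lead to the same conclusion.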
  • Thanks for the information. Sorry for the late response; I hadn't noticed people had replied and forgot I had posted this here.

    Thanks again