
Best Practice for Site A Failback after DR Exercise at Site B with Virtual Standbys

Hello,

 

...looking for the best way to do this. Here is the setup (MS Paint still rules!)

 

For the failback operation, we do not want to create a seed and ship it back down to Site A, nor will we have the time. Only ~35GB will have changed: a SQL database, some files and folders, and IIS website data.

 

Is there a way to fail back to the Site A source and not have to restore all 500GB? Only 35GB has changed so this could be restored overnight if there is an efficient way. These sites are across the country from each other.
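
To put the 35GB versus 500GB difference in perspective, here is a rough back-of-the-envelope sketch. The link speed (50 Mbps) and the 80% usable-throughput factor are assumptions picked for illustration only; the thread does not state the actual WAN bandwidth between the sites, and `transfer_hours` is a hypothetical helper, not part of any product.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move size_gb over a WAN link.

    size_gb    -- data size in decimal gigabytes
    link_mbps  -- nominal link speed in megabits per second (assumed value)
    efficiency -- fraction of the link actually usable (assumed value)
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000          # GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    assumed_link_mbps = 50  # hypothetical; substitute the real WAN speed
    for label, size_gb in (("changed data", 35), ("full restore", 500)):
        hours = transfer_hours(size_gb, assumed_link_mbps)
        print(f"{label}: {size_gb} GB takes about {hours:.1f} h")
```

Whatever the real link speed is, the full restore takes roughly 14 times longer than moving only the changed data (500 / 35), which is why the overnight window matters here.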

Site A source is currently physical but could be changed to Virtual if needed.

 

*SiteA and SiteB are DL4000 Appliances running 6.1

  • It is possible to reverse the replication path and replicate the data back to the source core. It takes some manual work, but we have a KB for that: support.software.dell.com/.../195296. However, there is no way to incrementally update the original production server that was backed up and has now been out of use for 3 days. So you have to do a complete virtual standby of that server to get an up-to-date copy of it at Site A, then go through the failover process again to get back to the production site. If the original protected agent is physical, then you have to do a BMR (unless you are only restoring a data volume, in which case you can just do a rollback). Either way, you have to restore the original machine in order to update it with the data that changed while it was running at Site B.
  • Thanks Tim. Wow, that really stinks. I found this article which pretty much exactly describes my situation, and it's basically the same as you said:
    support.software.dell.com/.../156009

    What is crazy is that the source Core in Site A has 465GB of good data that does not change, yet all 500GB still needs to be restored. What a huge waste; this seems like a huge area of opportunity for improvement. We know the block info when shutting down SiteA, we know that firing up SiteB starts from the same data, SiteB is being tracked/updated and CoreB knows what has changed, but it can't apply only those changes to CoreA? They are the same "servers"... I'm sure it's more complicated than that, but the basics are there. Apply changed blocks regardless of where the server is. I would think that's the whole point of a recovery chain. SQL Server can replay only the changes to a different server, so why can't Rapid Recovery? I digress.

    Thanks for the help.
  • If you're using agent-based backups, you don't actually know the state of the source server when it shuts down, since you haven't taken a backup at that point. As such, it can't say that only 35GB is different, because it won't know.

    As for improving the way you restore to a physical box, that's something I've been asking for for a long time. The basic idea is to create a URC disk that lets you do the same thing as a virtual standby, but to that client, and then just finalise and inject drivers when finished.

    Nothing yet though, and I don't envisage anything soon.
  • I may not get this at all, but isn't that what Live Recovery does? It sure seems like it the few times I have tried it.

    Grab a backup of the failover server and then shut it off. Turn on the primary (out of date) server and run a live recovery of the "D" drive.

    This of course assumes that nothing is on the "C" (O/S drive) but the O/S. We have a few older servers that are not set up this way so a BMR would be the only option but I plan to fix this as they are replaced.
  • Not sure how I missed this, but that's what I was trying to verify above. Say MSSQL and IIS both point to drive D:. If I simply start up the outdated server at SiteA, issue a failback of the D: drive, and a user immediately opens our in-house SQL application to access data, what do we have to wait for? The ENTIRE 500GB database structure, just the entire DATABASE the requested data resides on, or will it pull the blocks the user is specifically asking for first? With 500GB of data ultimately needing to come down the WAN, that severely impacts our RTO. If it can pull SQL data back at the block level as it's requested, that would be acceptable. If we have to restore an entire DB or the entire SQL database structure before being able to access our data, that will require a different recovery plan entirely.

    I know this is getting down to the nitty-gritty, but it's the most important "issue" of the "feature" called Live Recovery for us. On a LAN this is hardly an issue, but over a WAN it can make or break an entire DR plan.
  • Yes, as I understand it, that is exactly how it works. It puts the applications on the live recovery drive in a sort of "on demand" state so clients can be served while it is recovering in the background.

    An old AppAssure video using an Exchange DB as an example is here:

    www.youtube.com/watch
  • Great - thanks Corrigun! I searched for Rapid Recovery Live Recovery SQL and didn't find anything, but it looks like they cover this in the AppAssure video. Nice!
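
For readers trying to picture the "on demand" behaviour described above, here is a minimal conceptual sketch in Python. It is not how Rapid Recovery or AppAssure actually implements Live Recovery; the class `OnDemandRestore`, its methods, and the `fetch_block` callback are all hypothetical names invented for illustration. It only shows the general idea: a requested block is fetched immediately, while a background pass fills in everything else in order.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class OnDemandRestore:
    """Toy model of a volume that is readable while it is still being restored."""

    def __init__(self, total_blocks: int, fetch_block: Callable[[int], bytes]) -> None:
        self._total = total_blocks
        self._fetch = fetch_block            # stands in for a read over the WAN
        self._local: Dict[int, bytes] = {}   # blocks already restored locally
        self._pending: Deque[int] = deque(range(total_blocks))
        self.fetch_log: List[int] = []       # order in which blocks were pulled

    def _restore(self, index: int) -> None:
        # Pull a block from the remote recovery point only if it is missing.
        if index not in self._local:
            self._local[index] = self._fetch(index)
            self.fetch_log.append(index)

    def read(self, index: int) -> bytes:
        """Serve a client read; a missing block jumps the queue."""
        if not 0 <= index < self._total:
            raise IndexError(f"block {index} out of range")
        self._restore(index)
        return self._local[index]

    def background_step(self) -> bool:
        """Restore the next not-yet-restored block; False when nothing is left."""
        while self._pending:
            index = self._pending.popleft()
            if index not in self._local:
                self._restore(index)
                return True
        return False

    @property
    def complete(self) -> bool:
        return len(self._local) == self._total


if __name__ == "__main__":
    remote = {i: bytes([i]) for i in range(5)}    # pretend recovery point
    volume = OnDemandRestore(5, remote.__getitem__)
    volume.read(3)                                # user request arrives first
    while volume.background_step():               # background fill continues
        pass
    print(volume.fetch_log, volume.complete)
```

In this model the first user request costs one block transfer rather than the whole volume, which is the property that matters for RTO over a WAN. Whether the real product behaves this way for a given SQL workload is something to verify in a test failback.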