Chico: “You call this a barn? This looks like a stable.”
Groucho: “Well, if you look at it, it’s a barn; if you smell it, it’s a stable.”
From “Monkey Business,” by the Marx Brothers
That’s the way I feel about data recovery after an interruption: If it goes quickly, it’s just a crisis, but if it goes on for days, it’s a disaster.
Disaster recovery veteran Jon Toigo and I are not exactly Chico and Groucho, but we did collaborate recently on a webcast called Need for Speed: Data Recovery in a Non-Stop World. In the webcast, now available on demand, we discuss the speed bumps that can turn data recovery after an interruption event from a crisis into a disaster.
Here are a few of the current themes in data protection that we cover in the webcast:
First, we’re seeing companies deploy immature, still-evolving technologies like cloud storage and virtualization for application hosting, access delivery and data storage. Most of them are pleased with the results so far, and we don’t mind that. Second, there’s a lot of propaganda and vendor hype trying to convince IT managers that high availability (HA) is the new disaster preparedness. Again, we don’t mind high availability, but it was designed for immediate failover, not for data backup and disaster recovery.
Those trends are developing against increasingly frequent disasters, both natural (storms, hurricanes, flooding) and man-made (vulnerabilities, ransomware attacks).
There’s a lot of urgency in the air, and it can get in the way of any company’s clear thinking about a data recovery strategy.
Cloud and virtualization have become valuable tools in the data protection scheme of many companies. We’re glad those technologies are so useful, but data protection is not the original problem they were invented to solve. They’re not like data backup or offsite storage, and they carry downsides – especially in interoperability – that you might not expect when you have to recover your apps and data after an interruption.
To get back to the hype, vendors are saying that in the era of cloud backup and virtualization, HA is the key to resiliency. We’ve found problems with that notion: in our experience, HA clustering works reliably only when components and capacities are identical across nodes.
The popular model of hyper-converged infrastructure (HCI) is a commodity server running the hypervisor and a software-defined storage stack. That server talks to a combination of solid-state drives (SSDs) and storage arrays, storage hardware directly connected to the server itself. The objective is to mirror that node to another node in a failover cluster, but if the disk drives are not identical (brand, size, etc.), the replication may fail.
We’re all for resiliency, but the requirement for identical pieces of hardware among nodes seems pretty inconvenient.
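To make the identical-hardware requirement concrete, here is a minimal sketch of the kind of pre-flight check a cluster might run before pairing two nodes for mirroring. The `Drive`, `Node` and `can_mirror` names are invented for illustration; this is not any vendor’s actual API.

```python
# Hypothetical illustration: verify two HCI nodes present identical drive
# hardware before mirroring is enabled. All names are invented for this sketch.
from dataclasses import dataclass

@dataclass(frozen=True)
class Drive:
    brand: str
    model: str
    capacity_gb: int

@dataclass
class Node:
    name: str
    drives: tuple  # the drives physically attached to this node

def can_mirror(primary: Node, secondary: Node) -> bool:
    """Allow replication only when both nodes have identical drive sets
    (same brand, model and capacity)."""
    key = lambda d: (d.brand, d.model, d.capacity_gb)
    return sorted(primary.drives, key=key) == sorted(secondary.drives, key=key)

a = Node("node-a", (Drive("Acme", "X100", 960), Drive("Acme", "X100", 960)))
b = Node("node-b", (Drive("Acme", "X100", 960), Drive("Acme", "X100", 960)))
c = Node("node-c", (Drive("Acme", "X100", 960), Drive("Other", "Z2", 1024)))

print(can_mirror(a, b))  # True  - identical hardware, mirroring can proceed
print(can_mirror(a, c))  # False - mismatched drives, replication may fail
```

The inconvenience is visible even in a toy model: one mismatched drive on one node is enough to disqualify the pair.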
Investing in those new approaches leaves many companies with a mixture of legacy appliances, virtual server/software-defined storage and HCI. They may feel as if they’re hedging a bet, but the result is data protection in silos, with restrictions on how and where they can restore data after an interruption, such as an outage, a malware attack or a natural calamity.
Heterogeneity sounds good, but too much of it can also hinder recovery. Companies want HA because it’s supposed to shorten the time between interruption and full recovery. But performing data recovery from heterogeneous sources can go slowly.
To get back to the barn-and-stable paradigm, the difference between a crisis and a disaster is the amount of time it takes to do the recovery. That's why there’s a need for speed in disaster recovery.
And guess what: Data protection, security and business continuity are now intertwined. The advent of cloud and virtualization has widened the attack surface, and your users are clicking on ransomware-laced attachments. You can’t run an anti-malware utility to get rid of WannaCrypt; you have to restore from an air-gapped backup or a protected snapshot volume to a point in time before the attack occurred.
Your disaster recovery solutions are now part of how you keep your business going.
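The point-in-time logic above boils down to a simple selection rule: find the newest protected copy that predates the infection. Here is a hedged sketch of that rule; the snapshot timestamps and the `last_clean_snapshot` helper are invented for illustration.

```python
# Hypothetical sketch: pick the most recent snapshot taken *before* the
# ransomware attack was detected. Timestamps below are invented examples.
from datetime import datetime

snapshots = [
    datetime(2024, 3, 1, 2, 0),
    datetime(2024, 3, 2, 2, 0),
    datetime(2024, 3, 3, 2, 0),    # last clean copy
    datetime(2024, 3, 3, 14, 30),  # taken after infection - unusable
]

attack_detected = datetime(2024, 3, 3, 11, 45)

def last_clean_snapshot(snaps, attack_time):
    """Return the newest snapshot strictly older than the attack time,
    or None if every snapshot postdates the infection."""
    clean = [s for s in snaps if s < attack_time]
    return max(clean) if clean else None

print(last_clean_snapshot(snapshots, attack_detected))
```

Note the failure mode the `None` case represents: if every snapshot postdates the attack, no anti-malware tool will save you, which is exactly why air-gapped, protected copies matter.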
So, when you suffer an attack, how do you decide what to recover first? Recovering trip reports from five years ago isn’t as important as recovering the ERP app or companywide email. But somebody has to analyze your business processes and data to determine what is business-critical and in need of rigorous protection and early recovery.
That work includes specifying all the components required for a successful recovery, like all the metadata, hypervisor configurations, application interrelationships and support software that may underpin a virtual machine. Or, in the case of Microsoft Exchange, it would mean specifying the configuration settings, the roles stored in Active Directory, the mailbox database and the ESE or CRCL log files for point-in-time restore.
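The analysis and specification work described above can be captured as a recovery manifest: each business-critical service lists the components it needs restored and a priority tier, so recovery proceeds in order. This sketch uses invented service names and a plain-dictionary schema for illustration only, not any real product’s format.

```python
# Hypothetical recovery manifest: services, their required components, and a
# priority tier (1 = restore first). All entries are illustrative examples.
manifest = [
    {"service": "Exchange", "priority": 1,
     "components": ["configuration settings", "Active Directory roles",
                    "mailbox database", "transaction logs"]},
    {"service": "ERP", "priority": 1,
     "components": ["VM metadata", "hypervisor configuration",
                    "application database", "support software"]},
    {"service": "trip-report archive", "priority": 3,
     "components": ["file share"]},
]

# Restore the highest-priority services first.
for entry in sorted(manifest, key=lambda e: e["priority"]):
    print(f"restore {entry['service']}: {', '.join(entry['components'])}")
```

Even a simple manifest like this forces the conversation the text calls for: somebody has to decide, in advance, that the ERP app and companywide email outrank a five-year-old trip report.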
Somebody has to test all of this, and with cloud and virtualization, testing may just become a bit simpler. Software-based geo-clustering, virtualization-level replication and tape read/verify mean that testing data restoration during formal test events may soon become a thing of the past. With targets in virtual servers and the cloud, the need to formally test application re-hosting may go away.
Testing may become a training exercise in roles to be played during recovery, or a mic check of communication channels. That would reduce the testing burden on the company as a whole and help business continuity planning.
Where is your company in the data protection landscape? Are you using technologies like virtualization and cloud for IT disaster recovery? Are you backing up to tape and storage appliances? Wherever you may be, listen to our on-demand webcast, Need for Speed: Data Recovery in a Non-Stop World. It covers the concepts I’ve outlined above and includes a walk-through of data protection software from Quest®, such as Rapid Recovery, V Foglight and VROOM, a new product for virtual environments.
Play the On-Demand Webcast >>