True story: I once knew of a bank whose disaster recovery plan involved finding the physical backup tapes that had been carted to a sister branch two weeks prior and then trying to restore from them. The problem was that no one had checked the tapes to ensure the backups were working. Once the tapes were located and loaded, the bank discovered that the backups would restore only about 15% of the time. That debacle resulted in the hiring of a new IT director, who quickly went to work developing a comprehensive DR plan for the bank, which included automation, replication, and aggressive testing.
Just in time, it turns out, because right after the DR plan was up and running, the hotel next to the bank caught fire and the city shut down power to the entire block. For two weeks. The result was the first (and successful, I might add) real-world test of the new IT director’s disaster recovery plan.
We’ve been talking a lot about disaster recovery here on our blog, including in Nine Steps to Building a Business-Oriented Disaster Recovery Plan, because we’ve heard countless stories of unforeseen disasters, DR not taken seriously, and too many IT teams woefully unprepared for an unplanned event. In truth, disaster recovery planning isn’t just about human error, big storms, and mighty disasters. It is about the planning part of the equation.
Let’s look at the flip side and see how one organization did it right.
Tasmania Fire Service (TFS) is responsible for fire suppression and control throughout the state of Tasmania, Australia, a region where bushfires spike during the summer months and close to 11 percent of the population lives near bushland. TFS lives with risk every day. But the IT team knew that its critical data had to be anything but at risk, so it embarked on a well-thought-out DR plan that articulated needs, set out key goals, and then put a workable plan in place that was right for its situation.
Here’s what they did:
- Looked clearly at the situation: TFS conducted a risk assessment and defined the critical assets it needed to protect in case of a disaster. TFS realized it had fast-growing data that was important for fire assessment and needed critical-level protection. “We had a lot of system databases that needed to be backed up regularly, and with photos taken at just about every incident that fire fighters attend, this represents a significant source of data growth within our file system,” said David Watson, Manager Infrastructure-Windows, Information Systems Branch.
- Set key goals: Because it covers a large geographic area with limited staff, the fire service wanted to automate backup and recovery so it could focus its energies on proactive fire prevention and education.
- Took realistic steps in infrastructure design: TFS designed its environment around its needs, which were clearly defined in the plan, and then sought and applied the appropriate products and technology to each need rather than letting the functionality of potential products define its path. “We design all of our systems with high availability and short recovery times in mind,” explained Watson.
Because TFS knew from first-hand experience that education and preparation were keys to preventing and fighting fires in Tasmania, it took the same approach to its data protection plan, mapping out the education and preparation that were necessary for its IT support group.
And that bank I mentioned earlier? Because it had put its DR plan into effect, not one penny of productivity or revenue was lost as a result of the hotel fire. When I asked what he’d say to a colleague about DR planning, the bank IT manager told me, “I’d ask my colleague, ‘What would you do if your data center was without power for two weeks without notice?’ or ‘What would you do if you couldn’t physically be on premises for two weeks?’”
Good question to ponder. What WOULD you do?