Forrest Gump would probably tell us we can’t have our peas without carrots. I’m here to tell you that you can’t have backup and disaster recovery planning without a discussion of MTPOD (maximum tolerable period of disruption). Wikipedia defines MTPOD as “the maximum time that key products or services can be unavailable or undeliverable before stakeholders perceive unacceptable consequences.”
We’ve been blogging about disaster recovery planning lately and the steps firms should take to build an IT disaster recovery plan. I wanted to highlight MTPOD today because it’s such an important part of that process. The picture below should aid the discussion.
MTPOD vs. BIA
For you mavens out there, I did some research and found that the concept of MTPOD didn’t even exist until late 2007, with the release of British Standard 25999-2. Thrilling, right? Yeah, maybe not. But here’s what we need to know: MTPOD is the logical outcome of any business impact analysis. According to ready.gov, “A business impact analysis (BIA) predicts the consequences of disruption of a business function and process and gathers information needed to develop recovery strategies.”
A critical part of identifying the consequences of potential disruptions is identifying how long each disruption can last before the business suffers significant damage. In the evolving IT landscape, that’s often easier said than done: the business demands to be constantly connected and protected, all at the fastest speeds possible.
As we begin the disaster recovery planning process, we must first understand why there so often seems to be a gap between business and IT. Think of it as protecting parts versus providing services: IT tends to focus on protecting individual components and data, while the business expects working services. There are often unspoken expectations on both sides of the fence.
Four causes of misalignment between business and IT:
1. Lack of business-defined recovery SLAs.
Tiered recovery is essential and can be defined with a business impact analysis.
2. Different understandings of SLA commitments.
The business expects full services to be up and running ASAP, while IT is often measured only on getting the data back.
3. Lack of tailored recovery strategies.
We recommend developing a services catalog to categorize applications based on their recovery needs.
4. Lack of testing of IT continuity plans.
Take advantage of multiple test types to ensure you meet business needs. These can include tabletop walk-throughs, component tests, or full DR exercises in which you fail over to another site.
To drive alignment between business and IT, we must spend a good amount of time defining what our services catalog looks like. For example, many firms use Silver, Gold, and Platinum tiers to categorize the criticality of each system in their environment. Is a service’s MTPOD 0-2 hours? That would likely fall into the Platinum tier. 10-24 hours? That could be a good fit for the Silver tier.
Ideally, each service tier would have a description. For example, the Platinum tier would include services that are vital to the business, where any disruption would have major repercussions. The technology requirements for that tier might therefore call for automated application failover to a hot standby location, or synchronous storage-based replication. That technology is what allows you to recover well within your MTPOD (hopefully).
An important thing to remember: as an IT pro driving a disaster recovery planning project, you must understand how long it will take to perform the process needed to restore services to key stakeholders. If your current technology can’t meet the desired RTO (recovery time objective), which has to fit within the MTPOD, you may need to look into faster alternatives. Conversely, some firms realize they are overprotecting their environment and paying for capabilities they don’t really need. Do you really need the ability to restore all your Silver-tier services in 0-2 hours? Perhaps that money could be better spent elsewhere.
So, to summarize: MTPOD and disaster recovery planning go together like peas and carrots, or like Forrest and Jenny. Unless we have a solid grasp of the maximum downtime the business can tolerate, we will struggle to find alignment between the business and IT.
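To make the tiering idea concrete, here is a minimal Python sketch that maps a service’s MTPOD, as determined by the BIA, to a catalog tier. The Platinum (0-2 hours) and Silver (10-24 hours) bands come from the example above. The following are assumptions for illustration only: the Gold band (2-10 hours), the "Untiered" bucket for anything over 24 hours, and the sample service names. Your own catalog will have its own boundaries.

```python
from dataclasses import dataclass

# Tier bands in hours, checked from most to least critical.
# Platinum (0-2 h) and Silver (10-24 h) follow the examples in the text;
# the Gold band (2-10 h) is an ASSUMPTION that fills the gap between them.
TIER_BANDS = [
    ("Platinum", 0.0, 2.0),
    ("Gold", 2.0, 10.0),
    ("Silver", 10.0, 24.0),
]


@dataclass
class Service:
    name: str
    mtpod_hours: float  # maximum tolerable period of disruption, from the BIA


def classify(service: Service) -> str:
    """Return the first (most critical) tier whose band contains the MTPOD."""
    if service.mtpod_hours < 0:
        raise ValueError("MTPOD cannot be negative")
    for tier, low, high in TIER_BANDS:
        if low <= service.mtpod_hours <= high:
            return tier
    # Hypothetical bucket for services that can tolerate more than 24 hours.
    return "Untiered"


if __name__ == "__main__":
    # Hypothetical services and MTPOD values, for illustration only.
    catalog = [
        Service("order-entry", 1),
        Service("email", 6),
        Service("reporting", 18),
        Service("archive", 72),
    ]
    for svc in catalog:
        print(f"{svc.name}: {classify(svc)}")
```

The bands are checked from most to least critical. A service that sits exactly on a boundary (say, 2 hours) therefore lands in the more protective tier, which errs on the side of caution.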
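The question of how long recovery will take can be sketched as simple arithmetic: add up the duration of each recovery step and compare the total to the tier’s band. The Python below is a hypothetical sketch, not a prescribed method, and it rests on three assumptions:

- The recovery steps run strictly one after another.
- The tier bands are the illustrative Platinum/Gold/Silver bands, and the Gold band (2-10 hours) is itself an assumption.
- Any recovery faster than the tier’s band counts as possible overprotection. This is a deliberately simple heuristic; a real decision would also weigh cost.

```python
from dataclasses import dataclass

# Illustrative tier bands in hours: (fastest recovery the tier needs,
# slowest recovery it can tolerate). The Gold band is an ASSUMPTION.
TIER_BANDS = {
    "Platinum": (0.0, 2.0),
    "Gold": (2.0, 10.0),
    "Silver": (10.0, 24.0),
}


@dataclass
class RecoveryStep:
    description: str
    hours: float  # estimated or measured duration, ideally from a DR test


def assess(tier: str, steps: list[RecoveryStep]) -> tuple[float, str]:
    """Sum sequential recovery steps and compare the total to the tier band."""
    low, high = TIER_BANDS[tier]
    total = sum(step.hours for step in steps)  # assumes no steps run in parallel
    if total > high:
        verdict = "gap: recovery exceeds the tier target, consider faster options"
    elif total < low:
        verdict = "possible overprotection: recovery is faster than the tier needs"
    else:
        verdict = "within tier target"
    return total, verdict


if __name__ == "__main__":
    # Hypothetical recovery runbook for a single service.
    steps = [
        RecoveryStep("declare disaster and assemble team", 1.0),
        RecoveryStep("restore VMs from backup", 6.0),
        RecoveryStep("validate application and data", 2.0),
    ]
    # A 9-hour recovery is a gap for Platinum, fits Gold,
    # and may be more than a Silver service needs.
    for tier in TIER_BANDS:
        print(tier, assess(tier, steps))
```

Running the same runbook against every tier shows how one recovery capability can be too slow for one class of service and more than enough for another.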