Hint: It’s not your hardware, applications or disaster recovery tools
As part of your CIO’s business continuity plan, are you executing on a strategy to reduce downtime?
As a director, you may have seen the latest industry surveys that state, on average, downtime could cost $5,600 per minute, which could equate to more than $300k per hour¹. So, now you are on a mission.
More than half of IT directors have a strategy to significantly reduce system downtime in the coming year, and others have a strategy but haven’t yet begun to implement it. Those strategies typically involve upgrading or changing three areas managed by IT: hardware, applications and disaster recovery tools.
Plans to reduce scheduled downtime typically include snapshots and rollbacks, better patching tools and live patching. To reduce unplanned downtime, you might be looking at redundancy such as a high-availability cluster, a snapshot and rollback function, or an OS upgrade as likely steps.
Although those strategies are beneficial in reducing some causes of downtime and enable opportunities for effective disaster recovery, none of them address the number one issue causing downtime.
Business continuity is not disaster recovery.
Just because you have backed up the data doesn’t mean you’ll be able to continue doing business during a failure, unless you address the number one cause for downtime incidents and extended downtime issues that occur when trying to restore business operations.
“Your strategy is missing the #1 offender causing 70% of downtime.”
The Uptime Institute says: “More than 70% of data center downtime is caused by human error.” The top four IT issues that lead to human error, causing or extending downtime, are:
1) Lack of training and communication
Good business continuity planning should include clearly stated roles and responsibilities. Employees need to know the who, what, where, when and how of addressing downtime events. Business continuity plans and disaster recovery strategies don’t mean a thing if your staff can’t implement them. Do you have a communication strategy? More importantly, are you using it?
2) Undocumented updates to business continuity plans and procedures
Are your checklists missing, or your procedure documents out of date? A recent report revealed that even though 80% of organizations claim they document changes, 70% of IT pros frequently make changes without letting anyone know².
3) Poor operational governance
IT departments need to run testing scenarios, not just to test the system but to highlight where staff might be deviating from procedures or need additional training. They also need to verify that proper labels and color coding are in place on equipment and systems, to minimize user error when dealing with redundant power sources or hardware updates.
4) Investigation protocols
When outages occur, it’s imperative that a business has protocols in place to determine root causes and identify short- and long-term steps to avoid downtime in the future. I know that when the system is down, this is the last thing you have time to deal with, but having a protocol in place, with basic documentation tracking the steps taken during an outage, can save you time and money in the long run.
We’d love to know: what’s one thing you’re doing this year to reduce downtime? Tell us in the comments below.
Next steps: Learn more about reducing downtime with the upside of increasing ROI.
1 Gartner: The Cost of Downtime
2 Netwrix: State of IT Changes Survey 2015