Working out the potential return on investment on a new server or application is relatively straightforward – if it enables you to do more in less time, profits are likely to go up, making it worth buying. If there are no obvious benefits, however, and no immediate gains, then the computation is far more challenging. This has always been the case in the realm of backup and recovery, where the chief benefit is the ability to maintain operations (measured in uptime), rather than enhance the bottom line. The trouble is, although essential to business continuity, backup and recovery systems are often neglected because they do not directly generate revenue or reduce costs. However, we can calculate what downtime means to your organization:
The total cost of eight hours of downtime in this particular example is $62,568. And if the downtime affects a customer-facing website or application, these numbers don’t even begin to account for the costs of customer frustration, a flood of calls to your customer support teams, or giving your customers an opportunity to consider alternatives. Your own figures will differ, but the key principles for calculating the value of disaster recovery (DR) are the same.
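As a rough illustration of that kind of calculation, the sketch below adds up two common components of downtime cost: lost revenue and lost staff productivity. The formula, the field names and every figure in it are assumptions made for illustration only. They are not the inputs behind the $62,568 example above, and they leave out harder-to-measure costs such as customer frustration.

```python
from dataclasses import dataclass


@dataclass
class DowntimeInputs:
    """Hypothetical inputs; replace them with your organization's own figures."""
    annual_revenue: float          # revenue that depends on the affected systems
    revenue_hours_per_year: float  # hours per year in which that revenue is earned
    affected_employees: int        # staff who cannot work normally during the outage
    hourly_labor_cost: float       # average loaded cost per employee-hour
    productivity_loss: float       # fraction of work lost during the outage (0.0 to 1.0)


def downtime_cost(inputs: DowntimeInputs, outage_hours: float) -> dict:
    """Estimate the direct cost of an outage as lost revenue plus lost productivity."""
    revenue_per_hour = inputs.annual_revenue / inputs.revenue_hours_per_year
    lost_revenue = revenue_per_hour * outage_hours
    lost_productivity = (
        inputs.affected_employees
        * inputs.hourly_labor_cost
        * inputs.productivity_loss
        * outage_hours
    )
    return {
        "lost_revenue": lost_revenue,
        "lost_productivity": lost_productivity,
        "total": lost_revenue + lost_productivity,
    }


# Illustrative numbers only (assumed, not taken from the article).
example = DowntimeInputs(
    annual_revenue=10_000_000,
    revenue_hours_per_year=2_000,
    affected_employees=50,
    hourly_labor_cost=40.0,
    productivity_loss=0.5,
)
result = downtime_cost(example, outage_hours=8)
print(f"Lost revenue:      ${result['lost_revenue']:,.0f}")
print(f"Lost productivity: ${result['lost_productivity']:,.0f}")
print(f"Total:             ${result['total']:,.0f}")
```

With these assumed inputs, an eight-hour outage works out to $40,000 in lost revenue plus $8,000 in lost productivity, or $48,000 in total. Comparing a figure like this with the price of a backup and recovery solution is one way to put a number on the "solution side" of the ROI equation.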
Causes of Downtime
While defining downtime in general is relatively simple – the time during which one or more resources (in this case related to your IT environment) are unavailable – the root cause of downtime can take many forms. Some scenarios, such as natural disasters, power outages, equipment changes or maintenance, can be planned for, while others cannot. This lends an element of uncertainty to planning, yet identifying the causes of downtime is critical to establishing an intelligent and detail-oriented plan to reduce it. The majority of organizations have established plans or procedures to recover from things like natural disasters, power outages and even malware and malicious attacks.
Yet many sources report that the number-one cause of downtime is human error. (The Uptime Institute, for example, reports that human error is the cause of more than 70 percent of data center downtime.) The takeaway here is: “Sweat the small stuff.” You are probably prepared for a natural disaster, but are you prepared for the contractor who inadvertently rubs against a weakly protected “kill switch,” shutting down the data center? Are you prepared for animals chewing through cords? Or police shutting down the block and denying access to your racks? These are all scenarios we have seen, and they are just the tip of the iceberg of possibilities that can leave you stranded.
To balance an ROI equation – even the hypothetical one posed here – we also need a solution side of the equation. There are many different kinds of technological solutions to consider for reducing downtime and maximizing your data protection, backup and recovery environment – server backup software, backup appliances, physical machines, virtual machines, the cloud. Nearly every organization has different needs, and needs different capabilities. One size rarely, if ever, fits all in this world, but there are things to consider when looking at solutions:
- Ease of use: Self-explanatory here, but the less time your staff spends installing, learning and maintaining a given system, the more time they will have to perform more important functions. Time is money, and saving time saves money.
- Automation: As technologies mature, more and more functions are being automated. Again, if you can automate previously manual processes, you are saving your staff time.
- Speed: Recovery speed is critical in reducing downtime. Let’s face it – you will have downtime. How quickly you can restore can mean the difference between a “mild inconvenience” and a “potential disaster.”
- Solution maturity: One common but sometimes overlooked cause of downtime is software bugs or loss of functionality. Backup and recovery is so critical to business continuity that solutions need to be bullet-proof.
- Specific features: In planning your environment, understand your needs and pay attention to technologies and features that will help maximize resources. Features like data compression and deduplication can reduce stresses on other infrastructure elements and help manage data growth – in turn simplifying your environment and reducing costs.
3 Steps You Can Take Now
So given everything you’ve read up to now, what can you, an IT leader, do today to help reduce downtime and ensure your team is in the best position to succeed? Approach your data protection environment as you would any major system and lay out a clear path for improvements. Objectively and meticulously assess your environment; create a definitive plan with specific and reachable goals; and execute that plan.
- Identify points of failure: As discussed earlier, you can’t plan for every mishap that may cause downtime, but you can plan for the most predictable types of downtime you may experience.
- Review existing operations: Take careful stock of what is working and what isn’t, both on a day-to-day basis and in the long haul. Previous ad-hoc fixes may be getting you through the day, but those types of solutions can frequently make recovery more difficult in the event of an unusual outage or event.
- Determine special circumstances: Get a firm grasp of special circumstances your organization may have. Do you have, or will you have, certain compliance regulations to adhere to? Are you located in an area where power outages may be more commonplace? Are you working with regulated, critical data?
- Assess data criticality: Assessing how critical each set of data is to the organization must be done with a critical eye. Rank your applications and types of data by how long you could possibly function without them. Indeed, all data is important, but some is mission critical to your hour-by-hour operations (read: revenue generation).
- Set goals: Make your goals specific and time-related. What is needed immediately? What is needed within the year? Within three years? Within five years?
- Write it down: Too often IT executives learn what they want and need and keep it to themselves. Take the time to write a plan and share it with your team. And don’t neglect it – update it as goals are reached and new goals are added.
- Communicate: Be clear with your team as you roll out your plans.
- Take small bites: Your plan may be quite broad. Take it one step at a time to avoid deployment issues. Reducing downtime is an ongoing endeavor, and to achieve the greatest results you must plan for the long haul.