Five tips to create the best SharePoint backup and recovery plan

"I am doing nightly full SQL backup, and a weekly SharePoint farm backup. Is this enough?" - "Would out of the box SQL and SharePoint tools be good enough for my backup, or should I look for a 3rd party product?" - "What is the best SharePoint backup strategy?"

 

I almost inevitably hear one of these questions whenever I speak about SharePoint backup and recovery at SharePoint events or meet with customers. While it is natural to look for guidance, it is important to understand that there is no such thing as a "best SharePoint backup strategy" that works for everyone. Once you accept that, you can start exploring the strategy that works best for the unique combination of your company's business processes and technical environment.

Here are my five tips to consider when planning backup and recovery for your SharePoint environment:

Tip #1 - Involve business stakeholders and IT budget owners early in the backup and recovery planning.

A backup and recovery plan should not be created just for the sake of having one. IT professionals must consider how SharePoint is used, which business processes depend on it, and what the impact of downtime and/or data loss would be. Take a step back and look at SharePoint in the context of the company's overall business continuity. Involving both budget owners and content/process owners early will help set proper expectations among all stakeholders.

Tip #2 - Define different recovery time and recovery point objectives for different services and content.

The recovery time objective (RTO) defines how quickly the business needs content and services back after a failure. The recovery point objective (RPO) defines how much data, measured as the time between the last recoverable point and the failure, the business can afford to lose without significant productivity loss. These two metrics are the core of any backup and recovery requirements, and they should be the outcome of your work with the business stakeholders.


Be aware of a common trap with these requirements: it may seem like a good idea to define a single RTO and RPO for "SharePoint" in general. However, not all content and services have the same value to your organization. Work with the business to break the objectives down by application, business process, typical use case, and so on. For example, a SharePoint-based web application running an online store is probably business critical, whereas a team project site going down may only affect a handful of users. Assume both are hosted in the same SharePoint farm. If you define just one target for the entire farm, you have to engineer the whole farm to the online store's requirements, and the cost and complexity of the solution may grow significantly.
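To make the per-workload idea concrete, here is a minimal Python sketch (not part of any product) that records RTO and RPO per workload and checks whether a given backup schedule can meet the RPO. The workload names, targets, and schedules are hypothetical, and it makes a simplifying assumption: the worst-case data loss equals the interval between recoverable backup points, ignoring how long the backup itself takes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Workload:
    """Recovery objectives for one application or content area (hypothetical values)."""
    name: str
    rto: timedelta              # how quickly the service must be back after a failure
    rpo: timedelta              # maximum acceptable window of lost changes
    backup_interval: timedelta  # time between recoverable backup points

    def worst_case_data_loss(self) -> timedelta:
        # Simplification: a failure just before the next backup loses
        # everything written since the previous one.
        return self.backup_interval

    def meets_rpo(self) -> bool:
        return self.worst_case_data_loss() <= self.rpo


# Illustrative targets only; real numbers come from the business stakeholders.
WORKLOADS = [
    Workload("Online store web application",
             rto=timedelta(hours=1), rpo=timedelta(minutes=15),
             backup_interval=timedelta(hours=24)),   # nightly full backup only
    Workload("Team project site",
             rto=timedelta(days=2), rpo=timedelta(hours=24),
             backup_interval=timedelta(hours=24)),
]


def rpo_gaps(workloads):
    """Return the names of workloads whose backup schedule cannot meet their RPO."""
    return [w.name for w in workloads if not w.meets_rpo()]


if __name__ == "__main__":
    for w in WORKLOADS:
        status = "OK" if w.meets_rpo() else "GAP"
        print(f"{w.name}: worst-case loss {w.worst_case_data_loss()}, RPO {w.rpo} -> {status}")
```

With these made-up numbers, the same nightly backup that is adequate for the team site leaves a gap for the online store, which is why a single farm-wide target is misleading. Closing that gap only for the store's databases (for example, with more frequent SQL Server transaction log backups) may be much cheaper than tightening the schedule for the whole farm.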

 

Tip #3 - Consider which SharePoint failure scenarios you are protecting from, and work on recovery procedures for each of these scenarios.

A backup and recovery plan cannot be a single, straightforward step-by-step process. A SharePoint farm is a complex, living organism with various components and interdependencies. While the loss of one database may only affect a few non-critical sites, corruption of another (for example, the configuration database) can make the entire farm unavailable. Are there natural disaster risks specific to your location? Do you have strict size quotas and retention policies that may provoke "hoarders anonymous" to regularly wipe away SharePoint content, stale and critical files alike? Make sure to address each common or expected scenario as a separate procedure in your recovery plan document.

Tip #4 - Identify dependencies and create a checklist to avoid "false starts" and wasted time and effort.

SharePoint does not exist in a vacuum. Service and content availability depend on various other systems: the underlying network and server infrastructure, SQL Server and IIS, authentication systems such as Active Directory, and so on. It would be a waste of time and effort to rush into "recovering" anything in SharePoint when its availability is actually impacted by another system's downtime. A good recovery plan must include a checklist of dependencies, clear and simple ways to verify each of them, and contact information for the responsible groups within IT.
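A dependency checklist can also be partly scripted so the first check in an incident is quick and repeatable. The following Python sketch is an illustration only: the host names, ports, and team names are hypothetical placeholders, and a successful TCP connection only shows that a service is listening, not that it is healthy. It walks the checklist from the lowest layer up and reports the first failure together with the group to contact.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dependency:
    name: str
    owner: str                 # responsible IT group to contact if the check fails
    check: Callable[[], bool]  # returns True if the dependency looks available


def dns_resolves(host: str) -> Callable[[], bool]:
    """Build a check that a host name resolves (usually the lowest layer to verify)."""
    def _check() -> bool:
        try:
            socket.getaddrinfo(host, None)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            return False
    return _check


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> Callable[[], bool]:
    """Build a check that a TCP connection to host:port opens within the timeout."""
    def _check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return _check


def first_failed_dependency(checklist: List[Dependency]) -> Optional[Dependency]:
    """Run checks in order and stop at the first failure; None means all passed."""
    for dep in checklist:
        if not dep.check():
            return dep
    return None


# Hypothetical hosts and contacts, ordered from the lowest layer up.
# Building the list does not contact anything; checks run only when called.
CHECKLIST = [
    Dependency("DNS", "Network team", dns_resolves("sql01.example.local")),
    Dependency("Active Directory (LDAP)", "Identity team",
               tcp_reachable("dc01.example.local", 389)),
    Dependency("SQL Server (default instance)", "DBA team",
               tcp_reachable("sql01.example.local", 1433)),  # named instances may use other ports
    Dependency("IIS on web front end", "SharePoint admins",
               tcp_reachable("sp-web01.example.local", 80)),  # or 443 for HTTPS web apps
]
```

Calling first_failed_dependency(CHECKLIST) during an incident either returns None (move on to SharePoint itself) or names the failing layer and whom to call. Stopping at the first failure is a deliberate choice: if DNS is down, the Active Directory and SQL Server checks would fail too and only add noise.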

 

It is also good to have a clear definition of which events trigger which recovery scenarios. If there is a monitoring system in place, document who gets notified and which specific events and thresholds are monitored. If recovery is triggered by a helpdesk call, document the escalation process. And so on.
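Keeping the trigger-to-scenario mapping as plain data makes it easy to review with stakeholders and to reuse from a monitoring script or a helpdesk runbook. The Python sketch below is purely illustrative: the event names, procedure titles, and notification lists are hypothetical, and any event that is not listed falls through to a generic triage-and-escalate procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoveryScenario:
    procedure: str                                   # section of the recovery plan to follow
    notify: List[str] = field(default_factory=list)  # who is told when this scenario starts


# Hypothetical events and plan sections, for illustration only.
TRIGGERS: Dict[str, RecoveryScenario] = {
    "content_db_offline": RecoveryScenario(
        "Restore a single content database", ["SharePoint admins", "DBA on call"]),
    "config_db_corrupt": RecoveryScenario(
        "Full farm recovery",
        ["SharePoint admins", "DBA on call", "IT manager", "Business owners"]),
    "helpdesk_deleted_document": RecoveryScenario(
        "Granular item restore", ["SharePoint admins"]),
}

# Fallback for anything the plan does not name explicitly.
ESCALATION = RecoveryScenario("Triage and escalate", ["SharePoint admins", "IT manager"])


def scenario_for(event: str) -> RecoveryScenario:
    """Look up the documented scenario for an event; unknown events go to triage."""
    return TRIGGERS.get(event, ESCALATION)
```

The notify lists also give you a starting point for the communication plan covered in Tip #5.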

 

Tip #5 - Prepare the communication plan.

If you have ever had to go through a disaster recovery, you will probably agree that it is a stressful event for an administrator. Having your boss call every couple of minutes asking for a status update does not help either. Preparing and documenting a communication strategy can help reduce the pressure from management and the business. Again, it is important that the communication plan covers the same detailed scenarios as your recovery procedures.

For example, your recovery scenario might require bringing up a critical site collection in a temporary environment as soon as possible, before the rest of the farm can be restored. This is a fairly common scenario, since RTO may differ for different content within the same SharePoint environment. Who should be notified about the progress of the farm restore and of the critical site restore? How are status updates communicated? Addressing these questions in a communication plan can help avoid a chaotic storm of calls and emails during a real recovery.

Free Extra Tip #6! Don't assume anything; document every single step required, however insignificant it may seem.

The ideal SharePoint recovery plan can be executed by a college student currently on her internship with your IT department without disturbing your summer vacation. Don't assume anything; nothing is obvious or self-explanatory. Create the plan, test it, and document it, then give it to someone else to test, asking them to follow every word of the document literally. Make notes, update the document, and test again. And don't forget to repeat the exercise regularly to keep the plan up to date.

And One More Bonus Tip #7! Look for ways to reduce RTO and RPO for specific scenarios by leveraging existing backup systems.

In an ideal world, both RTO and RPO would be close to 0: everything would be available instantly with no data loss. In the real world, though, the cost of implementing and maintaining a system that allows that can outweigh all the business benefits. So you'll have to choose the backup methods and strategies that best fit the available budget. That said, in some cases you can dramatically improve these metrics with less effort and lower cost. We specifically designed Quest Recovery Manager for SharePoint to help you achieve just that!

  • Reduce RTO for granular content recovery scenarios - without the need to change backup strategy. Recovery Manager can work on top of various SQL backup solutions, from Microsoft, Quest, and leading 3rd party backup providers.
  • Provide emergency access to sites and documents before the entire farm can be brought back online. Recovery Manager has minimal dependencies on the infrastructure. Even when the original SharePoint farm is down, you can retrieve documents or restore sites and site collections to a temporary location. In fact, as long as you have a backup file, you can retrieve data with Recovery Manager even when there is no SharePoint around at all!
  • Minimize the time and effort required for cross-team communication in large IT environments. Recovery Manager lets you automate the restore process even if your company has separate SharePoint admin and DBA roles. Once the initial configuration is complete, content can be restored by a SharePoint admin without involving backup operators or SQL DBAs.
  • Reduce RTO for full farm recovery from native SharePoint backups. Recovery Manager can walk you through the restore process, verify all dependencies, and automate execution across multiple servers to mitigate the risk of failed restore attempts. You surely don't want to start a farm restore only to see it fail (sometimes hours later) due to a missing AD account or SQL login.

The main takeaway I hope you get from this post is that a SharePoint backup and recovery strategy cannot be created by the SharePoint administrator alone. Work with the business stakeholders and other IT team members. Ask a lot of questions, document your findings, and confirm any assumptions you might have. Finally, test everything you have documented, update the procedures accordingly, and then test again. Repeat that regularly and be safe.

Ilia Sotnikov

Product Manager @QuestSharePoint

 

PS If you find this useful, you may be interested in my very irregular SharePoint backup and recovery blog.
