Shouldn't we be thinking about Active Directory recovery?

Microsoft Active Directory (AD) is probably the most-utilised and least-considered platform in your organisation. I’m going out on a limb here (forests … AD forests, get it?), but let’s call AD your most critical platform. After all, AD underpins almost every authentication action in your company.

Fortunately, Microsoft built a stable and resilient platform, and that reliability is a huge benefit. Today, most IT teams choose third-party applications that use AD as the back-end authentication provider simply because it is already the source of truth; after all, every single user in your business logs into AD every day. This increases the reliance on the platform yet again.

Add to this the fact that your new SaaS provider of choice is also either federating with your AD or synchronising credentials with AD, and an Active Directory failure would truly be catastrophic.

Active Directory Can and Does Fail

Consider these AD failures that I’ve dealt with in the last few years here in Australia:

  • Active Directory DNS failed – customer was unable to conduct business for 12 hours
  • Active Directory Organisational Unit deleted – customer had recovery tools, so the recovery took under one hour for 2,000+ objects with no downtime for non-affected users
  • Directory replication error between on-premises and cloud platform – 6,000 users deleted
  • Microsoft Exchange upgrade required schema changes – could not complete, rollback failed

When Active Directory ‘breaks’, the fallout is enormous. Yet AD recovery is most likely not currently in scope for major application failure planning, except as a line item stating something along the lines of, ‘Active Directory will be restored from the latest successful backup’. That is not going to leave AD in a good state.

With only that line item to go on, Active Directory recovery will be a restore from tape, so you’d have to follow the Microsoft recovery guide.

The steps contained in that guide are not a recovery plan, but rather a set of steps that you will need to take to get AD back up and running. And they’re certainly not specific to your organisation’s requirements. The expertise needed to run this level of recovery may not even be available at the time your organisation needs to restore AD.

Editor note: Before partnering with Quest, one very large customer of ours here in Australia kept three AD experts on payroll, at significant expense, in case of an AD failure. Three were needed to cover on-call duties, leave requirements and illnesses.

The Problem

Whether your organisation needs to be able to recover quickly is down to the business requirements of the connected IT systems. In many cases, though, the business doesn’t understand the implications of a full-forest outage or just how much of the business may be affected.

AD recovery is not a simple ‘back up and restore system state’ procedure, especially when there are multiple domain controllers (DCs). And it’s even worse when expertise is constrained, management overheads are high and/or WAN bandwidth is limited.

From my experience, as a problem grows in scale, more people are affected and, thus, need to be involved in recovering the failed system, which slows the recovery down even more. And without a good rollback position, people are reluctant to attempt the recovery without more time and yet more people becoming involved. It becomes a nightmare of epic proportions.

So, What Can Be Done?

Recently, a customer asked us to provide a comparison showing how our own Recovery Manager for Active Directory Forest Edition (RMAD FE) stacks up against Microsoft Professional Services when it comes to AD recovery. The goal was to highlight the difference in TTTR (total time to recover) between manpower and software.

Microsoft Professional Services and their recovery process required 17 hours to restore AD.

The Quest solution required 1 hour and 5 minutes. In addition to completing the actual recovery, we also created the recovery process and automated it. For business continuity planning, RMAD FE also allows the business to test full AD recovery without risking the original environment or data.

Pretty impressive stuff.