A Simple Typo and a Flaw in Error Handling...a disaster in the making

A seasoned IT staff member at a large energy company accidentally set an invalid bitmask (he just transposed two numbers). A flaw in the error handling failed to catch it, and the mistake brought down the entire domain within 15 minutes. Just a simple typo! As soon as the bitmask change was committed, the DC where it was made became nonresponsive. “Within 15 minutes, every one of the 36 domain controllers in the domain was nonresponsive,” recalled the principal IT engineer at the energy company. “The domain was effectively down.”

Disaster struck because of a simple typo. The organization wasn’t really in the market for a backup and recovery solution when it acquired one, but it was very glad to have the tool when it was needed, and it saved the day. Read more about the details here.

And, by the way, it’s not that people don’t want to talk about these things when they happen. Like this seasoned IT engineer, people share their stories so that colleagues don’t make the same mistake, but names are usually withheld. Do you have a story to share?

Anonymous