A Formula for Data Toxicity

I recently attended the TITUS Foundations 13 conference in Ottawa, Canada: three days of discussions, presentations and speakers, all focused on data classification. One of the presentations I attended was given by John Kindervag, Principal Analyst at Forrester Research. John was both an entertaining and a thought-provoking speaker. I was especially interested in his formula for toxic data, which he gave as 3P+IP=Toxicity. His explanation of the formula was as follows:


The 3Ps are represented by:


  1. PII - Personally Identifiable Information
  2. PHI - Protected Health Information
  3. PCI - Payment Card Information


And IP is, of course, Intellectual Property. Kindervag's hypothesis is that you can determine the level of data toxicity by examining two things: how far the data is subject to state or federal compliance regulations, and how important the data is to your company as intellectual property. Accordingly: T=3P+IP. I think there's an additional variable that can be added to this equation: time (t). If you think about it, something that might be "highly toxic" today may be non-toxic tomorrow. A good example would be the earnings of a publicly traded company. Clearly, such data would be considered highly toxic before it is released to the general public, but after the release it wouldn't be considered toxic anymore.
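To make the time variable concrete, here is a minimal sketch of my own in Python. Kindervag presented the formula conceptually and gave no numeric weights, so everything here is an assumption: the `DataItem` class, the `public_after` field, and the idea of scoring by simply counting which categories apply. It also simplifies by letting the whole score drop to zero at release, which fits the earnings example but would not fit PII.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataItem:
    """Hypothetical record of which toxicity categories apply to a piece of data."""
    has_pii: bool = False
    has_phi: bool = False
    has_pci: bool = False
    has_ip: bool = False
    # Assumption: the date on which the data becomes public, if any.
    public_after: Optional[date] = None


def toxicity(item: DataItem, today: date) -> int:
    """Count the 3P categories plus IP; return 0 once the data has been released."""
    if item.public_after is not None and today >= item.public_after:
        return 0
    three_p = int(item.has_pii) + int(item.has_phi) + int(item.has_pci)
    return three_p + int(item.has_ip)


# Unreleased earnings, treated here as IP-like confidential data.
earnings = DataItem(has_ip=True, public_after=date(2013, 11, 1))
before_release = toxicity(earnings, date(2013, 10, 15))
after_release = toxicity(earnings, date(2013, 11, 2))
```

With these made-up dates, the earnings data scores 1 before the release and 0 afterwards, which is the behavior the time variable is meant to capture.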


Kindervag further asserted that you can use the formula to classify documents as "unclassified", "toxic" or "radioactive". While he didn't define toxic or radioactive specifically, it's reasonable to assume that toxic data is anything that falls into one or more of the 3Ps and that radioactive data is IP-related. He went on to discuss classification states in more detail, but the key takeaway was to have as few data classifications as possible (3 or 4) so that it is as simple as possible for end users to self-classify their data.
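Under that assumption, the three labels could be assigned with a rule as simple as the sketch below. This is my reading, not a definition Kindervag gave; the `classify` function is hypothetical, and letting IP take precedence over the 3Ps when both apply is my own choice.

```python
def classify(has_pii: bool, has_phi: bool, has_pci: bool, has_ip: bool) -> str:
    """Map the 3P and IP flags to one of three labels (assumed mapping)."""
    if has_ip:
        # Assumption: IP-related data is "radioactive", even if a 3P also applies.
        return "radioactive"
    if has_pii or has_phi or has_pci:
        # Assumption: any one of the 3Ps makes the data "toxic".
        return "toxic"
    return "unclassified"


label_for_patient_record = classify(has_pii=True, has_phi=True, has_pci=False, has_ip=False)
label_for_design_document = classify(has_pii=False, has_phi=False, has_pci=False, has_ip=True)
```

Keeping the rule this small is in the spirit of the advice above: with only three outcomes, an end user can answer the same questions in their head.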


The formula provides a framework for weighing your data exposure: it combines the compliance risk carried by the 3Ps with the value of the data as intellectual property.