Forget Fake News, what about Fake Data?

Over the past few months we’ve all seen how “fake news” and “alternative facts” can influence a presidential campaign, but it’s not just world leaders who can be affected. “Fake data” has the potential to cause chaos within organisations.

People talk about being in the Information Age, but for organisations it’s more like the Data Age, epitomised by trends such as Big Data, non-relational databases, unstructured data and IoT. Sometimes it feels like organisations are just collecting and storing as much data as they can get their hands on without really knowing what they’re going to do with it or if it even has any value to them.

The four pillars of Big Data, also known as the four V's, are Volume, Variety, Velocity and Veracity. In this blog I want to focus on the last one – Veracity, which my thesaurus tells me means truth and accuracy. Logically, one would think, in order to ensure data veracity you need to be able to verify the data somehow. That's not too difficult with traditional types of data such as phone numbers, postcodes and dates of birth, but when bots and algorithms are gathering unverifiable data from social media and the web, it seems that perhaps all V's are equal but some are more equal than others. Of course, data veracity is not unique to Big Data. Anywhere you store data you need, and expect, it to be truthful, whether it be an Oracle database, an application like Salesforce.com or even an Excel file.
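
To give a flavour of what verifying those traditional data types can involve, here is a minimal Python sketch using simple format checks. The postcode pattern and the date format are illustrative assumptions on my part, not rules taken from any real system.

    import re
    from datetime import datetime

    # Rough format check for an Irish-style Eircode, e.g. "T12 AB34" (illustrative pattern only)
    EIRCODE_PATTERN = re.compile(r"^[A-Z]\d{2}\s?[A-Z0-9]{4}$")

    def is_valid_eircode(value: str) -> bool:
        return bool(EIRCODE_PATTERN.match(value.strip().upper()))

    def is_valid_dob(value: str) -> bool:
        # Accepts dates like "23/06/1984" and rejects anything that doesn't parse
        try:
            dob = datetime.strptime(value.strip(), "%d/%m/%Y")
        except ValueError:
            return False
        return dob < datetime.now()

    print(is_valid_eircode("T12 AB34"))  # True
    print(is_valid_dob("31/02/1990"))    # False - not a real date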

As I write this blog I notice an email arriving from our national airline – Aer Lingus. Then I notice another, and another. But it's the exact same email! It dawns on me that in the past 12 months I have booked flights with Aer Lingus using 4 different email addresses, 3 different credit cards and 2 different passports. I wonder who Aer Lingus think Denis O'Sullivan is. Have they figured out I'm one and the same or do they think I'm more than one person? Well, let's be honest, the mystery of who I am is not going to trouble a major airline too much, but it's interesting nonetheless to see whether this large organisation has profiled and cleansed this basic 2-dimensional data.

In my own day-to-day work I regularly see companies putting the cart before the horse when it comes to analysing and interpreting their data. More often than not, when I speak to end users they start by saying they want dashboards! Basically, they want to visualise the data and make it look pretty, and only then work backwards to check whether the pretty data is actually correct. Taking the time to profile your data so you understand it is a vital step that is often skipped or not prioritised. Once you understand your data you are in a position to cleanse it, and once it's cleansed you may then decide to visualise it.
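
For anyone wondering what that profiling step might look like in practice, here is a minimal sketch using Python and pandas. The file name and column names are entirely hypothetical.

    import pandas as pd

    # Hypothetical customer extract - the file and its columns are illustrative only
    customers = pd.read_csv("customers.csv")

    # How complete is each column? Missing values are the first veracity check.
    print(customers.isna().sum())

    # How many distinct values does each column hold? Unexpectedly low or high
    # counts are often the first hint that something is wrong.
    print(customers.nunique())

    # Obvious duplicates: the same email address appearing on multiple rows
    duplicate_emails = customers[customers.duplicated(subset="email", keep=False)]
    print(duplicate_emails.sort_values("email"))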

So how do companies figure out if their data is incorrect or even fake? Those clever folk in Aer Lingus have, of course, very easily decoded the enigma wrapped in an anomaly I inadvertently created. They simply offered me points! Now, no matter what email address or credit card I use, if I want those delicious Avios points I'm gonna put in my membership number and unravel the puzzle for them! But not all data problems are going to be so easily solved. This is where the proper tools become crucial.
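
To stay with the Aer Lingus example, once a loyalty number is attached to every booking, collapsing my four email addresses into a single customer becomes a simple grouping exercise. Here is a rough pandas sketch with entirely made-up data and column names.

    import pandas as pd

    # Made-up booking records: one passenger hiding behind several emails and cards
    bookings = pd.DataFrame({
        "membership_no": ["AV123456"] * 4,
        "email": ["denis@a.ie", "dosullivan@b.com", "denis.os@c.ie", "d.osullivan@d.com"],
        "card_last4": ["1111", "2222", "3333", "1111"],
    })

    # Group by the loyalty number to see every identity it ties together
    profile = bookings.groupby("membership_no").agg(
        emails=("email", lambda s: sorted(s.unique())),
        distinct_cards=("card_last4", "nunique"),
        bookings=("email", "size"),
    )
    print(profile)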

As I mentioned before, BI tools can help you visualise the data, but for genuine data analysis you need a genuine data analytics tool. For me, the criteria are quite straightforward:

  • Firstly, you need connectivity, and considering the vast array of data sources available today you need a tool that can connect to pretty much any data source, because where you get all your data today may not be where you get all your data tomorrow.
  • Secondly, you need help understanding the data sources. Nobody can be expected to be an expert in every type of database so you need a tool that will standardise your view across all data sources, allowing you to easily understand what it is you’re actually looking at.
  • Thirdly, you need to query those data sources. Again, you can't be expected to write scripts in every type of database and get accurate results, so a common querying interface that also allows you to cross-query between all these data sources is another must (there's a rough sketch of the idea just after this list).
  • Then, finally, once you have accessed, understood and gathered all the data you're looking for, you need to see if it makes sense! So your tool of choice needs to be able to profile the data and then allow you to transform and cleanse it so that it is uniform and presentable.
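
To give a flavour of that cross-querying idea, here is a rough Python sketch that joins rows pulled from a SQLite database with rows from a CSV export. A dedicated tool does this for you through a common interface; the database, file and column names below are invented purely for illustration.

    import sqlite3
    import pandas as pd

    # One source: a relational database (SQLite here, purely for illustration)
    conn = sqlite3.connect("sales.db")
    orders = pd.read_sql_query("SELECT customer_id, order_total FROM orders", conn)

    # Another source: a flat file exported from some other system
    # (assumed to contain customer_id and email columns)
    customers = pd.read_csv("crm_export.csv")

    # "Cross-query": join the two sources as if they lived in one database
    combined = orders.merge(customers, on="customer_id", how="left")

    # Rows with no match in the CRM extract are immediate veracity questions
    print(combined[combined["email"].isna()])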

We often presume the data we consume is accurate just because it's in our database. Fake and inaccurate data is easily gathered, but it can also be easily identified and eradicated using the correct tools. Not doing so can lead to inaccurate and harmful decision-making in your organisation, and the more decisions made based on this fake data, the more fake data you're going to generate. Then the task of identifying what is real and what is fake becomes even more difficult.

If you feel like what I'm saying makes sense, then check out Toad Data Point, Quest's data analytics offering designed specifically for data analysts, with exceptional connectivity, data profiling and data cleansing features.

In the meantime I see Aer Lingus have a sale on. I’m thinking a weekend in Barcelona!

Learn more about the power of Toad: download our FREE Trial now.

About the Author
Denis.OSullivan
Hi folks, I've been a system consultant with Quest since 2015, based in Cork, Ireland. Prior to joining Quest I worked as a database administrator and application server administrator. My role at Quest...