Over the past few months we’ve all seen how “fake news” and “alternative facts” can influence a presidential campaign, but it’s not just world leaders who can be affected. “Fake data” has the potential to cause chaos within organisations.
People talk about living in the Information Age, but for organisations it’s more like the Data Age, epitomised by trends such as Big Data, non-relational databases, unstructured data and the Internet of Things (IoT). Sometimes it feels like organisations are just collecting and storing as much data as they can get their hands on, without really knowing what they’re going to do with it or whether it even has any value to them.
The four pillars of Big Data are Volume, Variety, Velocity and Veracity, also known as the four V’s of Big Data. In this blog I want to focus on the last one, Veracity, which my thesaurus tells me means truth and accuracy. Logically, to ensure data veracity you need some way to verify the data. That’s not too difficult with traditional types of data such as phone numbers, postcodes and dates of birth. But when bots and algorithms are gathering unverifiable data from social media and the web, it seems that all V’s are equal but some are more equal than others. Of course, veracity is not unique to Big Data. Wherever you store data, you need, and expect, it to be truthful, whether it lives in an Oracle database, an application like Salesforce.com or even an Excel file.
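For the traditional fields mentioned above, verification can be as simple as a handful of format and range rules. The following is only an illustrative sketch, and its rules are hypothetical assumptions of ours. A phone number must have 7 to 15 digits, with 15 being the E.164 maximum. A postcode must follow a simplified Eircode-style pattern. A date of birth must be an ISO date that is not in the future and implies an age under 120. The names `valid_phone`, `valid_postcode`, `valid_dob` and `validate_record` are made up for this example.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical, simplified rules for illustration only.
# Eircode-style postcode: a routing key (letter + 2 digits, or D6W), an optional
# space, then a 4-character identifier. Real Eircodes restrict which letters appear.
POSTCODE_RE = re.compile(r"(?:[A-Z]\d{2}|D6W) ?[0-9A-Z]{4}")


def valid_phone(raw: str) -> bool:
    """7 to 15 digits after removing spaces, dashes, dots, brackets and a leading '+'."""
    digits = re.sub(r"[\s\-().]", "", raw)
    if digits.startswith("+"):
        digits = digits[1:]
    return digits.isdigit() and 7 <= len(digits) <= 15


def valid_postcode(raw: str) -> bool:
    """Case-insensitive match against the simplified Eircode-style pattern."""
    return POSTCODE_RE.fullmatch(raw.strip().upper()) is not None


def valid_dob(raw: str, today: Optional[date] = None) -> bool:
    """An ISO date (YYYY-MM-DD) that is not in the future and implies an age under 120."""
    today = today or date.today()
    try:
        dob = date.fromisoformat(raw)
    except ValueError:  # malformed or impossible dates such as 30 February
        return False
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    return dob <= today and age < 120


def validate_record(record: dict) -> list:
    """Return the names of the fields that fail their rule."""
    checks = {"phone": valid_phone, "postcode": valid_postcode, "dob": valid_dob}
    return [name for name, check in checks.items() if not check(record.get(name, ""))]


print(validate_record({"phone": "+353 1 234 5678", "postcode": "D02 X285", "dob": "1980-05-17"}))  # []
print(validate_record({"phone": "12", "postcode": "nope", "dob": "1980-02-30"}))  # ['phone', 'postcode', 'dob']
```

Rules like these only catch malformed values. A well-formed but false phone number still passes. That is exactly why the less verifiable data gathered from social media and the web is so much harder to trust.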
As I write this blog, I notice an email from our national airline, Aer Lingus. Then I notice another, and another. But it is the exact same email! It dawns on me that in the past 12 months I have booked flights with Aer Lingus using 4 different email addresses, 3 different credit cards and 2 different passports. I wonder who Aer Lingus think Denis O’Sullivan is. Have they figured out that I’m one and the same person, or do they think I’m several people? Well, let’s be honest, the mystery of who I am is not going to trouble a major airline too much. But it’s interesting nonetheless to see whether this large organisation has profiled and cleansed this basic 2-dimensional data.
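One common way to answer that question is entity resolution. Any two bookings that share an identifier (an email address, a card or a passport) are treated as belonging to the same person, and those links are followed transitively. The sketch below does this with a small union-find structure. It is not how Aer Lingus or any particular tool works. The booking records are hypothetical, and the card values stand in for tokenised card references rather than real card numbers.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure: find() returns a group's representative."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_customers(bookings):
    """Group booking IDs that share any email, card or passport, directly or via other bookings."""
    uf = UnionFind()
    first_seen = {}  # (field, normalised value) -> first booking ID that used it
    for b in bookings:
        uf.find(b["id"])  # register bookings that share nothing with anyone
        for field in ("email", "card", "passport"):
            ident = (field, b[field].strip().lower())
            if ident in first_seen:
                uf.union(first_seen[ident], b["id"])
            else:
                first_seen[ident] = b["id"]
    groups = defaultdict(list)
    for b in bookings:
        groups[uf.find(b["id"])].append(b["id"])
    return sorted(sorted(ids) for ids in groups.values())


# Hypothetical bookings: 4 email addresses, 3 card tokens, 2 passports.
bookings = [
    {"id": "B1", "email": "denis@work.example", "card": "card-A", "passport": "P-1"},
    {"id": "B2", "email": "denis@home.example", "card": "card-A", "passport": "P-1"},
    {"id": "B3", "email": "d.osullivan@mail.example", "card": "card-B", "passport": "P-1"},
    {"id": "B4", "email": "denis.o@other.example", "card": "card-C", "passport": "P-2"},
]
print(resolve_customers(bookings))  # [['B1', 'B2', 'B3'], ['B4']]
```

In this made-up example, booking B4 shares no identifier with the others, so the grouping still sees two customers. Exact-match linking like this is only a starting point. Real matching usually adds fuzzy comparison of names, addresses and dates of birth.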
In my own day-to-day work I regularly see companies putting the cart before the horse when it comes to analysing and interpreting their data. More often than not, when I speak to end users, the first thing they say is that they want dashboards! Basically, they want to visualise the data and make it look pretty. Then they work backwards and check whether the pretty data is actually correct. Taking the time to profile your data so that you understand it is a vital step that is often skipped or not prioritised. Once you understand your data, you are in a position to cleanse it. And once it is cleansed, you may then decide to visualise it.
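Profiling does not have to start with anything elaborate. A first pass usually counts missing values, distinct values and the most frequent values. It also records the "shapes" the values take, which quickly exposes mixed formats and placeholder junk. The following minimal sketch illustrates that idea and is not a description of any product's profiling feature. The function names and the sample phone column are hypothetical.

```python
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its pattern: each digit becomes 9, each letter becomes A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile_column(values):
    """First-pass profile of one column whose values are strings or None."""
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(shape(v) for v in present).most_common(),
    }


# Hypothetical phone column pulled from a customer table.
phones = ["+353 1 234 5678", "01 234 5678", "", None, "n/a", "01 234 5678"]
report = profile_column(phones)
print(report["missing"], report["distinct"])  # 2 3
print(report["patterns"])  # [('99 999 9999', 2), ('+999 9 999 9999', 1), ('A/A', 1)]
```

This column has three different shapes, one of them a placeholder. That kind of finding should be settled before anyone builds a dashboard on the column.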
So how do companies figure out whether their data is incorrect or even fake? Those clever folk in Aer Lingus have, of course, very easily decoded the enigma wrapped in an anomaly that I inadvertently created. They simply offered me points! Now, no matter what email address or credit card I use, if I want those delicious Avios points I’m gonna put in my membership number and unravel the puzzle for them! But not all data problems are going to be so easily solved. This is where the proper tools become crucial.
As I mentioned before, BI tools can help you visualise the data, but for genuine data analysis you need a genuine data analytics tool. For me, the criteria are quite straightforward:
We often presume the data we consume is accurate just because it’s in our database. Fake and inaccurate data is easily gathered, but it can also be easily identified and eradicated using the right tools. Failing to do so can lead to inaccurate and harmful decision-making in your organisation. The more decisions you base on fake data, the more fake data you generate, and the harder it becomes to tell what is real and what is fake.
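Once profiling has shown what is wrong, cleansing is largely a matter of three steps. You standardise values, set aside the ones that cannot be trusted and remove duplicates that only differed in formatting. The sketch below illustrates that sequence for a phone column. Several details are assumptions made for this example: the placeholder list, the default Irish country code (353) and the rule that a national number begins with a single trunk "0". `normalise_phone` and `cleanse` are hypothetical names rather than part of any tool.

```python
import re

# Assumed placeholder values that mean "no real data here".
PLACEHOLDERS = {"", "n/a", "na", "none", "unknown", "test", "-"}


def normalise_phone(raw, country_code="353"):
    """Return a '+'-prefixed digit string, or None if the value is missing or unusable.

    Assumes national numbers start with a single trunk '0' (true for Ireland,
    not for every country) and that '00' is the international access prefix.
    """
    if raw is None or raw.strip().lower() in PLACEHOLDERS:
        return None
    text = raw.strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        pass                                  # already international
    elif digits.startswith("00"):
        digits = digits[2:]                   # drop the international access prefix
    elif digits.startswith("0"):
        digits = country_code + digits[1:]    # national format: swap trunk '0' for country code
    else:
        return None                           # ambiguous: leave for manual review
    return "+" + digits if 7 <= len(digits) <= 15 else None


def cleanse(records):
    """Standardise phones, reject unusable rows and drop formatting-only duplicates."""
    seen, clean, rejected = set(), [], []
    for r in records:
        phone = normalise_phone(r.get("phone"))
        if phone is None:
            rejected.append(r)
            continue
        key = (r["name"].strip().lower(), phone)
        if key not in seen:
            seen.add(key)
            clean.append({**r, "name": r["name"].strip(), "phone": phone})
    return clean, rejected


records = [
    {"name": "Denis O'Sullivan", "phone": "01 234 5678"},
    {"name": "denis o'sullivan ", "phone": "+353 1 234 5678"},
    {"name": "Test User", "phone": "n/a"},
]
clean, rejected = cleanse(records)
print(clean)          # [{'name': "Denis O'Sullivan", 'phone': '+35312345678'}]
print(len(rejected))  # 1
```

The sketch returns rejected rows rather than silently dropping them, so someone can review them. Deleting suspect data outright risks throwing away records that were merely unusual.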
If what I’m saying makes sense, check out Toad Data Point. It is Quest’s data analytics offering, designed specifically for data analysts, and it offers exceptional connectivity, data profiling and data cleansing features.
In the meantime, I see Aer Lingus have a sale on. I’m thinking a weekend in Barcelona!
Learn more about the power of Toad: download our FREE trial now.
Good post about Veracity.
Hi Denis, I have a question. What if we perform ETL on all our data sources, maintain a data warehouse and automate the ETL? That way all the data is in our data warehouse, and we no longer need to query the data sources. Would that work?