[MUSIC PLAYING] Hey, Danny. How are you doing? Listen, I've got a question for you. I just wanted to get your opinion around data. I know you're big into data quality but on the business side. But I want to ask you about data quality during database migrations. Should you or shouldn't you? I know the answer. The answer is yes, you should clean the data prior to the migration. Have you thought about this?
I would say yes. But let's hope, again, I think that's a one-time task that's going to bring you limited value. Obviously, it's going to bring the data over in a pristine fashion.
But in the real world, in today's world, again, with business trust in data being a huge, huge driver for everything we do down in the IT world in plumbing, if you're at the point where you're having to do a heavy cleanse for the migration, then you really might want to think about looking at data quality a little bit differently and take advantage of this migration to maybe put a little bit more rigor in that because, again, you want ongoing observability.
If you have to clean that much data moving from one database, then you'd better go back and look at all the decisions that you made off the data before the database migration. But, yeah. You absolutely want clean data in. But then, as soon as you start writing new records, do you still have clean data?
My worry here is that, look, we've initiated this database migration. We didn't notice this. We didn't know the state of the quality we have. And we're going to have to open up a whole new program here around data quality.
Pay me now or pay me later, baby.
OK, you're agreeing with me. But is there any quick way out of this? Is this a case of, look, do your quick win on data quality cleaning now. Get your data across, and then start your data quality program. Or is there--
Well, yeah. If you're at that point, if this is where you're at, then you're absolutely going to have to do that. So you're going to have to go through the painful process of profiling that data and doing your best to understand what sort of anomalies are there, what kind of issues. It'll probably be very, very basic because your understanding of that data will be somewhat limited.
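To make the basic profiling pass described here concrete, the following is a minimal Python sketch, not a method named in the conversation. It assumes the source records are available as a list of dictionaries. The function name `profile_rows`, the column names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def profile_rows(rows):
    """Return basic per-column statistics for a list of dict records.

    Hypothetical helper: counts values, empty/missing values, distinct
    values, and the longest value seen in each column.
    """
    stats = defaultdict(
        lambda: {"count": 0, "nulls": 0, "distinct": set(), "max_len": 0}
    )
    for row in rows:
        for column, value in row.items():
            col = stats[column]
            col["count"] += 1
            # Treat None and the empty string as missing values.
            if value is None or value == "":
                col["nulls"] += 1
                continue
            col["distinct"].add(value)
            col["max_len"] = max(col["max_len"], len(str(value)))
    # Replace the working sets with plain counts for reporting.
    return {
        column: {
            "count": col["count"],
            "nulls": col["nulls"],
            "distinct": len(col["distinct"]),
            "max_len": col["max_len"],
        }
        for column, col in stats.items()
    }


# Made-up sample records, for illustration only.
sample = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 2, "email": None},
]
profile = profile_rows(sample)
```

Even a profile this simple surfaces the kind of basic issues mentioned above: here, two of three `email` values are missing and `customer_id` contains a repeated value.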
One of the benefits of some of the new ways of doing data quality moving forward is leveraging a lot more intelligence-- machine learning-- and allowing yourself to train your organization on what data quality is. And then build those and automate those rules so that you're consistently looking at the data.
I'm sorry. I'm smiling here because I asked you before about being lazy, about not having a data model. And I'm guessing you might be thinking it's like, I think this is the result of not having your data model.
Well, it absolutely-- Well, it's not just your data model. But it's your full understanding of your data and how that data relates to the business because that's where the rubber hits the road.
So first off, I'm not hiring you as my CDO because you are way too willing to take shortcuts and just deal with the pain--
It's not me. It's not me.
Listen, as an old data modeler, the whole purpose of this has always been to avoid the pain, not deal with the pain when it comes. It's just the way it is.
That's true. So do you have any upstream? I know you. You love going upstream. Do you have any upstream thing I could do that's a-- keep paddling. Is there anything I could do-- I suppose-- quick and easy to help start that data quality program, but also help my database migration?
Yeah, you would want to put in some quality observability into your data pipelines. And generally that will occur in and around a hub of data intelligence, where you've built a catalog of all your data assets, understanding where it is, what it should be, and then taking a look at that and keeping that constant monitoring. It's going to give you so much value when you can see, is the quality going up, relative quality going up in this data source? Is it going down? Why?
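One way to picture the "is the quality going up or down" question is to track a pass rate per pipeline run and compare the latest run with the previous one. The sketch below is an assumption about how that could look, not a description of any particular product. The names `pass_rate` and `quality_trend`, the `tolerance` parameter, and the numbers are hypothetical.

```python
def pass_rate(passed, total):
    """Fraction of records that passed the quality checks in one run."""
    return passed / total if total else 0.0


def quality_trend(runs, tolerance=0.01):
    """Compare the two most recent runs and report the direction.

    `runs` is a list of (passed, total) tuples, oldest first.
    Changes smaller than `tolerance` are reported as "flat".
    """
    if len(runs) < 2:
        return "flat"
    previous = pass_rate(*runs[-2])
    latest = pass_rate(*runs[-1])
    if latest - previous > tolerance:
        return "up"
    if previous - latest > tolerance:
        return "down"
    return "flat"


# Made-up history for one data source: (records passed, records checked).
history = [(900, 1000), (950, 1000), (870, 1000)]
direction = quality_trend(history)
```

Answering the "why" still takes investigation, but a per-source trend like this shows where to start looking.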
And we can start to recognize anomalies that are more than, this says it's supposed to be 27 characters. And somehow they've put 35 characters in there. That's like real basic stuff.
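The "real basic" check mentioned here, a field declared as 27 characters that arrives with 35, can be sketched in a few lines of Python. This is an illustration only. The function `find_length_violations`, the column name `account_code`, and the sample rows are hypothetical.

```python
def find_length_violations(rows, max_lengths):
    """Return (row_index, column, actual_length) for over-length values.

    `max_lengths` maps a column name to its declared maximum length.
    Missing values (None or absent keys) are skipped.
    """
    violations = []
    for index, row in enumerate(rows):
        for column, limit in max_lengths.items():
            value = row.get(column)
            if value is not None and len(str(value)) > limit:
                violations.append((index, column, len(str(value))))
    return violations


# Made-up rows: the second value is 35 characters in a 27-character field.
rows = [
    {"account_code": "A" * 27},
    {"account_code": "B" * 35},
]
violations = find_length_violations(rows, {"account_code": 27})
```

Here `violations` holds one entry for the second row, reporting a length of 35 against the declared limit of 27.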
But once you start understanding what the data is being used for end-to-end, really understanding data, the full data life cycle, your lens on data quality becomes much, much more powerful. And it puts much more power in your hands to be able to deliver good data to the business.
So there's no time like the present. A migration is a great thing because you're spending some money. You're moving to some new technologies. There's no time like the present to start saying, hey, we've also, through this process, identified some poor habits that we've got ourselves into in the old world. Let's get ourselves into some new habits.
So let's say I put a pause on my-- I don't need to pause my migration program because there's things we can do in parallel.
Absolutely.
How long would it take me to stand up that data quality program enough to the point where we can progress with the migration and be smart? How long would it take? Are you talking, like, three, four months? Or are we talking about 18 months? Because if it is 18 months, then we're going to have to run it after the migration.
Well, I would run them in parallel. So the migration is, you need as clean data as you can accept. And you have dates and requirements. So four months