In this video, we're going to cover how and when to use transformation and cleanse when you're in Toad workbook. OK so I've got example workbook open right now. And I've got a workbook that's about customers and orders. And it has two workflows in it. One is a basic query about customers. And one is about orders. Let's look at this customer query that I've got here.
When I look at this customer query I can see that I've been joining address, contact, and customer data. And brought back a series of results. Now I can start making pivot grids like I've done so far against this result or doing other analysis. But what if I want to embellish, or clean, or filter these results? What's the best way for me to do that?
Well, one of the options I have is to do transformation and cleanse. So I'm going to walk you through doing that now. So transformation and cleanse is a step that you add after you've done the query. So I've gone to my query step here. I'm going to add a child step. It's a work step. And we're to choose transformation and cleanse.
And this will open up my data within a transformation and cleanse window. I'm going to get rid of my workbook window by just unpinning it. So that hides it. So we can take a look at this transformation and cleanse window.
Now keep in mind I'm just looking at the first 1,000 rows of records. Because just revisit, that's what I'm fetching within my workbook. But within this 1,000 rows of records. Put that back. I'm looking at some sample data within my workbook.
OK, so over here on the left-hand side I have all of the columns in my query results. Over here I've got my 1,000 rows of data, of actual data in those columns. And then over here on the right, I've got a little bit of profile information about each column as I visit it.
So let's visit this sex column here. When I click into the sex column I can see I have two values within the sex column. I've got an M-- it's very small on my screen-- and an F.
This represents the number of values that I have for each one of those. Or a number of records I have for each one of those values. I'm just going to minimize this graphic.
I could also look at the tree map here. And again, this is just for the 1,000 records that I've brought up. And it will show me how many I've got that are distinct, how many are populated, how many are repeated, and how many nulls I have. So 27% of the values are nulls.
If I want to look at this information in a little more tabular format, it's right here in this summary next to me. How many rows I have null, how many rows where it's missing but not null, how many rows where it's populated, and how many distinct values I have in this column. So again, we know it's M and F for here.
And then it tells me the actual string type. And then it tells me the top patterns within that data. So that's this here. And then this steps tab here we're going to walk through.
So as I look at this data and I decide to make adjustments to the data. Like for example let's go ahead and in this column here, for sex, let's go ahead make an adjustment by removing our nulls.
So let's replace the null value. So I'm going to come here, find nulls and replace with unknown. And it gives me a little data on here as well-- found in 268 rows, replacing it with unknown. I'm going to hit Apply Rule.
OK. So now I've replaced all of my gender nulls with unknowns. Let's do some more transforming of this data. Let's isolate the home or the business. Let's start with business area code. So let's, first of all, take this column and split it. And we're going to have it-- we'll have it split this column down here at a separator. And the separator we're going to split it at is the ending parentheses. And we'll split it into two columns.
OK. Looks good. Apply that rule. Here's the new two columns that are created. We'll now take this new column here, and let's replace this parentheses that ended up in that value. As you can see, I split it here, so I grabbed that first parentheses. And let's replace that with nothing and apply that rule. Great. So now I've got this area code here.
Now let's do some renaming of these columns. First of all, let's go ahead and rename this column-- this one out here, this Business Phone Split 1-- let's rename this column Area Code. So we have the area code. And let me actually be a little bit more specific. We'll write business area code since it's the business phone number area code.
And then this second split here-- let's decide that I don't need this output in my actual end data results. And so I'm going to pick this column here and say let's remove this column. And so we can see I've just unchecked the ones I don't want. I don't want this one to apply that rule.
What other things can we do inside transformation and cleanse? Well, let's take a look at the birth date. Let's extract the birth year. A lot of times, people don't want their actual birth date. Let's just pick the birth year. And so let's go ahead and extract from this Date column. We're just going to take the year and apply this rule.
And again, I've got a fun column here called Birth Date Year. Let's change the name of this column as well. So let's rename this column to Birth Year. Great. And apply that rule.
All right. So you can see as I'm going through and making these transformation steps, adding them, it's actually showing me what my results will look like once these steps are applied in my data. Now, I want to point out that as I went and created each one of these rules, as we went through and replaced the nulls with unknown and changed the name of the column from business phone-- or split the business phone column into a Area Code column and a regular Rest of Phone Number column, and then we had the Business Phone Column-- we replaced the open parentheses with nothing-- as we went through and created-- each one of these steps were created here. So so far, I'm doing seven steps to this data.
The really nice thing about this is I can go back to any one of these steps, click on it, revisit it here, and change it. So let's say you want to add some spaces and update that rule. And now that rule is updated. And we can see that my field name is Business Area Code.
Or let's say we want to-- we'll go ahead here, and we'll say, you know what? Underscore seemed to be my standard within this dataset, so we can change them. OK. So by using transformation and cleanse, I can test out my transformations before I apply them, and I can adjust them. So so far, here I've adjusted what they do, but I can also adjust where they show up.
I can take this renaming of columns and drop it at the very end of my list. So it's doing something at the very end. And as a matter of fact, I have two renaming steps here now. I have the renaming the Birth Date Year to Birth Year and renaming the columns Business Phone to Area Code. Why not just combine that into one step? And so I will just rename this to Birth Year as part of this step.
Oops. Actually, first delete the original step here. So I've deleted that step, and now let me revisit this step. And this is where I'll rename it to Birth_Year. OK? So you can fuss with these rules till you get them right. And once you got them right, then this step within your workbook, all of the children step that you put off of this cleansing step will show all the new columns you've created and all the new values you've put in.
So when I come into this first pivot, which is directly off the query, I don't see things like birth year or area code because the data hasn't been cleansed. But if I come off the cleansing step and add a child step-- and we'll add a pivot grid-- all of those new fields are available to me.
So there's lots more about transformation and cleanse, all sorts of great things you can do in here. You can group columns. This would be like case statements. You can filter and format the data. You can trim extra spaces off the data, convert the data type. Lots of things you can do here.
This is the same transformation and cleanse that we have inside the traditional data point UI. So if you want to learn more about transformation and cleanse, there's quite a few videos about just transformation and cleanse within Toad Data Point. So you can learn about the intricacies of transformations and cleanse there. But I wanted to show you in this how you can use it within a workbook.