Mismatched Tables

Overview

ETL processes are a very powerful in business workflows, that when applied properly, can:

  • Drastically cut out repetitive manual work
  • Reduce the “time to insight” by as much as 90%!
  • Potentially save a lot of money!

Whether you are a Senior Executive, General Manager, Individual Contributor, Entrepreneur, Venture Capitalist, Scientist, Engineer, Software Engineer, Medical Doctor, or Banker, you are bound to find some benefit.

This particular article goes through, what are called ‘Mismatched Tables’ and how you can use ETL to deal with them.

“Mismatched Tables”. What are they? How to overcome them?

If you deal with data, you are very likely to going to have to face up to several challenges. For example:

  • Data that is not very well formatted data (if at all!)
  • Data that has been sent to you in different formats (e.g. after the original data owners decide to change the format over time)

Badly structured and misrepresented data can really hamper the entire progress of a business, and as the volumes of data grows, the problem can very quickly intensify.

Now, all this can turn into bit of a nightmare if you’re, let’s say, running Chief Investment Officer, that has to combine separate silos of data from across your organisation, into a single source of wisdom, quickly, dynamically, and accurately!

This is what mismatched tables are all about: data formatted in inconsistent ways across your organisation.

Let us say that as chief of business and product development, you have access to different product workbooks. The problem however is that the product line managers have become creative and changed many of the column names in their product tables. Like this:

The following steps describe how to overcome this problem.

Steps:

1. Create a Conversion Table

A conversion table is just a simple mapping table that records pairs mismatched columns headers to their desired replacements.

You do not need to list all the column headers in your tables – just the mismatched ones. This makes this ETL method a particularly handy one!

2. Import the Conversion Table into the PQ Editor

After creating the conversion table, import this table into a new worksheet as a Connection Query only.

  • In Excel, on the Data Tab, select ‘Get Data’, then ‘From Table/Range’
  • Call the Query: ‘Conversion_Table’
  • After this, select Close & Load To and select Connection Only, as shown in the picture below:

3. Import product tables from folder

After connecting the Conversion Table, in the same file, import the mismatched product dataset files.

  • In Excel, On the Data Tab, select ‘Get Data’, then ‘From File’, then ‘From Folder’
  • Then, in the Folder dialog window that appears, select the required folder (where the mismatched tables are stored), then click OK.

Then, in the Combine drop-down menu, select Combine & Transform Data

  • In the Combine Files dialog box, that appears after, select Sheet 1, then OK, as shown below:
  • Clicking OK, will then load the Query Editor.

4. Transform the Conversion Table

The next step is to Transpose the Conversion_Table.

  • To do this, in the Queries Pane, select the Conversion_Table query, created earlier.
  • Then, on the Transform tab, select Transpose.
  • Then, select Use First Row As Headers, on the Home tab
  • Delete any “Changed Type” steps, if you see them.

The data should look like this:

5. Get correct column names in the Sample query

After Transposing the Conversion_ Table, add a new line to the Sample Query called “Transform Sample from…”.

  • To add this new line, click on the ‘fx’ button, and enter the following code:
M-Code
   = Table.TransformColumnNames(#"Promoted Headers", each try Table.Column(Conversion_Table, _){0} otherwise _)

When the code runs, it will try to find the corresponding correct column name from the Transposed Conversion Table.

  • Next, click Close and Load, to load the results of the final query into the Excel Workbook.

The Mismatched tables should automatically append to one another.

Final Comments

The above technique will work for any number of files, regardless of how mismatched the column headers may be.

With the right tools and techniques, the whole procedure should really take no more than 5 minutes.

The only thing you need to carefully check, is that the Conversion Table is kept up to date. This can easily be done, by quickly auditing the results after they load into the Worksheet, in the final step.