HR - On-boarding
Overview
Suppose you work in the HR department of a company. One of the tasks given to you is to make sure that the “right forms have been received from the right people at the right time”, as part of the HR on-boarding process.
Because the forms take time to fill in, they are not all submitted, by each prospective candidate, at the same time. So, for a particular individual, you might receive 2 or 3 forms from an employee by email, and then another 2 or 3 forms, a few days later, by post.
As a result, you decide that as you get the forms in, to keep track of what has been received, (in a Table), like the one shown below:
The red colored cells show duplicate employee numbers, as some juniors have accidentally entered the same number into the same column, more than once.
After collecting as much data as you can, you need to find a quick and efficient way of finding out which candidates you are still waiting for forms from, so your team can follow up.
This article explains how to do this, with ETLs.
Steps
1. Load the data as a query
- Load the data as a query, so it looks something like this, in the Query Editor:
- Remove the “Date” column as it is not needed.
- Unpivot the data, so you have two columns, as shown below:
- Call the query ‘DocsSubmitted’
2. Create separate unique lists of Employees, and Document types
- Next, create a unique list of employees that have submitted some data, from the earlier query created
- Use the following ‘M’ code:
let
Source = DocsSubmitted,
#"Removed Columns" = Table.RemoveColumns(Source,{"Attribute"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Value", "EmpNo"}}),
#"Removed Duplicates" = Table.Distinct(#"Renamed Columns")
in
#"Removed Duplicates"
The data should look something like this:
-
Call the query UniqEmpNo
-
Make a similar query that generates a unique list of documents submitted, using the following ‘M’ code:
let
Source = DocsSubmitted,
#"Removed Columns" = Table.RemoveColumns(Source,{"Value"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
#"Renamed Columns" = Table.RenameColumns(#"Removed Duplicates",{{"Attribute", "DocsList"}})
in
#"Renamed Columns"
The results of this query should look something like this:
- Call the query UniqDocsList
3. Create a cartesian product between employees and documents
- Next, create a Cartesian product from the unique employee and document list queries just created, using the following code:
let
Source = UniqDocsList,
#"Added Custom" = Table.AddColumn(Source, "Custom", each UniqEmpNo),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"EmpNo"}, {"EmpNo"})
in
#"Expanded Custom"
The data should look something like this:
- Call this query ‘MultipleCombo’
4. Create a query to show only unsubmitted documents (multi row)
- In a new query, perform a left anti-join merge between the queries ‘MultiCombo’ and ‘DocsSubmitted’, as shown below:
This will show all the documents that are in the first query, but not in the second (i.e. all those documents that have not yet been submitted)
- Expand the necessary fields, and re-order the data, so that it looks like this:
The final ‘M’ code should look something like this:
let
Source = Table.NestedJoin(MultipleCombo, {"EmpNo", "DocsList"}, DocsSubmitted, {"Value", "Attribute"}, "DocsSubmitted", JoinKind.LeftAnti),
#"Expanded DocsSubmitted" = Table.ExpandTableColumn(Source, "DocsSubmitted", {"Attribute", "Value"}, {"Attribute", "Value"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded DocsSubmitted",{"Attribute", "Value"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"EmpNo", "DocsList"}),
#"Sorted Rows" = Table.Sort(#"Reordered Columns",{{"EmpNo", Order.Descending}})
in
#"Sorted Rows"
- Call the query DocsNotSubmitted
5. Create a query to show only unsubmitted documents (single row)
- Create a reference to the earlier query, and use the apply the following ‘M’ code:
let
Source = DocsNotSubmitted,
#"Grouped Rows" = Table.Group(Source, {"EmpNo"}, {{"Docs In One Line", each Text.Combine([DocsList], "; "), type text}})
in
#"Grouped Rows"
This will group the data by employee, and show the unsubmitted documents together in one cell, separated by a semi colon. The data should look something like this:
6. Create a query to check for employee duplicates
Warning: Unfortunately, the conditional formatting options, applied earlier to the original source table, to identify duplicate candidate numbers entered in the same column, is still not very reliable in Excel. It often breaks down, particularly when cell values are copied and pasted.
Because of this, we will instead create a query to identify duplicate submissions.
- To do this, create a new query, referencing to the ‘DocsSubmitted’ query. Apply the following ‘M’ code:
let
Source = DocsSubmitted,
#"Grouped Rows" = Table.Group(Source, {"Value", "Attribute"}, {{"Count", each Table.RowCount(_), type number}}),
#"Sorted Rows1" = Table.Sort(#"Grouped Rows",{{"Value", Order.Descending}}),
#"Filtered Rows" = Table.SelectRows(#"Sorted Rows1", each ([Count] <> 1)),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"Value", "EmpNo"}, {"Attribute", "Duplicated in Which Column?"}, {"Count", "How Many Times?"}})
in
#"Renamed Columns"
The query will group the data, by the Value and Attribute columns, and filter it to show only those rows showing duplicate data.
The final query output, when loaded to Excel, should then look something like this:
Voila, you now have a more reliable table of data to work with!
Feedback
Submit and view feedback