Matching data sets where there is not a shared key can be a real pain. If you are using Excel and a Vlookup, then you will have to do a lot of manual cleaning to get your data to match. For example table 1 may refer to a product as ‘StandUp Paddle Board XR100’, whereas table 2 may list the same product as ‘xr100 – Stand Up Board’.
However, there is an easier way to match similar data sets. You can use a Fuzzy Match to join your data. Fuzzy Match, or more formally called approximate string matching, is the technique of finding strings that match a pattern approximately. The fuzzy matching technique applies a matching percentage. The algorithm returns a similarity between pairs in the range of 0 to 100%, where 0 is no similarity and 100% is an exact match.
To download the excel Fuzzy Match add-on, navigate to the Microsoft website here. Press the ‘download’ button and follow the the instructions to install the excel add-on.
Once the download is complete, close excel and reopen. You should now see a new tool bar labelled ‘Fuzzy Lookup.’
The first table has the book name and author.
The second table has the book name and price. You will notice that the book name in the second table is similar, but not exactly the same as the first table.
The first thing we will do in excel is turn our data into tables, so the fuzzy lookup tool and locate the information. Highlight the table 1 data and then click on INSERT -> TABLE . You will see a popup with your data range you selected and you can tick the check box ‘My table has headers’ to capture the first row as headings.
If you click on the table, you will see a tab pop up called ‘Table Design.’ Here you can view and edit the Table Name in the left hand corner.
Go ahead and repeat the above process for the second table and you should see the below for Book Name and Price.
We are now ready to do a Fuzzy Match. Create a blank worksheet to paste the output data. Click cell A1 in the new worksheet and then navigate to the Fuzzy Lookup tab and select the ‘Fuzzy Lookup’ icon highlighted in red below. Once you click the icon, you will get a toolbar on the right. You then select your 2 tables to join and then choose the two columns to join on. In our example we will join the columns ‘Book Name.’ You can then select the output columns you want in the final table.
Once you are ready click the ‘GO’ button in the bottom right of the Fuzzy toolbar. You should then see an output table that has the columns you selected as well as a ‘Similarity’ column, between 0 and 1.
You will also note in the Fuzzy Matching tool you can select the number of Matches if you want more than 1. You can also add in a threshold for an acceptable match. Lastly, there is an UNDO button, that is helpful when you are playing around with the settings.
Well done! You now are armed with a cool way to match your data sets.