Fuzzy Matching At Its Finest

The trick behind a clean database


I love this season. Snow, bobble hats and lots of food. So when I come across a funny news article it just makes it even better. And to put the cherry on my already large cake, I can relate this one to data!

So, as reported by BBC News (check it out here), a couple who had recently moved house received a letter addressed to ‘Tony & Sarah Wren, Somewhere near the sea in Suffolk’. I think you and I can both agree that this should have been lost in the post, not delivered in just four days. I don’t know about you, but I really underestimated the postal service when they lost my letter to Hogwarts 9 years ago. Still waiting.

Anyway, this rather odd format of an address (please don’t try it at home) is a fine example of really, really, really good Fuzzy Matching.

So, fuzzy matching works by linking an input that isn’t 100% accurate to the record it was supposed to match. In simpler terms, it’s like predictive text but for addresses. You will have seen it out and about: you type in your address and parts of it get changed and reformatted right before your very eyes. If you’re like me and just think ‘Oh, I must have been sloppy’ and leave it at that, then you probably don’t think it’s as special as it actually is. But it’s a cool bit of technology sitting right under your nose. So, 8 Caambrian Rd will be recognised as 8 Cambrian Road and boom, just like that, the address is lovely and correct.
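If you fancy seeing the idea in action, here is a minimal sketch in Python. It uses the standard library’s difflib to pick the closest entry from a list of known addresses. The address list, the best_match helper and the 0.6 cutoff are all made up for illustration; a real address tool would use a proper postal database and much smarter scoring.

```python
import difflib

# Hypothetical reference list of known, correctly formatted addresses.
KNOWN_ADDRESSES = [
    "8 Cambrian Road",
    "8 Cambridge Road",
    "18 Camden Row",
]


def best_match(typed, known=KNOWN_ADDRESSES, cutoff=0.6):
    """Return the known address most similar to what was typed, or None."""
    # Compare in lower case, but hand back the nicely formatted original.
    lookup = {address.lower(): address for address in known}
    matches = difflib.get_close_matches(
        typed.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


print(best_match("8 Caambrian Rd"))  # → 8 Cambrian Road
```

The cutoff is the bit to play with: set it too low and everything matches something, too high and an honest typo gets rejected.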

Fuzzy matching can spot odd spacing, missing or jumbled letters or numbers, typos, abbreviated words and so on. Going back to the letter, the fuzzy matching in this case was done by Royal Mail, whose Suffolk branches obviously had excellent knowledge of the people in their vicinity. Otherwise, fuzzy matching is normally done by technology (however, find me an app that can get the right address from the one on the letter!).

So you can see that fuzzy matching obviously has positive impacts, the main benefit being that it helps post reach the right people!

However, it also stops CRM databases from being inundated with duplicate records if someone writes ‘road’ one day and ‘rd’ the next, for example. In turn, this improves the quality of the data in your CRM by standardising all the addresses so they are in the same format. It also ensures that multiple communications aren’t sent to the same person/company/address, which wastes time and money and irritates the receiver.
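To make the ‘road’ versus ‘rd’ point a bit more concrete, here is a rough Python sketch of standardising addresses before checking for duplicates. The abbreviation table and the standardise and deduplicate helpers are invented for this example and are not how any particular CRM does it.

```python
import re

# Hypothetical abbreviation table; a real one would be far longer.
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue"}


def standardise(address):
    """Lower-case, strip punctuation, collapse spaces, expand abbreviations."""
    words = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def deduplicate(records):
    """Keep the first record seen for each standardised address."""
    unique = {}
    for record in records:
        unique.setdefault(standardise(record), record)
    return list(unique.values())


records = ["8 Cambrian Road", "8 cambrian rd.", "8  Cambrian Rd", "12 High St"]
print(deduplicate(records))  # → ['8 Cambrian Road', '12 High St']
```

A simple lookup table like this has its limits (‘St’ might mean ‘Saint’ rather than ‘Street’), which is exactly where proper fuzzy matching against a reference list earns its keep.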

So there we go, some humans performing fuzzy matching, all wrapped up with a good ending for Christmas!
