We have seen that data can have a huge effect on our lives.
Every day we see adverts on websites that are tailored to us specifically, or we receive a supermarket voucher through our letterbox with a deal based on our purchasing history.
However, few uses of data can claim the level of impact that online dating has.
Data is now used by some of the largest online dating sites to match two people who are perfectly suited to each other based on answers to questions, profile preferences and online behaviour. With this information, Data Scientists at these companies are attempting to find love in data. It is the perfect example of personalities being digitized.
With this in mind, we spoke to Thomas Levi, Senior Data Scientist at Plenty Of Fish.
Plenty Of Fish (POF) is the largest global dating site, with more than 3 million active daily users; the company even claims that you know at least one person who has found someone through POF. So Thomas seems like the perfect person to talk to about using data in dating.
‘I think that one of the biggest challenges we face is that people are complicated, messy beings, even more so when they are interacting,’ says Thomas, describing the difficulty of working with data on a dating site. At the biggest dating site in the world, the task becomes harder still: ‘We have a lot of data, but trying to encompass the whole of human romantic matching and interaction with it is difficult. We have so many users on our site, from such diverse backgrounds and desires it’s hard to cover it all perfectly.’
To make full use of the huge amounts of data that the company has, Thomas takes a more holistic approach: ‘I look at what I do as trying to increase the probability you’ll find someone you’re compatible with in a given timeframe and have the best experience while searching that you can’.
Aside from working through the basic algorithms for matching, Thomas is keen to implement new products at the company to help optimize their matchmaking potential and improve the UX on the site. I asked Thomas which implementations he was most proud of from his time at the company:
‘Two things stand out, and both improve our users’ experience. The first is a new feature I just designed that allows for a contextual search and matching based on your personality and interests. It’s able to take in just about any search terms, and map them to overarching topics. As an example, if someone, say, enters “snowboarding” it figures out they are interested in outdoor sports. If they then also enter a brand of microbrew beers, it then knows they are a balance of outdoor sports, and foodie/craft beer drinker. We can then go find you matches into those things. The system is surprisingly robust; the current vocabulary I have it working on is around 300,000 words or so. What makes it particularly interesting is that both the vocabulary and the topics themselves are bootstrapped organically from our actual users. It both removes any bias I might put into the system and allows me to build models in languages I don’t even speak (or account for colloquial slang, names of TV shows, videogames etc.). This feature is just being rolled out, and we’re going to be iterating its placement and presentation on the site. I’d love any and all feedback on it.
‘The second one was improving scam detection on the site. We’re probably the most aggressive dating site out there when it comes to eliminating scammers and fake profiles from the site. Since the systems I’ve built or helped build have gone in place we’ve reduced the number of fake profiles and the volume of their messages by something close to an order of magnitude. That improves the overall user experience quite a bit.’
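For readers who want a concrete picture of the first feature, the idea of mapping search terms to overarching topics can be sketched in a few lines of Python. This is only a toy illustration: the `TERM_TOPICS` lookup and the `topic_profile` function are hypothetical names invented here, and the hand-written dictionary stands in for a vocabulary and topics that, according to Thomas, are learned from real users rather than typed in by hand.

```python
from collections import Counter

# Hypothetical, hand-written lookup used purely for illustration.
# The system described in the interview learns its vocabulary and
# topics from user data instead of relying on a fixed table.
TERM_TOPICS = {
    "snowboarding": "outdoor sports",
    "hiking": "outdoor sports",
    "microbrew": "food and craft beer",
    "ipa": "food and craft beer",
}


def topic_profile(search_terms):
    """Return the share of recognised search terms that fall under each topic."""
    counts = Counter(
        TERM_TOPICS[term.lower()]
        for term in search_terms
        if term.lower() in TERM_TOPICS
    )
    total = sum(counts.values())
    if total == 0:
        # None of the terms are in the toy vocabulary.
        return {}
    return {topic: n / total for topic, n in counts.items()}


profile = topic_profile(["Snowboarding", "microbrew"])
print(profile)
```

Someone who enters one outdoor term and one craft beer term ends up with an even split between the two topics, which mirrors the "balance" Thomas describes. Matching would then compare these profiles between users, a step this sketch leaves out.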
As well as his work at POF, I was keen to hear his opinions on some of the issues facing Data Science, especially the much-discussed skills gap. Like many who have worked within Big Data for a few years, Thomas is unsure of what Big Data skills are, simply because everybody seems to have a different view on what they consist of.
However, he is unequivocal about where he believes a gap does exist: ‘[Where a] gap exists is in the skills and understanding of being a good Data Scientist (working with big and small data)’. He describes how many graduates leave university with impressive technical qualifications but often lack the understanding of the underlying statistics needed to become a good Data Scientist. As Thomas puts it, ‘A surprisingly large number of people are experts in Deep Learning, but can’t explain how to run a simple binomial test for website conversions. The best advice I can give is to get a solid grounding in statistics to go with Machine Learning and problem solving’.
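To make Thomas's example concrete, here is a minimal sketch of an exact two-sided binomial test for website conversions, written in plain Python. The function names and the visitor numbers are assumptions chosen for illustration, not figures from POF. The question it answers: if the historical conversion rate is 10%, how surprising is it to see 120 conversions from 1,000 visitors?

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p (0 < p < 1)."""
    # Work in log space so large factorials do not overflow.
    log_prob = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )
    return exp(log_prob)


def binomial_test_two_sided(successes, trials, baseline_rate):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed one under the baseline rate."""
    probabilities = [binomial_pmf(k, trials, baseline_rate) for k in range(trials + 1)]
    # A small relative tolerance keeps floating point ties on the right side.
    threshold = probabilities[successes] * (1 + 1e-7)
    p_value = sum(prob for prob in probabilities if prob <= threshold)
    return min(1.0, p_value)


# Hypothetical numbers: 1,000 visitors, 120 conversions, 10% baseline rate.
p_value = binomial_test_two_sided(successes=120, trials=1000, baseline_rate=0.10)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the observed conversion count would be unusual if the true rate were still the baseline. In practice a statistics library would normally be used instead of hand-rolled code; the point here is only that the test itself is a short calculation.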
One of the main issues that many companies face today, according to Thomas, is not that they cannot find Data Scientists, but that they cannot find Senior Data Scientists. Thomas says, ‘Often I encounter companies looking to make their first Data Science hire. The difficulty with the first hire is that they don’t have someone who can properly interview and assess them. The junior people I meet are extremely capable and smart, and with a proper mentor could grow into the role, but shops seem to want them to be the sole Data Scientist. That’s problematic and I’m not sure how to solve that’.
Finally, I wanted to know his thoughts on the future of Big Data and how it will progress.
Thomas makes it clear that there is a fair amount of excitement within the industry: ‘I think we’re in a period where people are (rightfully) excited about the sheer amount of data we can actually store and process. A lot of the conversation and hype is focused around the size of the data and the technologies used to work with it’.
Change will come, though; that much is obvious from the speed at which the industry has grown in the past 4 years, and Thomas is aware of what this will entail: ‘As we become more used to these things and the technology stacks become more mature, we’re going to move the conversation towards what we can do with this data and what are the right sort of questions to ask of it. After all, this is a business and value has to come of it. I think we’ll see a bit of a separation of the companies that ask the right questions for their business model, and the ones who just think Big Data equals profit on its own’.
Given the work that Thomas has done so far at POF, it seems he will have his hands full moving forward. Matching people to their perfect partner is always going to be an interesting way of using data, and Thomas seems to be revelling in his role. In his words, ‘I’m really excited to see the sort of interesting things we can uncover in the future’.
You can follow Thomas on Twitter at @tslevi