Online dating is huge. Roughly 22% of 25-34 year olds and 17% of 35-44 year olds have used an online dating site or mobile dating app, and the number is growing.
Ahead of the Predictive Analytics Innovation Summit, taking place in San Diego on February 18th and 19th, 2016, we hear from one of the speakers, eHarmony's Director of Data Science, Jonathan Morra, about how the online dating industry is using data.
How did you get into data science?
I have a Ph.D. in Biomedical Engineering. For my dissertation work I studied subcortical segmentation of brain MRIs using machine learning methods. This was before data science was a term used as ubiquitously as today, so I thought of myself as a computer scientist with machine learning knowledge who focused on the medical domain. After noticing a huge increase in the number of uses of machine learning in other domains, I got excited to join a company that could help me fill in the gaps in my data science knowledge, including grid computing. Fortunately for me, eHarmony proved to be a place where I could learn and experiment very quickly and translate my engineering and machine learning background into a modern definition of a data scientist.
How does eHarmony approach using Data Science? It’s clearly central to how you match couples, but is it used across all facets of your organization?
We are currently seeing data science creep into many different facets of eHarmony’s business. A machine learning approach was adopted to optimize our matching system by predicting communication, but it has since grown into other areas as well, including match delivery, fraud prevention, email marketing, churn prediction, and our new product, Elevated Careers.
What do you feel your greatest challenge has been at eHarmony? Does working with concepts such as attraction present a new challenge compared to your previous roles?
One of our greatest challenges is molding legacy systems to gather and produce data that can give us new insights today. eHarmony has been around since 2000, and some of the systems weren’t designed with post hoc data analysis in mind. We’ve had to work hard to make sure our data is available when we need it and properly validated. As far as attraction is concerned, I don’t think that this particular concept is that different from other concepts. We are interested in matching people such that we maximize the total communication amongst all matches made. Attraction is certainly one element of our matching algorithms, but it doesn’t receive any special treatment. However, our domain makes our job very difficult, because people’s preferences for their romantic relationships vary so much. This is why one of our big pushes for this year is personal matchmaking, whereby we include each individual user ID in the model. This will allow us to home in on everyone’s specific desires rather than the optimal global trend.
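As a rough illustration of what including a user ID in a model can mean, here is a minimal sketch. The interview does not say how eHarmony does this; the sketch assumes one common approach, crossing each feature with the user ID so that a linear model can learn a per-user adjustment on top of the global weight. The function names, the `has_beard` feature, and the weights are all hypothetical.

```python
def build_features(user_id, candidate):
    """Emit each candidate feature twice: once globally, once crossed with the user ID."""
    features = {}
    for name, value in candidate.items():
        features[name] = value                      # global feature, shared by all users
        features[f"user={user_id}|{name}"] = value  # per-user copy (hypothetical naming scheme)
    return features


def score(weights, features):
    """Linear score; features without a learned weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical learned weights: the global trend is slightly against facial hair,
# but user 42 has shown a strong preference for it.
weights = {"has_beard": -0.2, "user=42|has_beard": 0.9}
candidate = {"has_beard": 1.0}

score_user_42 = score(weights, build_features(42, candidate))
score_user_7 = score(weights, build_features(7, candidate))
print(score_user_42, score_user_7)
```

With this construction, a user with no history falls back to the global weights, while a user with enough history can diverge from them.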
You’ve had over 11,000 marriages as a result of people meeting on eHarmony Australia since its launch in 2007. Do you think that data can better judge who you are likely to be attracted to?
That’s a tricky question because our data has shown that whom you are attracted to and whom you’ll have a long-lasting relationship with are not necessarily the same person. Because of this divergence in needs, we use two different systems for matching. Our compatibility system creates pairings (potential matches) for long-term compatibility. These are models made by our team of psychologists who have studied marriages, and they are used to predict long-term marriage satisfaction. Only those pairs who are psychologically compatible are then sent to our affinity scoring system to predict their probability of communication. We use two-way communication as a signal for mutual attractiveness. Therefore, our matching system uses long-term success as a gateway, and if you meet someone on eHarmony and get married, your marriage should be happier than a strong majority of marriages in the wild.
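The two-stage flow described above can be sketched as follows. This is only an illustration of the gate-then-score structure, not eHarmony's implementation: `rank_pairs`, the toy compatibility rule, and the toy communication score are hypothetical stand-ins for the psychologists' compatibility models and the affinity scoring system.

```python
from itertools import combinations


def rank_pairs(users, is_compatible, communication_probability):
    """Return compatible pairs, sorted by predicted two-way communication (highest first)."""
    scored = []
    for a, b in combinations(users, 2):
        # Stage 1: compatibility acts as a gate; incompatible pairs are dropped.
        if not is_compatible(a, b):
            continue
        # Stage 2: only the surviving pairs are scored for affinity.
        scored.append((communication_probability(a, b), a["id"], b["id"]))
    return sorted(scored, reverse=True)


# Toy stand-ins (assumptions for illustration only).
def toy_is_compatible(a, b):
    return abs(a["trait"] - b["trait"]) <= 0.3


def toy_communication_probability(a, b):
    return 1.0 - abs(a["activity"] - b["activity"])


users = [
    {"id": "u1", "trait": 0.2, "activity": 0.9},
    {"id": "u2", "trait": 0.4, "activity": 0.5},
    {"id": "u3", "trait": 0.6, "activity": 0.8},
]

ranked = rank_pairs(users, toy_is_compatible, toy_communication_probability)
for probability, first, second in ranked:
    print(first, second, round(probability, 2))
```

In this toy data, the pair u1 and u3 never reaches the affinity stage because it fails the compatibility gate, which mirrors the idea that long-term compatibility is checked before attraction is considered.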
Do you currently use machine learning on images to get an idea of who people may be physically attracted to? Do you think this is a direction the industry could go in as the technology evolves? What else do you see as being a game changer for the use of data in the industry - and for data science in general?
We do ingest information from images when doing affinity matching. I will talk about some of the things that we are using, but we attempt to extract information about users’ faces, including hair color, eye color, and facial hair. Judging attractiveness based on images in general is very hard and very subjective. We have done it in the past and found limited success. Using extracted features, though, has proven successful. I think image analysis is currently making great strides with all the work on deep learning, and I think that definitely has a place at eHarmony. I think the next big step forward is a more unified data model. eHarmony is very good at extracting psychological information from individuals, but we are not good at assessing other characteristics such as musical taste, food preferences, or career ambitions. If we could partner with other data sources, we could get a unified user representation that is much deeper than what we currently have and create even more satisfying matches in both the short and long terms.
What will you be discussing at the summit?
At the summit I’ll be going over our data science framework at eHarmony, focusing on our data translation layer, which uses a newly open-sourced project called Aloha, and on how this leads to the various models we are currently using. I’ll then speak in depth about some of those models.