Internet dating is booming. According to Statista, the online dating industry sees revenue of about $1.935 billion a year, while 50 million Americans have now tried it. The American National Academy of Sciences reported in 2013 that over a third of people who married in the US between 2005 and 2012 met their partner online, half of them on dating sites. What was once seen as the preserve of the sad and desperate is now essential to the continuation of the species.
People use internet dating for a variety of reasons, with the many apps and sites all offering different things. Some, such as Tinder, are largely based on appearance and are therefore expected to produce shorter-term results. Others, such as eHarmony, go deeper, with users required to complete lengthy questionnaires designed to identify a long-term match.
Finding a relationship that sticks is not easy. Long-term relationships are hard, and many fail. According to data from the National Survey of Family Growth, the probability of a first marriage lasting at least a decade was 68% for women and 70% for men between 2006 and 2010, while the probability that it would last 20 years was 52% for women and 56% for men.
Steve Carter, Chief Scientist at eHarmony, was at the recent Machine Learning Innovation Summit to discuss how the company uses data in the matching process. Since its launch in 2000, eHarmony has registered more than 40 million people, 17.5 million of whom are still on its books. According to its ‘Married Couples by the Numbers’ report, 71% of female users and 69% of male users meet their future spouse on the site within a year of creating a profile, so if we are willing to believe the company’s own statistics, it definitely works.
Data has been at the heart of eHarmony’s success since its launch 16 years ago, and is a major differentiator from newer rivals such as Tinder and Happn. Carter notes that originally users didn’t even use pictures, instead creating a dating profile and filling out a 450-item questionnaire that covered every facet of their personality. Since then, the company has cut the questionnaire down to a slightly more manageable, but still fairly large, 150 questions, and it now allows photos. The amount of data it has accrued during this time is still tremendous, enabling it to pinpoint a significant number of features that people tend to look for in a prospective partner. For example, it found that height correlates strongly with the probability of communication, with women tending to go for men taller than them and men for women shorter than them. Food preference is also important. eHarmony asks what people eat, and vegetarians, in particular, are more likely to talk to each other, with a communication rate 44% above average.
These are still fairly superficial, though. In his presentation, Carter argued that long-term relationships are difficult for two key reasons. Firstly, the goals a person sets out with when doing the things that lead to a long-term relationship, i.e. dating, are not consistent with their objectives once actually in a long-term relationship. Indeed, they’re almost inverted. The priority for the onset period is whether you find the other person attractive, then getting them to find you attractive, then getting them to like you, and finally, after a reasonable amount of time has elapsed, getting them to love you. The goals once you are in a relationship change dramatically. First, you want your partner to love and respect you, to support you and not take you for granted. Then you want to feel the same about them, then you worry about finding them attractive, and right at the bottom of the pile is them finding you attractive.
This leads to obvious problems when it comes to developing algorithms that can select a potential marital partner. Humans and machines both succeed at tasks where they have readily available or proximal information. We need a clear label associated with our KPI, and the information needs to be as clean as possible, with no noise to distract from what it is trying to say. In tasks where information is obscured or distal, both humans and machine learning systems find it far more difficult to make sensible decisions. People often end up using irrelevant information just because it’s available. They may even end up choosing to solve the wrong problem, especially in the real world, where appropriate goals aren’t clear. In the case of dating, trying to work out who you’re going to be compatible with in a long-term relationship is mind-blowingly complex, so instead people often focus on having a good first date. That increases the chance of failing at the original problem, as the two goals are diametrically opposed.
Compatibility requires solving the right problem using the right information. Dating focuses on escalation rather than long-term success. Escalation is initially based on the proximal or nearby: location, appearance, and trivial social interaction. These are necessary during the onset period of a relationship, but they are not strong predictors of success. Compatibility over a long period is based on things like career and family goals, how personalities mesh, how you approach key choices, how you communicate, and so forth.
The site collects demographic data (age, gender, location), psychographic data (likes, interests and habits) and behavioral data (actions taken on the site) through its vast surveys. eHarmony’s research team also studies couples who met through the site so they know what to look for. eHarmony’s in-house psychologists and data scientists feed that information into machine learning algorithms that help match compatible users. The problem that data scientists at dating agencies are trying to solve is different from the one facing those at the likes of Netflix and Amazon when they deploy machine learning for recommendation engines. With Netflix, you just need someone to like a movie; in the case of dating, you essentially need the movie to like you back. So you have both female agreeableness and male agreeableness, and female satisfaction and male satisfaction. You have the actor effects and the partner effects.
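The "movie has to like you back" point can be illustrated with a minimal sketch of a two-sided score. Everything here is an assumption made for illustration: the `PairScores` type, the one-directional scores in the range 0 to 1, and the use of a harmonic mean to combine them. It is not a description of how eHarmony actually scores pairs.

```python
from dataclasses import dataclass


@dataclass
class PairScores:
    """Hypothetical one-directional interest scores, each in [0, 1]."""
    a_likes_b: float  # how well B fits what A is looking for
    b_likes_a: float  # how well A fits what B is looking for


def reciprocal_score(pair: PairScores) -> float:
    """Combine both directions with a harmonic mean.

    The harmonic mean is an illustrative choice: it stays low unless
    BOTH sides are interested, unlike a one-sided movie rating.
    """
    total = pair.a_likes_b + pair.b_likes_a
    if total == 0:
        return 0.0
    return 2 * pair.a_likes_b * pair.b_likes_a / total


one_sided = PairScores(a_likes_b=0.9, b_likes_a=0.1)
mutual = PairScores(a_likes_b=0.6, b_likes_a=0.6)

print(round(reciprocal_score(one_sided), 2))  # 0.18
print(round(reciprocal_score(mutual), 2))     # 0.6
```

A one-sided recommender would rank the first pair highest on the strength of the 0.9 alone; the two-sided score ranks the moderately mutual pair well above it.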
While eHarmony is notoriously secretive about its algorithm, researchers at Cornell University have been able to identify the elements considered in producing a match. The algorithm evaluates each new user in six areas – (1) level of agreeableness, (2) preference for closeness with a partner, (3) degree of sexual and romantic passion, (4) level of extroversion and openness to new experience, (5) how important spirituality is, and (6) how optimistic and happy they are. The chance of a good match usually rises with the degree of similarity in these areas. The company then applies a bipartite matching approach, where every man is matched to several women, and vice versa. The algorithm runs daily, and the pool of eligible candidates for each user changes in the same time frame, with previous matches taken out and location changes accounted for.
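As a rough sketch of the process described above, the snippet below scores pairs on the six areas and then assigns each person a limited number of matches from the other group, skipping pairs already shown on a previous day. The trait scale (0 to 1), the similarity measure, the greedy assignment, and all names (`similarity`, `daily_matches`, `per_user`, `already_shown`) are assumptions chosen for illustration. The greedy pass is a simple stand-in for a bipartite matching step and is not eHarmony’s actual algorithm.

```python
from itertools import product

# The six areas named above, each scored on a hypothetical 0-1 scale.
TRAITS = ("agreeableness", "closeness", "passion",
          "extroversion_openness", "spirituality", "optimism")


def profile(*scores):
    """Build a trait dictionary from six scores given in TRAITS order."""
    return dict(zip(TRAITS, scores))


def similarity(a, b):
    """1 minus the mean absolute trait difference; 1.0 means identical."""
    return 1 - sum(abs(a[t] - b[t]) for t in TRAITS) / len(TRAITS)


def daily_matches(men, women, per_user=2, already_shown=frozenset()):
    """Greedy many-to-many matching across the two groups.

    Pairs are taken in descending order of similarity; each user gets at
    most `per_user` matches, and pairs in `already_shown` are skipped.
    User names must be unique across both groups.
    """
    candidates = [
        (similarity(men[m], women[w]), m, w)
        for m, w in product(men, women)
        if (m, w) not in already_shown
    ]
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    load = {name: 0 for name in list(men) + list(women)}
    chosen = []
    for score, m, w in candidates:
        if load[m] < per_user and load[w] < per_user:
            chosen.append((m, w, round(score, 3)))
            load[m] += 1
            load[w] += 1
    return chosen


# Made-up profiles, purely for demonstration.
men = {"m1": profile(0.8, 0.6, 0.7, 0.5, 0.2, 0.9),
       "m2": profile(0.3, 0.4, 0.5, 0.9, 0.8, 0.4)}
women = {"w1": profile(0.7, 0.6, 0.8, 0.5, 0.3, 0.8),
         "w2": profile(0.2, 0.5, 0.4, 0.8, 0.9, 0.5)}

today = daily_matches(men, women, per_user=1)
shown = {(m, w) for m, w, _ in today}
tomorrow = daily_matches(men, women, per_user=1, already_shown=shown)
print(today)
print(tomorrow)
```

With these made-up profiles, the first run pairs each man with his most similar counterpart, and the second run, with those pairs removed from the pool, falls back to the remaining combinations. A production system would more likely use an optimal assignment method than a greedy pass, and would filter the pool by location as well.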
Psychologists have been trying to identify the causes of a successful marriage for hundreds of years. The idea that machines - cold, loveless machines - could do it for us is not without irony, but in the future these algorithms could be key to stopping us going the way of the panda, sitting in cages unwilling to reproduce. Or maybe that’s what robots will try to make us do. It’s hard to say.