New Study Reveals A Major Challenge Big Data Will Have to Overcome

What happens when AI demonstrates it has 'gaydar'?


Humans spend quite a bit of time judging books by their covers, despite conventional advice not to.

A study conducted in Canada set out to determine whether test subjects could make accurate guesses about another person’s socioeconomic status just by looking at their faces. Incredibly, the subjects were able to make such a judgment with a high degree of accuracy.

So what might it look like if machines could make similar inferences? And what would it mean for privacy, safety, and big data, which many now call the world's most valuable commodity?

Study: How Well Do Machines Know Us?

A second study, this one conducted at Stanford, set out to answer that very question. Could well-written facial-recognition software correctly identify a person's sexual orientation just by looking at their face?

The answer, according to Michal Kosinski and his colleague, Yilun Wang, is 'yes.' For their study, they collected thousands of photographs from a U.S. dating website and categorized them according to the subjects' 'apparent sexuality.' A classifier they trained on those images correctly distinguished between gay and heterosexual men 81% of the time, and between gay and heterosexual women 74% of the time.

Human judges, by contrast, identified sexual orientation correctly only 61% of the time for men and 54% of the time for women. 'Gaydar' has been in our collective lexicon for a while, but in this study it performed little better than random chance.

Critics of the Study

The results of the Stanford study attracted criticism almost immediately from two of the country’s most vocal LGBTQ rights groups. Public statements condemned the study as 'wrongfully suggesting … AI can be used to detect sexual orientation.' Their criticism was twofold:

- No impersonal mechanism — no matter how advanced — could ever be relied upon to correctly identify a person’s sexual identity. Kosinski and Wang admitted their study 'assumed' individuals seeking same-sex activity on dating sites identify as homosexual.

- Even if such an algorithm demonstrated reliable and uncanny accuracy, the moral and privacy concerns are simply too great. At a time in history when non-heteronormative individuals still live under threat of persecution from governments and private citizens alike, developing an algorithm that can 'out' them is irresponsible.

What the study's critics miss, however, is that much of their criticism falls nearly perfectly in line with the aims and findings of the study itself. Kosinski and Wang were clear in their paper's abstract: 'Our findings expose a threat to the privacy and safety of gay men and women.'

Judgmental Machines

Word has likely reached you by now that Equifax committed one of the worst betrayals of public trust in history when hackers compromised exceptionally personal financial data on nearly half of America’s population.

You probably also watched Apple's latest keynote and wondered, along with the rest of the world, what the phrase 'face data' means for the future of privacy, or for its end. And while the LGBTQ community is, by definition, a minority of Americans, its concerns about privacy are not. Americans cite cybercrime as a greater threat than terrorism, and perhaps with good reason.

Big names in technology have already spoken out about the need for a 'watchdog' to oversee our nascent research into artificial intelligence. Elon Musk is perhaps the most vocal and eccentric of them, going so far as to describe AI as an 'existential risk' to humanity.

Whether his predictions turn out to be justified is a matter for the history books, but more immediate concerns have already emerged. Suppose even primitive data-collation technologies can match or beat humans at singling out members of the population by superficial features like their 'apparent sexuality' (the phrase the researchers used to describe how they categorized the dating photos they collected). Are we not then simply building bigoted machines?

By most modern standards, it is inaccurate and old-fashioned to assume that a man with superficially effeminate qualities identifies as exclusively homosexual. The same would likely hold true for any combination of gender and sexual orientation. We are not, necessarily, what we look like.

Apple’s new Face ID feature is only one example of face recognition, and there will be others. But there may be a distinction worth making between a technology that uses personally identifying information — the image of one’s face — to provide convenience to its owner, and a technology that uses the same information to force binary social conventions onto a person based on the way they look.

'Data' is supposed to be something concrete: something measurable, tangible and actionable. Something that’s the same in every language. Building machines to judge books by their covers doesn’t sound like data — it sounds like something else entirely.

