How Could Twitter Use Machine Learning To Stop Hate Speech?

It's not as simple as some may have you think


Twitter was originally founded on the idea that people could network and share their ideas through short messages to one another that were open for anybody to see online. Today, however, it is seen as a hotbed of racism, misogyny, homophobia, and trolling, where sending a single tweet that could be interpreted as being against a certain cause can see a conversation quickly descend into death threats and hateful barrages.

It is also used extensively by previously fringe extremist groups to promote their hateful agendas. We have seen ISIS videos spread on the platform, along with far-right groups spreading hate about racial minorities and religions. Donald Trump has been criticized by leaders around the world for his retweets of the hate group 'Britain First', with one MP from the UK referring to his actions as 'racist, unthinking or incompetent - or all three'. Twitter has also allowed previously unheard-of racists like Milo Yiannopoulos, Mike Cernovich, and Richard Spencer to gain a platform and use the social media site to spread hate speech.

The platform has also allowed people to be influenced by hate speech from actors deliberately attempting to manipulate them. For instance, hundreds of thousands of automated Russian accounts on Twitter have reportedly spread misinformation, fake news, and hateful speech in an attempt to influence the US elections and the UK's Brexit vote. In a small study of just 139 tweets from 29 accounts, it was shown that Russian accounts had used hashtags related to Brexit, pictures of London Mayor Sadiq Khan, anti-Muslim posts, and racial slurs against refugees.

Twitter and other social media platforms have been rightfully criticized for allowing this kind of behaviour to take place, but with 500 million tweets sent every single day, trying to stop hate speech has been incredibly difficult. Many technologies have been tested to police it, but each has had significant flaws that made it unfit for purpose. One of the main approaches has been simple keyword detection, but this has thrown up several issues. In the paper 'Locate the Hate: Detecting Tweets against Blacks', Kwok and Wang found that 86% of the time when a tweet was classed as hate speech, it was due to an offensive word rather than actual hate speech. For instance, in the black community the word 'n**ga' is often used in tweets in non-offensive ways, whereas the word 'n**ger' is very rarely used in anything but hate speech. That is not to say, however, that either word is used exclusively for hate or non-hate speech.
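To make the keyword problem concrete, here is a minimal sketch of a keyword filter. The term list, the function name, and the example tweets are hypothetical placeholders rather than anything Twitter or the paper's authors are known to use; the point is only that the filter cannot tell a hateful use of a word from a harmless one.

```python
import re

# Hypothetical placeholder terms; a real list would be curated by moderators.
OFFENSIVE_TERMS = {"slur_a", "slur_b"}


def flag_by_keyword(tweet: str) -> bool:
    """Return True if any listed term appears as a whole word in the tweet."""
    words = re.findall(r"[a-z_']+", tweet.lower())
    return any(word in OFFENSIVE_TERMS for word in words)


examples = [
    "slur_a people should not be allowed here",      # hateful use
    "me and my slur_a friends at the game tonight",  # in-group, non-hateful use
    "great weather today",                           # no listed term
]

# Both of the first two tweets are flagged, although only one is hateful.
flags = [flag_by_keyword(tweet) for tweet in examples]
```

The second example is the kind of false positive described above: the word is present, so the tweet is flagged, regardless of how the word is being used.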

Another approach is to look at syntactic features, meaning the patterns in which specific words occur together. For instance, if the words 'slaughter' and 'Jews' appear in the same sentence, the chances are high that the post is hate speech. However, this will only ever find hate speech that takes a specific form; it also misses images and anything else that sits outside that particular speech pattern. It throws up considerably fewer false positives than a simple keyword search, but is likely to miss more.
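A rough sketch of such a co-occurrence rule might look like the following. It assumes a naive sentence splitter and uses placeholder group names ('group_x', 'group_y'); the word lists and the function name are illustrative only, not a real moderation rule set.

```python
import re

VIOLENT_TERMS = {"slaughter"}         # taken from the example above
GROUP_TERMS = {"group_x", "group_y"}  # hypothetical placeholders for group names


def flag_by_cooccurrence(text: str) -> bool:
    """Flag text when one sentence holds both a violent term and a group term."""
    # Naive sentence split on ., ! and ?; real text would need a proper tokenizer.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z_]+", sentence))
        if words & VIOLENT_TERMS and words & GROUP_TERMS:
            return True
    return False


same_sentence = flag_by_cooccurrence("We should slaughter group_x.")
split_sentences = flag_by_cooccurrence(
    "The slaughter of animals is regulated. group_x held a festival."
)
other_wording = flag_by_cooccurrence("They want to wipe out group_x")
```

Only the first call is flagged. The second shows the drop in false positives, since the two words sit in different sentences, while the third shows the cost: a threat phrased in different words slips through.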

However, both of these are fairly basic techniques that aren't really fit for purpose, so a huge amount of work has gone into making detection considerably more robust. Sarcasm, for instance, has always been an issue: it may be the lowest form of wit, but algorithms can't get their heads around it, so MIT researchers have been working on the problem. According to Technology Review, the researchers originally set out to identify racist tweets, but quickly found that without an understanding of sarcasm it was almost impossible. One major clue to the underlying message of a tweet is its emoji, so they trained the algorithm on the subtle meanings of emoji to identify what the text really means. Iyad Rahwan, an associate professor at the MIT Media Lab and one of those who created the algorithm, explained that 'Because we can’t use intonation in our voice or body language to contextualize what we are saying, emoji are the way we do it online...The neural network learned the connection between a certain kind of language and an emoji.'
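One way to picture how emoji can teach an algorithm about tone is to treat the emoji in a tweet as a free label for the words around it. The sketch below is not the MIT researchers' pipeline; the emoji-to-tone mapping and the function name are assumptions made purely for illustration.

```python
# Illustrative mapping only; the tones assigned here are an assumption.
EMOJI_TONE = {"😂": "joking", "🙄": "sarcastic", "😡": "angry", "😍": "affectionate"}


def to_training_pair(tweet: str):
    """Split a tweet into (plain text, tone labels) using the emoji it contains."""
    labels = sorted({tone for emoji, tone in EMOJI_TONE.items() if emoji in tweet})
    text = tweet
    for emoji in EMOJI_TONE:
        text = text.replace(emoji, "")
    # Collapse the whitespace left behind after removing the emoji.
    return " ".join(text.split()), labels


pair = to_training_pair("Oh great, another Monday 🙄")
# pair == ("Oh great, another Monday", ["sarcastic"])
```

Pairs like this could then be fed to a classifier so that it learns which kinds of wording tend to appear alongside which emoji, which is the connection Rahwan describes.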

Context within language also needs to be taught in order to fully investigate hate speech. In their study 'Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making', Pete Burnap and Matthew L. Williams found that distinguishing between good and bad calls to action was surprisingly difficult, as without context there is nothing particularly threatening about certain phrases. For instance, 'send them home', 'get them out', and 'should be hung' all have very unthreatening uses, like sending kids home from school, getting groceries out of a car, or hanging a piece of art. For Twitter to check individual tweets effectively, it will first need to understand the environment in which a tweet is sent, which means analyzing anything it might be referring to and the tweets to which it is replying.
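As a simple illustration of what gathering that context could involve, the sketch below walks back up a reply chain so that a classifier would see the earlier tweets alongside the one being checked. The in-memory store, its layout, and the function name are hypothetical; a real system would be reading from Twitter's own data stores.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: tweet id -> (text, id of the tweet it replies to, or None).
TWEETS: Dict[int, Tuple[str, Optional[int]]] = {
    1: ("School is closing early because of the snow.", None),
    2: ("Send them home before the roads get worse.", 1),
}


def with_context(tweet_id: int, store: Dict[int, Tuple[str, Optional[int]]],
                 max_depth: int = 5) -> List[str]:
    """Return the reply chain's texts, oldest first, ending with the tweet itself."""
    chain: List[str] = []
    current: Optional[int] = tweet_id
    # Follow parent links until the thread root or the depth limit is reached.
    while current is not None and len(chain) < max_depth:
        text, parent = store[current]
        chain.append(text)
        current = parent
    return list(reversed(chain))


thread = with_context(2, TWEETS)
```

Read on its own, the second tweet contains one of the ambiguous phrases; read with its parent, it is plainly about a school closing.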

Given the 500 million tweets sent every single day, this will require a huge amount of computing power, because every tweet will need to be analyzed within the context of the tweets that have gone before it, so each tweet may need to be analyzed multiple times. So not only does Twitter need to create and train machine learning algorithms to undertake this huge task, it will also need to run billions of analyses every day in real time.
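To put that volume in perspective, here is a back-of-the-envelope calculation. The 500 million figure is the one quoted above; the number of times each tweet is re-read as context for later tweets is an assumption chosen purely for illustration.

```python
TWEETS_PER_DAY = 500_000_000    # figure quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
PASSES_PER_TWEET = 3            # assumed: own analysis plus re-reads as context

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
analyses_per_day = TWEETS_PER_DAY * PASSES_PER_TWEET

print(f"{tweets_per_second:,.0f} tweets per second on average")
print(f"{analyses_per_day:,} analyses per day")
```

That works out to an average of roughly 5,787 tweets per second and, under the assumed three passes, 1.5 billion analyses a day, before allowing for the spikes in traffic that follow major news events.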

There is absolutely no doubt that Twitter needs to do something to prevent its platform being used for hate speech, but simply 'using machine learning' is not a simple or quick process. Given the size of the company and the strength of feeling behind this, there is little doubt that an attempt is under way to get this working, but actually making it work will be far harder than the calls to do so suggest.

