With Fake News And Hate Spreading On The Internet, Big Data Needs To Prove Its Worth

The internet is often an unpleasant place, but can data clean it up?


Big data is non-political. It does not sit on the right or left of any spectrum, and it does not take a position on racism, anti-semitism or sexism. Instead, it gives those who want to analyze it the opportunity to get a clearer picture of these elements.

For instance, since Donald Trump’s election in the US, the number of hate crimes reported has gone up by 6%. Since Brexit in the UK, the number peaked at a 60% increase and still sits 14% above the numbers from 2015. Much of this has been directed at Muslims, but there has also been a spike in anti-semitism over the past 18 months, with the Anti-Defamation League (ADL) finding 2.6 million tweets sent with anti-semitic language between August 2015 and July 2016.

This has led to significant threats to Jewish journalists including Hadas Gold, a media reporter at Politico, who, amongst other abusive messages, received a private message on Twitter that according to the New Yorker included ‘an image of her with a bullet hole in her forehead and a yellow Star of David on her shirt’ and the message ‘Don’t mess with our boy Trump or you will be first in line for the camp’.

One of the largest reasons behind this spread of hate online has been the use of fake news to push narratives that solidify extremist views. One example is the story of a Jewish family who supposedly made their child’s school cancel its nativity play because of their religion. The story was fake news, but it was used as a stick to beat Jews across the country.

There has been considerable coverage of this in the media, but with Twitter blocking just 21% of the anti-semitic accounts found by ADL, it doesn’t seem to be having much of an impact. However, could big data and analytics have a part to play?

Quite simply, big data, machine learning, and AI are the only weapons that people have against the kind of hate we are seeing across the web. It isn’t much of a cognitive jump to see that this spreading of ideas has emboldened people on the street to undertake hate crimes, so it needs to be stopped at its core. Luckily, if companies like Twitter and Facebook take the time to do it right, the technology exists to pinpoint the most hateful fake news with relative ease, because many of the accounts that spread hatred share the same traits.

The ADL has been studying the phenomenon and found that, amongst the accounts posting white supremacist and anti-semitic tweets, the four most common words in their bios were ‘Trump,’ ‘nationalist,’ ‘conservative,’ and ‘white.’ Specific signs and sayings, such as ((())) or )))((( surrounding names, the number ‘1488’ or the popular meme ‘Pepe the frog’, are also common. This makes it relatively simple to identify a large number of these accounts in one sweep.
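As a rough illustration of that kind of sweep, the Python sketch below checks an account bio for the markers mentioned above. The function names, the split into "strong" markers and "common words", and the rule that only strong markers trigger a review are all assumptions made for this example, not the ADL's or any platform's actual method. Words such as ‘conservative’ or ‘Trump’ are obviously not evidence of hate on their own, so the sketch only records them as context and leaves the final call to a human reviewer.

```python
import re

# Markers named in the article. The patterns and the "strong" grouping
# are illustrative assumptions, not a published detection rule.
STRONG_PATTERNS = {
    # Triple parentheses around a name, e.g. (((Name)))
    "echo_parentheses": re.compile(r"\(\(\([^()]+\)\)\)"),
    # The inverted form, e.g. )))Name(((
    "inverted_echo": re.compile(r"\)\)\)[^()]+\(\(\("),
    # The number 1488, but not as part of a longer number
    "1488": re.compile(r"(?<!\d)1488(?!\d)"),
}

# The four most common bio words reported by the ADL. Harmless on their
# own, so they are recorded as context only.
COMMON_BIO_WORDS = {"trump", "nationalist", "conservative", "white"}


def bio_signals(bio: str) -> dict:
    """Return which markers appear in a bio, without making a judgment."""
    strong = [name for name, pattern in STRONG_PATTERNS.items()
              if pattern.search(bio)]
    words = set(re.findall(r"[a-z]+", bio.lower()))
    return {"strong": strong, "common_words": sorted(words & COMMON_BIO_WORDS)}


def needs_review(bio: str) -> bool:
    """Queue a bio for human review only if a strong marker is present."""
    return bool(bio_signals(bio)["strong"])


if __name__ == "__main__":
    for bio in ["(((Name))) 1488", "Proud conservative dad", ")))Name((("]:
        print(bio, "->", bio_signals(bio), "| review:", needs_review(bio))
```

A filter like this only narrows the pool. It will miss accounts that avoid the known markers and will flag some that use them ironically or in reporting, which is why the output is a review queue and not a block list.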

Text analysis and AI then have a smaller pool from which they can more quickly identify abusive tweets and fake news sources (which are more likely to be shared from these accounts). However, as Facebook has shown with its attempts throughout 2016 to stop fake news spreading, AI alone is not always successful. The company ditched its previous monitoring team in August, only for the algorithm to almost instantaneously begin posting fake news to the top trends bar, and it has since needed to hire a team to oversee its monitoring.

Facebook’s example shows the difficulty that machine learning and AI have with these elements. It is the same as with any human being trying to learn: you need to get things wrong to learn how to get them right. With the pressure that social media giants are now under because of the issues surrounding fake news and abusive websites, they are, to some extent, caught between a rock and a hard place. They need their algorithms to learn through doing (and potentially make mistakes), but they cannot afford to allow another PR disaster surrounding fake news.

It is a catch-22 that Twitter, Facebook and Google need to find a way around. The number of news stories and social media accounts is growing every day, so they need technology to scour a wider area, but without allowing it to make mistakes. It is something that they will need to fix as soon as possible, because the longer these things go largely unchecked, the more damage and division they are likely to cause.
