Was 2016 The Real Data Election?

If 2008 was the year data won the election, did it lose it in 2016?


We often discuss President Obama’s two election wins in terms of the remarkable way his team used data, and of how big data made people like Nate Silver famous after he accurately predicted the results of both. This has led to 2008 in particular being described as the election won by data, but although data certainly shaped that election, its impact was nothing like the impact it had in 2016.

The first place to start is data security, which became a key campaign issue for Donald Trump, not so much through his own proposals as through his criticism of how Hillary Clinton handled it. It was the first election in which there was a broad enough basic understanding of data security for voters to grasp that a private email server was a bad idea, to the extent that ‘lock her up’ became the rallying cry at many Trump rallies. However, Trump was roundly criticized himself for not having any coherent plan for data security in the US and, when questioned about it in the first debate, said: ‘I have a son. He's 10 years old. He has computers. He is so good with these computers, it's unbelievable. The security aspect of cyber is very, very tough. And maybe it's hardly do-able.’ The remark suggested a poor grasp of the threats involved.

Ironically, it was also the election in which data seemed to be ignored or manipulated by both parties, though this was especially clear in Donald Trump’s use of it. For instance, he claimed that the unemployment rate could be as high as 42%, when official figures put it at 4.9%. He also claimed that he would deport 2 to 3 million illegal immigrants with criminal records, when the Migration Policy Institute puts the total number at 820,000. This became a problem for traditional media: according to Politico, over a five-day period in September Donald Trump told a lie or misrepresented data on average once every 3 minutes and 15 seconds of speaking. Using data to debunk this volume of misinformation became almost impossible, which meant that much of the incorrect data went unchallenged and likely affected the outcome of the election.

As we discussed in a much broader article following the election, polling data was also a huge talking point both during and after the election. Many forecasters gave Hillary Clinton a chance of winning in the mid-90s, while even FiveThirtyEight’s more cautious model gave Trump only around a 35% chance of winning on November 7th. These forecasts were built on polling data that lacked the clarity of previous elections, and the result showed that pollsters need to change their techniques dramatically. Polls considered ‘better’ were anything but: their ‘live interviews’ more often than not took place over landline phones, and the share of households with a landline fell from around 95% in 2004 to just over 50% in 2014, with the trend in the two years since suggesting it has now dipped well below 50%. Relying on landlines is therefore flawed, because the pool of respondents has shrunk and no longer represents the electorate as a whole. Internet polls, by contrast, were dismissed because there was no reliable way of stopping people from taking the same poll repeatedly to make their candidate seem more popular than they actually were. Yet this method is far more likely to reach a larger and more diverse range of demographics, because while landline use has nearly halved since 2004, internet usage has increased significantly.

The single most important impact data had on the election came not from either candidate, but Facebook.

We have also discussed this in more depth elsewhere, but it is worth noting again that Facebook’s algorithms, combined with its removal of humans from the editorial vetting process, led to a proliferation of fake or highly partisan news being shared across the social network. Mark Zuckerberg claimed that 99% of everything on Facebook was real, but the remaining 1% seemed to have a huge impact. A BuzzFeed study of three of the largest partisan Facebook pages on each side of the political spectrum found that pages on both the left and the right misled or lied in a significant number of their posts. For the three left-wing pages the figure was 19%, while for the right-wing pages it was a huge 38%, meaning that more than 1 in 3 of their stories were false or heavily embellished. Despite Zuckerberg’s protests, it is clear that these fake news stories reached a large audience, since many of these pages have over 1 million likes, and sites like the Denver Guardian and Ending the Fed were sharing false stories.

It could be argued that, in an election where the two main candidates received around 130 million votes between them, these are relatively small numbers. But when you consider that Trump won Michigan (16 electoral votes) by 11,000 votes, Florida (29 electoral votes) by 120,000 and Wisconsin (10 electoral votes) by 27,000, these numbers suddenly seem far more important. If the roughly 158,000 voters who made up those margins (about 0.05% of the entire US population, or just over 0.1% of the total votes won by the two candidates) had voted the other way or hadn’t been suppressed through negative campaigns, Hillary Clinton would have won with 287 electoral votes.
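The counterfactual above can be checked with a short Python sketch. It uses the rounded margins quoted in this article; the 232-electoral-vote baseline for Clinton is an assumption (it is what the 287 total implies once the three states’ 55 votes are added), and the names `STATES`, `min_switchers` and `counterfactual` are purely illustrative.

```python
from typing import Dict, Tuple

# Rounded Trump margins and electoral votes per state, as quoted in the article.
STATES: Dict[str, Tuple[int, int]] = {
    "Michigan": (11_000, 16),
    "Florida": (120_000, 29),
    "Wisconsin": (27_000, 10),
}

# Assumption: Clinton's pledged electoral votes before any of these states flip.
CLINTON_BASELINE_EV = 232
VOTES_TO_WIN = 270


def min_switchers(margin: int) -> int:
    """Smallest number of Trump voters who must switch to Clinton to flip a state.

    Each switched vote narrows the margin by two, so k switches win the
    state once 2 * k > margin, i.e. k = margin // 2 + 1.
    """
    return margin // 2 + 1


def counterfactual(
    states: Dict[str, Tuple[int, int]], baseline_ev: int
) -> Tuple[int, int, int]:
    """Return (combined margin, minimum switchers, Clinton's electoral votes)."""
    combined_margin = sum(margin for margin, _ in states.values())
    switchers = sum(min_switchers(margin) for margin, _ in states.values())
    electoral_votes = baseline_ev + sum(ev for _, ev in states.values())
    return combined_margin, switchers, electoral_votes


if __name__ == "__main__":
    margin, switchers, ev = counterfactual(STATES, CLINTON_BASELINE_EV)
    print(f"Combined margin: {margin:,} votes")
    print(f"Minimum voters switching sides: {switchers:,}")
    print(f"Clinton electoral votes: {ev} (needs {VOTES_TO_WIN})")
```

On these rounded figures the three margins sum to 158,000 votes. Because a voter who switches sides moves the margin by two, only about half of that (79,003 voters) would have had to change their minds; roughly the full 158,000 applies to Clinton votes that were simply never cast.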

Data, and the way it was used across all of these areas, had a huge impact on this election, and the inquest into how to fix it is already underway. This isn’t just a case of Democrats being angry that polling data was wrong, or that Facebook’s news feed algorithms helped spread lies. Nor is it Republicans arguing that Clinton’s data security was criminal. It is about how data will be perceived in the future, and with ‘post-truth’ named word of the year by Oxford Dictionaries, we need to sort it out quickly.
