We Need More Data In Our Polling

Political polling is lagging behind in its use of big data, which is having adverse effects


The UK is midway through an intense general election campaign. According to the polls, what once seemed an insurmountable lead for Theresa May’s Conservative Party is being cut ever more quickly by Jeremy Corbyn’s Labour Party. Many polls had the Conservatives ahead by around 20 points going into the campaign, and some now suggest that lead has shrunk to around 6 points. That is a huge swing, given that the election was only called on April 18th.

However, trust in these polls is at an all-time low, with the polls appearing to be wrong in the 2015 UK election, the 2016 Brexit referendum, the 2016 US election, and the 2017 French election. Ed Miliband, who led the Labour Party into the 2015 election, recently wrote on Twitter:

The pollsters have been off my Christmas card list since 2015. #justsaying

— Ed Miliband (@Ed_Miliband) May 30, 2017

The truth is that, aside from the French election (which few people talk about because the error favored the predicted winner), each of these results was within the polling margin of error. Brexit polls were more or less 50/50 throughout, with a 4% margin either side, meaning the 52-48 result was within these parameters. Even the Trump win, which was seen as coming completely out of the blue, was still well within the polling margins of error: the error of around 3.3 points was not even close to the largest in modern history, 7.2 points in 1980. In fact, according to FiveThirtyEight on November 7th, amalgamated polling suggested a 10.5% chance of the final result, and short of rerunning the election 100 times, it is impossible to know whether a series of unlikely factors simply combined to produce this relatively unlikely outcome. To put it into context, Trump’s chances of winning according to the polls were 500 times better than Leicester’s chances of winning the Premier League in 2016.

So it is not necessarily fair to say that the polls got these results ‘wrong’; rather, the margin of error in current polling causes problems, especially when the numbers are reported as more or less certain. Headlines such as ‘Guardian/ICM poll: Tories' 12-point lead offers Labour crumbs of hope’ and ‘Labour leads Conservatives by 9% in 25-34 age group, finds latest poll’ convey a certainty that the polling simply does not have. With a built-in polling error of 2-4% on average, there is more than enough room to completely change that picture.

So the big question is: given the huge amount of data now available and the growing predictive power of big data, is it possible to narrow this kind of error?

Some people have already thrown off the old polling norms and adopted a big data approach, such as Associate Professor Bela Stantic of Griffith University. The day before the election, he told a conference that Trump would win, based on his analysis of social media conversations, saying ‘people are likely to be more honest when telling friends rather than answering polls. It is scary how accurate prediction can be done by analysing social media.’ His work predicted the outcome in 49 of the 50 states using social media conversations, dispensing with the standard questions asked by pollsters. Stantic claims that ‘Such analytics can provide much more accurate information than telephone polling, especially in a day and age where people have caller ID and don’t have landlines.’

This is hardly surprising, given the huge impact social media has had on recent elections. Electorates have, after all, been heavily influenced by targeted messaging built on social media data, allegedly used by the controversial Cambridge Analytica in both the EU referendum and the US election. In the future, however, there may well be more calls to use social media as a source of active polling data rather than simply as an ad-targeting platform.

Unfortunately, traditional polling companies appear yet to adopt new data-driven techniques, relying instead on the old methods of phone calls, door-to-door surveys and online polls. Each has severe flaws, the biggest being that people are acutely aware they are being polled. Analysing online behavior, by contrast, can help predict voting intention considerably more accurately, giving a clearer picture of real public opinion.

We are not yet at that stage, but given the damage that unreliable polls are doing, it needs to be considered. There will naturally be bumps along the road before big data becomes a standard polling method, and confidentiality will certainly be an issue, but it will be worth it once we learn to get it right.
