Data Isn’t Dead, But The Way We Do Election Modeling Could Be

After the abject polling failure in the election, where do we go next?


In the run-up to the election, the Huffington Post mounted what could, in the world of statistics at least, only be described as a sustained assault on Nate Silver’s 538 website, publishing no fewer than three articles virulently criticizing his predictions for the US Presidential election and the models he used to make them. What prompted this vendetta is unclear, but it seems that it was, at least in part, an attempt to calm fears among its readership that Hillary Clinton might lose to Trump.

Well, they did an excellent job. Possibly too good a job, apparently calming many to the extent that they felt they didn’t have to worry about actually voting for her. There is a strong case to be made that Silver was wrong, because he was, giving her a 65% chance of winning. However, he was far less wrong than most, predicting many swing states for Clinton that went to Trump only within the margin of error. And he was certainly less wrong than the Huffington Post, which had her at a patently ludicrous 98.3%. Not that it was the only one. Sam Wang’s model at the Princeton Election Consortium showed a greater than 99% chance of a Clinton victory, and The New York Times’ model at The Upshot put her chances at an only slightly more conservative 85%.

In his attack on Silver in HuffPo, Ryan Grim wrote: ‘The short version is that Silver is changing the results of polls to fit where he thinks the polls truly are, rather than simply entering the poll numbers into his model and crunching them.’ His article concluded, ‘If you want to put your faith in the numbers, you can relax. She’s got this.’

Given that many will have taken Grim’s advice and had faith in the numbers, it is understandable that this faith is shaken, maybe even gone. The numbers failed them. Mike Murphy, a Republican strategist who predicted a Clinton win, summed up the mood towards numbers on MSNBC, saying ‘My crystal ball has been shattered into atoms. Tonight, data died.’ But what do flaws in the election modeling and polls actually tell us this time? Is data really dead?

Ultimately, this result has served only to underline Silver’s argument that numbers are not magically correct simply because they’re numbers. They are not infallible, and in politics as in business, they need context, because when humans are involved, there is always going to be a high degree of uncertainty. Silver’s trend line adjustments, which Grim took such umbrage at in his article, are an attempt to account for this uncertainty by correcting predictions based on historical patterns. The problem is that there is a limited amount of historical data when it comes to elections, with thorough and modern polls only going back as far as the 1970s, and this election was totally different to those that had gone before. There has never been a female presidential candidate from a major political party before, there has never been a candidate as completely off the wall as Trump before, there has never been an approach to an election campaign like his before, and many who voted for him had never voted at all before. This meant that historical data was rendered more or less useless, adding a further layer of uncertainty, which is likely why even Silver got it so wrong.
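To make the point about uncertainty concrete, here is a minimal sketch in Python. It is not the method used by 538, the Huffington Post, or any other forecaster. It assumes, purely for illustration, a hypothetical 3-point polling lead and treats the true margin as normally distributed around that lead. The function name `win_probability` and every number in it are made up for the example, and it ignores the Electoral College entirely.

```python
from statistics import NormalDist


def win_probability(lead_pts: float, error_sd_pts: float) -> float:
    """Chance that the true margin is above zero, assuming (hypothetically)
    that it is normally distributed around the polled lead."""
    return 1.0 - NormalDist(mu=lead_pts, sigma=error_sd_pts).cdf(0.0)


# Hypothetical polling lead, in percentage points (not a real poll average).
lead = 3.0

# The same lead, read with three different assumptions about how far
# the polls could be off (standard deviation, in percentage points).
for error_sd in (1.5, 3.0, 6.0):
    p = win_probability(lead, error_sd)
    print(f"assumed polling error SD {error_sd:.1f} pts -> win probability {p:.1%}")
```

Under these assumptions, the identical lead comes out at roughly 98%, 84%, or 69%, depending only on how much polling error the modeler is willing to allow for. The poll numbers never change; the forecast does.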

There were a number of reasons polls were wrong on an individual basis. Pollsters are less likely to question new voters, in this case white working class Americans. It has also been claimed that there were many so-called ‘Shy Trump’ supporters who hid their voting intentions for fear of being labeled racist and sexist, echoing a phenomenon seen in Britain when the polls got it so wrong about a Conservative Party majority in 2015 and about Brexit. Frank McCarthy, a Republican consultant with the Keelen Group, a consulting firm in Washington, DC, said: ‘People have been told that they have to be embarrassed to support Donald Trump, even when they're answering a quick question in a telephone poll.’ He added that, ‘What we've been hearing from the [Republican National Committee] for months is there's a distinct difference on people who get polled by a real person versus touch tone push poll.’

Data is not dead, but this election must be a learning experience for all pollsters and election modelers, including Nate Silver. Not all polls were wrong. Over the last two months, 10 polls published on Real Clear Politics gave Trump the lead. Nine of these were from LA Times/USC, and other pollsters must look at what it did right. MogIA, an AI system which analyses data from Google, Twitter, and Facebook, also correctly predicted that Donald Trump would win based on the amount of chatter there was, and advances in the ability to use unstructured data to gauge sentiment should help improve such predictions even more before the next election. Politics is ultimately about more than numbers. It is about people, and people are hard, but not completely impossible, to predict. Election modeling has to account for uncertainty, and if it looks at the context on social media and corrects for other factors seen in this election, then it will improve. We have to accept this uncertainty rather than demand definitive answers from the press and pollsters as soon as possible. As Pradeep Mutalik noted ahead of the election, ‘Aggregating poll results accurately and assigning a probability estimate to the win are completely different problems. Forecasters do the former pretty well, but the science of election modeling still has a long way to go.’ This defeat is a defining moment, and how forecasters bounce back will determine whether or not they ever actually get there.
