With some of the biggest political changes of the past 50 years happening in 2016, dividing nations and spawning all kinds of strange conspiracy theories, we hear a lot about the concept of confirmation bias. It is used to describe why people believe that ‘pizzagate’ is a thing and why the popular media are all lying. People can look at the same evidence in completely different ways. For instance, after the Brexit vote the UK economy grew by 0.5%, a significant slowdown from the predicted 0.7%, but still growth when some had said the British economy would be hit with a recession. Some people therefore interpreted this as a good thing (there is still growth), while others saw it as a bad thing (growth is already down 0.2 percentage points and the UK hasn’t even left the EU yet).
The same issue is found in the use of data, especially in the current climate where data can be fed in from thousands of locations for analysis. Spencer Greenberg, founder of ClearerThinking.org, also believes that despite being a powerful tool, data is still not always accurate: ‘There are a lot of perils of data analysis. Some people say, “The number says X, therefore we should do X,” but there is more nuance, [such as] how the number was calculated and whether you are able to make the inference from the number you think you can make.’
One of the big issues that data currently faces is that although the eventual analysis may suggest a clear course of action, there is often little context about how the data was collected or filtered in order to reach that conclusion.
For instance, imagine an organization has launched a new product and wants to show it was a success through data analytics. If it performs this analysis itself, there will be a natural bias because it wants the data to show it is doing well. It may be that the product is being very poorly received online, so poorly received that there are hashtags on Twitter about it. While this would, in most people’s eyes, be a bad thing, the organization could put its own optimistic spin on events by saying that it has a huge amount of online engagement, convincing itself that the negative response was actually a positive.
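As a toy illustration (the mentions and sentiment scores below are entirely fabricated), the same set of social-media data can be summarized with two different metrics that point in opposite directions, and a biased analyst is free to report only the flattering one:

```python
# Hypothetical product mentions as (post_text, sentiment_score) pairs,
# where sentiment_score lies in [-1, 1] and negative values mean criticism.
mentions = [
    ("worst launch ever #productfail", -0.9),
    ("#productfail is trending for a reason", -0.8),
    ("not impressed at all", -0.6),
    ("it's okay I guess", 0.1),
    ("actually quite like it", 0.7),
]

# Metric 1: raw engagement volume -- every mention counts as a win.
engagement = len(mentions)

# Metric 2: average sentiment -- the same data tells the opposite story.
avg_sentiment = sum(score for _, score in mentions) / len(mentions)

print(f"Online mentions: {engagement}")          # 5 -> "huge engagement!"
print(f"Average sentiment: {avg_sentiment:.2f}") # -0.30 -> clearly negative
```

Neither number is wrong; the bias enters in which one is chosen as the measure of success.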
This is clearly an extreme example, but confirmation bias affects the results of big data in far more subtle ways, from the baseline assumptions we make to how we create algorithms. It is not limited to data either, and often affects the scientific basis behind our work. A 2005 paper by John Ioannidis, ‘Contradicted and Initially Stronger Effects in Highly Cited Clinical Research’, compared 45 studies that claimed to have uncovered effective interventions with data from subsequent studies with larger sample sizes: 7 (16%) of the studies were contradicted, in 7 (16%) the effects were smaller than in the initial study, and 31 (68%) of the studies remained either unchallenged or the findings could not be replicated. Given the importance of these papers, it is not a huge jump to think that their findings would have had at least some impact on algorithmic assumptions.
Our society today is seeing more and more confirmation bias, whether over sporting decisions or full-blown conspiracy theories about lizard people, but it is not limited to what we say and do. The algorithms we create, the way our emerging AI foundations operate, and even how data is stored are all affected by confirmation bias to some degree. Unfortunately, there is very little we can do, especially when we utilize open source code and technologies, which fold the biases of several entities into one place, creating an intertwining weave of biases that are practically impossible to separate.
This is not likely to have a huge impact, and most of these biases will be so small that people don’t even realize they hold them, but it is something we need to be aware of: we are laying the foundations for the future of data today, and the biases baked in now may still be shaping analyses in 100 years.