When Using Data Is A Bad Idea

Data is a great thing, but there are times when we shouldn't use it


On November 9th, the world woke up to find that the US had elected Donald Trump as president. The data had suggested this wouldn't happen: Nate Silver's FiveThirtyEight gave Trump less than a 30% chance on November 8, and that was one of the more conservative models, with many others putting his chances below 10%.

Afterwards, the media was full of stories proclaiming the death of big data and complaining that the pollsters had let everybody down, but this simply is not true. In a diverse country of 300 million, accurately predicting anything is incredibly difficult, especially when you factor in the thousands of variables that can influence every one of those voters. Running an election without polls would be impossible, though; pollsters simply need to re-evaluate how they work, not whether they should be working at all.

That said, there are situations when data should not be used, and we have seen many instances where relying on it has been a mistake.

New Ideas

Data has informed companies about almost every element of their businesses, from saving money on utility bills through to marketing products to specific people based on their preferences. This knowledge is essentially the reason for adopting big data strategies, but it needs to be put in context: you only have data on the things you already know.

Blockbuster was using data extensively before it went under in 2013, including in 2000, when it could have bought Netflix, now a $28 billion company, for $50 million. It passed on the deal because it had no data showing Netflix's potential; quite simply, that data didn't exist. At the time, the average internet user had a dial-up connection, and nobody could foresee the idea of streaming movies online (something Netflix wasn't doing yet either).

Judged on the data available at the time, it would have seemed a pointless purchase. With hindsight, however, while the idea behind Netflix's huge success came in 2007, when it began streaming films, the foundations of its current form can be traced to 2006, when the company set out to build an algorithm that recommended movie titles to its customers. Data showing whether this would succeed simply wasn't available, so the company took a leap into the dark and trusted in innovation, something that an over-dependence on data can often dampen.

When data is available, it should always be used to explore potential new iterations and new directions, but limiting your thinking to the data you have collected in the past can put the brakes on your future.

When It’s Incomplete Or Out Of Date

Using data that could be inaccurate because of a systematic fault is always dangerous, as it is the very definition of 'garbage in, garbage out'. For data to be actionable, it needs to be fresh, because buying habits, personal circumstances and job status all change over time. A study from the Ministry of Manpower suggested that an average email list, one of the most basic forms of data collection, will see 4.5% of its leads change jobs in a single year. This fundamentally affects the way many companies analyze their data, and an email list is only a fraction of the data that can be collected and analyzed today.

Data is only ever as good as the actions it can predict, and every day it sits in a database without being audited, it becomes less valuable and less likely to produce genuinely actionable insights. It is therefore essential not just to collect data but to keep updating it; otherwise, you may as well not bother collecting anything in the first place.
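To put the 4.5% figure in perspective, the short Python sketch below estimates how quickly an unaudited list decays if the same share of contacts goes stale every year. It assumes the rate stays constant and compounds annually, which real lists won't follow exactly, and `share_still_current` and `stale_records` are hypothetical helper names used only for illustration.

```python
# Minimal sketch: decay of an unaudited contact list.
# Assumption: a constant annual churn rate (the 4.5% job-change figure cited
# above) that compounds year on year; real churn varies by industry and audience.

ANNUAL_CHURN = 0.045  # share of leads changing jobs per year


def share_still_current(years: float, annual_churn: float = ANNUAL_CHURN) -> float:
    """Fraction of the list still accurate after `years` years without an audit."""
    if not 0 <= annual_churn < 1:
        raise ValueError("annual_churn must be in [0, 1)")
    # Each year only (1 - churn) of the remaining records stay valid, so the loss compounds.
    return (1 - annual_churn) ** years


def stale_records(list_size: int, years: float, annual_churn: float = ANNUAL_CHURN) -> int:
    """Approximate number of records that have gone out of date."""
    return round(list_size * (1 - share_still_current(years, annual_churn)))


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(
            f"after {years} year(s): {share_still_current(years):.1%} still current, "
            f"~{stale_records(10_000, years)} of 10,000 records stale"
        )
```

Under these assumptions, roughly 95.5%, 87.1% and 79.4% of the list is still current after one, three and five years, so about 2,056 of 10,000 records would be out of date after five years without an audit. Fractional years (for example `30 / 365`) can be passed to estimate decay over shorter periods.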

Uncertain Origin

2016 has been the year of data: it became a key talking point in the US elections, whilst data breaches at companies made front-page news. Well over 1 billion records have been stolen over the course of the year, many of them containing detailed personal information.

These records can end up anywhere, in many cases even in the databases of unwitting companies.

This can happen through lists that appear to be legitimate, or even through people using stolen identities to access a company's services. Given the number of lists offered to companies every day, it is no surprise that hacks continue to happen purely to steal data that is then sold on.

However, whilst the ethics of how data has been acquired should clearly be a factor, in terms of actual business performance it is essential to know that the data can be trusted. If it wasn't collected by your company, can you implicitly trust the collection methods? If it has been bought from somebody else, do they have permission to sell this data, and are the people on it who they claim to be?

