The cost of poor data quality is tremendous. IBM estimates it at roughly $3.1 trillion a year in the US alone, and it costs individual organizations between 10% and 30% of their revenue each year. Consequently, despite the promise of big data, just 25% of businesses are successfully using it to optimize revenue, while the rest are losing out on millions.
The sum of money IBM believes is being thrown away may seem unbelievable, but it makes sense when you consider how often data is used in everyday working practices, and the impact that wrong data can have as a result. The primary cause of bad data is simple: data decay. In B2B databases, decay is estimated to affect as much as 70% of records. Using out-of-date data is like filling a competitive egg eater's bowl 70% full with rotten eggs - the eggs might look fine, but if they don't stay down, the outcome isn't going to be pretty for anybody.
Data is constantly decaying. Lives change every day: people move house, switch jobs, and change numbers, and their contact details change as a result. Streets get renamed and area codes change. As a consequence, email addresses change at a rate of about 23% a year, 20% of postal addresses change every year, and roughly 18% of telephone numbers change each year. If you're not on top of these changes, your sales teams are calling the wrong numbers, your marketing teams are sending campaigns to dead email addresses, and you have no realistic picture of your potential reach. You're wasting manpower and money that would be far better allocated elsewhere.
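To see how quickly those annual rates compound, here is a minimal sketch in Python. The channel names and the constant, independent annual-decay assumption are ours for illustration; this is not a validated decay model.

```python
# Hypothetical sketch: compounding contact-data decay using the
# annual change rates quoted above (illustrative assumptions only).
ANNUAL_DECAY = {"email": 0.23, "postal": 0.20, "phone": 0.18}

def still_valid(channel: str, years: float) -> float:
    """Fraction of records expected to remain accurate after `years`,
    assuming a constant, independent annual decay rate per channel."""
    rate = ANNUAL_DECAY[channel]
    return (1 - rate) ** years

# After two years, only about 59% of email addresses would still
# be accurate under this assumption: (1 - 0.23)**2 ≈ 0.593
share = still_valid("email", 2)
```

Under these assumptions, an email list left untouched for two years would already be more than 40% stale - a useful back-of-the-envelope check when budgeting for data cleansing.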
Another cause of bad data is corruption as it passes through the organization from the initial source to decision makers. In a recent interview with us, Vijay A D'Souza, Director of the Center for Enhanced Analytics at the US Government Accountability Office, explained: 'Regardless of the goals, it's important to understand the quality of the data you have. The quality determines how much you can rely on the data to make good decisions.' Along the way, data can be distorted by false assumptions and drawn from unreliable sources, such as ill-considered market research.
One example of this is work carried out by the municipal authority in charge of Boston, Massachusetts. In 2011 it released a mobile app called Street Bump in an attempt to find a more efficient way to discover roads that needed repair by crowdsourcing data. The app used the smartphone's accelerometer to detect jolts as cars went over potholes and GPS to record where each jolt was felt. However, the system reported a disproportionate number of potholes in wealthier neighborhoods, because it oversampled the younger, more affluent, digitally literate citizens who were willing to download the app.
A landmark paper in 2001 argued that legalizing abortion reduced crime rates - a conclusion with major policy implications. But in 2005, two economists at the Federal Reserve Bank of Boston showed the correlation was the result of a coding error in the model and a sampling mistake. This example pre-dates the era of what we now call big data; these problems are not new, but they show the dangers of putting too much faith in quantitative models of society.
Another consequence of poor data quality is the additional cost it imposes on IT systems. According to estimates by the vendor Effectual Systems, content management systems contain between 50% and 75% junk data, and fixing it manually can cost an IT department $600,000 and 12,000 man-hours per year.
The solution is clear, but it is not easily achieved. Cleaning data is costly; it takes time and a real willingness to do it. Data needs to be assessed frequently for signs of decay, and the assumptions underlying any analysis need to be challenged by decision makers at every stage. If something seems wrong, look back at the data rather than simply assuming it's right. Data usually trumps gut instinct, but bad data can wreak havoc, and it is better to be safe than sorry.