Big data as a term has been around for a while, and it has a deliberately broad reach, covering everything from basic database management through to AI and self-driving cars. It is so ‘catch-all’ that we can dedicate a channel to it that has talked about everything from animal conservation through to the rise of robot rights.
This has had some real benefits for the wider media, who can use it as a catch-all phrase rather than needing to drill down to the complex sub-elements. However, as time has gone on and hostility to the idea has become gradually more prevalent, should we start looking at ways to put the term to bed?
Bloomberg’s resident anti-big data advocate, Cathy O’Neil, has penned an article, ‘Do You Trust Big Data? Try Googling the Holocaust’, which essentially decries how some companies use data. At the top of her hit list is Google, which is referred to as one of the ‘big data companies’. O’Neil points to several examples of how Google has sorted its results so that misleading pages on controversial topics like Holocaust denial and racist crime appear towards the top.
O’Neil makes some other interesting points too in her well-publicized ‘Weapons of Math Destruction’ theory of algorithms that aren’t fair, such as those in the education system and in policing that focus on minorities and the poor. However, what she discusses in her talks on this matter is big data in the catch-all sense. This is not what big data is, in the same way that Google is not a ‘big data company’, and blaming search results on big data isn’t fair. It is the equivalent of tarring an entire group with the actions of a few individuals. After all, there are mechanics who aren’t too good at fixing cars and make them more dangerous to drive, but nobody says that the automobile repair industry as a whole is a bad thing.
The problem with referring to everything data-related as ‘big data’ is that it makes people scared of its good uses. Think about the impact it is likely to have on how doctors can monitor seriously ill patients without them needing to stay in hospital, or how motion capture cameras in jungles are helping to track the population of endangered animals. These are just as much about big data as where a result appears on Google or how a model is created on crime data.
Machine learning, AI, predictive analytics, business analytics, IoT, warehousing, and thousands of other variants are all elements of big data. Grouping them together under one catch-all term may be useful in some ways, but increasingly the term is used to criticise the entire data ecosystem rather than the small elements that may not be especially popular.
It means that articles with titles like ‘Do You Trust Big Data? Try Googling the Holocaust’ or ‘When Algorithms Come for Our Children’, when published in a widely read non-specialist publication, are simply spreading hate and misinformation around subjects that few people understand. If somebody were to read an article that attacks the concept of big data this viciously and were then told by their doctor that their data is being collected remotely, how likely would they be to adopt a potentially life-saving data-led technology?
Big data has developed to the stage where it currently is because it has had a huge net benefit to the world. This has been through the sum of its parts, which have all developed and helped different areas of society in totally different ways. Avoiding this kind of generic criticism is essential to its continued spread into areas that will benefit from it.
This isn’t to say that everything data has done has been positive, and O’Neil is certainly right to point out some of the elements that haven’t worked and that are wrong. But to criticize the concept as a whole without giving context or even acknowledging the undeniable, and often unrecognized, benefits that big data has brought to our society is completely wrong, and we should be pushing back on this kind of catch-all language. Perhaps killing the term big data is the only way to do it.