In the past decade, the term 'big data' has gone from something we had to explain to our friends to something gracing the front pages of newspapers across the world. It is no longer a niche; it is the established norm. It went from just entering the 'peak of inflated expectations' in 2013 to having passed through the entire hype cycle by 2015, which shows not only the pace of change within the technology but the world's readiness to adopt it.
This has brought huge changes in how the concept of big data is perceived, and the growing market has only accelerated the pace of change, as there is more money to invest in these technologies and an increasing number of people working in the area.
Those riding the wave have seen the benefits: the average wage of a data scientist in the US currently sits at $120,000, compared with a national average of $72,824 for workers with an advanced degree and $51,939 for the average US worker. Thousands of companies have also been created to meet the increased demand for data, from those tasked with cleaning up a database through to those offering outsourced data analysis. However, this speed of development and spread across vast swathes of the world is causing a number of issues.
The biggest is the skills gap created by the lack of qualified data scientists. When people come out of college into their first job, what they have studied, although still applicable to data science, is unlikely to give them the technical skills they need to succeed, so further training is required to bring them up to speed with current technologies and practices. Realistically, courses will need to give graduates a strong grounding in the foundations of data science so that they can learn the technical elements on the job. This slows down the process of onboarding and software mastery.
Similarly, the speed of growth has meant that companies increasingly overpromise and underdeliver. With more opportunities for new entrants, the market has become saturated, and competition for contracts has turned what was once a blue ocean red. This damages the spread of data within companies: when company leaders are promised the world and receive considerably less, they are unlikely to invest further. With this increased competition, companies are frequently sold services they don't need, further damaging the reputation of the wider data community.
It is a similar situation to the one CRM systems faced in the early 1990s, when vendors vastly overpromised on what their products could do. Millions of companies invested, only to find that employees rebelled against the systems, databases became completely muddled, and they were still left with a huge bill. This is not to say CRMs haven't been successful since: total spending on CRM software was $26.3 billion in 2015, up 12.3% on the previous 12 months. However, according to a report from C5 Insight, 30% of all CRM implementations still fail today.
Data-led initiatives may end up having an even more complex relationship with success, given that they are considerably more technical and complicated than CRMs. A successful data-led initiative needs one or more data experts, on either the client side or the vendor side, and that role is currently scarce. This means vendor-side data scientists will be stretched thinner as the vendor takes on more clients, while client-side companies will struggle to find such people in the first place.
Big data is going to grow even further, and perhaps even more rapidly, in the future. It is certainly not yet at saturation point, but the pace of change and the demand for faster, better technologies mean companies are investing increasingly unsustainable sums under the mistaken impression that big data is a magic cure for any business issue. IDC predicts that big data spending will reach $150 billion in 2017, up 12.4% over 2016. Companies will soon realize that, after the big wins, the same investment will not deliver the same results. For instance, where an additional $1 million in funding might initially have returned $4 million, the next $1 million may return only $2 million.
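The diminishing-returns pattern above can be sketched with a toy calculation. The halving curve below is purely an assumption, chosen so the numbers reproduce the article's example (the first extra $1 million returns $4 million, the next returns $2 million); real returns follow no such tidy formula.

```python
# Illustrative sketch of diminishing marginal returns on big data
# spend. The curve is a hypothetical halving function picked to match
# the article's example figures, not a model of any real business.

def cumulative_return(spend_millions: float) -> float:
    """Hypothetical total return ($M) for a given total spend ($M)."""
    return 8.0 * (1.0 - 0.5 ** spend_millions)

def marginal_return(before: float, after: float) -> float:
    """Return generated by the extra spend between two levels."""
    return cumulative_return(after) - cumulative_return(before)

print(f"1st $1M returns ${marginal_return(0, 1):.1f}M")  # $4.0M
print(f"2nd $1M returns ${marginal_return(1, 2):.1f}M")  # $2.0M
```

Under this assumed curve, each additional million returns half what the previous one did, which is why a budget that quadrupled its money early on can soon struggle to break even.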
There is also an issue surrounding data stewardship: the pace of change means the legislative bodies that set and oversee new laws often cannot keep up with the changing environment. For instance, the Data Protection Act in the UK was written in 1998, and in the US there is no federal data protection law, with state laws filling the gap and, in many cases, the Federal Trade Commission Act being awkwardly applied to data protection with no formal framework. Although companies have a duty to keep data safe (and most exercise due diligence), the reality is that unless pushed to do so, many will presume data theft won't happen to them and simply opt for the cheapest option.
Data science has been one of the most important developments of the past decade, and the speed of change within it has been impressive. However, this pace has created real challenges, and the big question is whether moving forward so quickly may eventually end up holding the field back.