The data scientist role has long been considered one of the most prestigious of the digital age, with salaries averaging around $120,000. Despite this, there has always been a huge shortage of people qualified for the role. One McKinsey study projected that ‘by 2018, the US alone may face a 50% to 60% gap between supply and requisite demand of deep analytic talent.’ The shortage is being dealt with in two ways, neither of which seems, on the surface, particularly positive for those currently training in data science and hoping for a big future in HBR’s ‘sexiest job of the 21st century’. Firstly, companies are empowering all of their employees to analyze data themselves, creating so-called citizen data scientists. Secondly, they are handing control of the analysis over to machine learning algorithms and AI, which could effectively render the data scientist obsolete.
Even data scientists don’t seem to hold out much hope. A recent KDnuggets poll asked when most expert-level Predictive Analytics/Data Science tasks currently done by human Data Scientists will be automated, and 51% of respondents said they expect this to happen within the next decade. Just a quarter said they expect it to take more than 50 years or never to happen at all.
Some of the brightest minds in both academia and industry have turned their attention to automating data science. MIT researchers, for one, have designed the Data Science Machine, a system that both searches for patterns and designs the feature set. They enrolled the first prototype in three data science competitions, where it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating across the competitions, it beat 615, and in two of the three competitions, the predictions it made were 94% and 96% as accurate as the winning submissions. In the third, the figure was 87%. Where it truly came out on top was in the time it took to achieve these results. The machine’s human opponents typically spent months poring over their prediction algorithms. The Data Science Machine took hours to produce each of its entries.
Data is not only being processed automatically, though. It is also being visualized without human input, so that business people can analyze it without a specialist acting as intermediary. Visualizations, when done well, can reveal intricate structures in data that cannot be absorbed in any other way, and if these can provide business people with what they need to know about the data, it’s hard to know where the data scientist fits in.
In the short term, data scientists are unlikely to be replaced. Kevin Murphy, a Senior Research Scientist at Google, notes that: ‘The first problem is that current Machine Learning methods still require considerable human expertise in devising appropriate features and models. The second problem is that the output of current methods, while accurate, is often hard to understand, which makes it hard to trust.’ Murphy cites the ‘automatic statistician’ project from Cambridge, which ‘aims to address both problems, by using Bayesian model selection strategies to automatically choose good models/features, and to interpret the resulting fit in easy-to-understand ways, in terms of human readable, automatically generated reports.’ The project won a $750,000 Google Focused Research Award, but it still has a number of challenges to overcome if it is going to be a success. Murphy’s initial point still stands: Machine Learning methods require considerable expertise at the point of origination.
Ultimately, what it probably means is that data scientists are going to become irrelevant at the front end, and the nature of the work will shift. Adoption of machine learning at scale will likely be slow at all but the largest of firms, and it is probable that machine learning will be an accompaniment to data scientists, taking over many of their more time-consuming tasks. However, the nature of all work tends towards automation. The volume of data that companies produce is now far beyond the capabilities of one data scientist to analyze, and it is inevitable that automation will consume the field entirely.