Gartner predicted that 60% of big data projects would fail last year, crashing and burning before they even reached the piloting and experimentation phases. Whether this came true is yet to be revealed, but as we enter a new period for data, one driven by machine learning and AI, it is more important than ever that these projects succeed. If you don't have an effective grasp of your data initiative, you will fail to exploit AI and machine learning in your organization, because you will have no data pipeline with which to train your algorithms properly. As Ali Ghodsi of UC Berkeley notes, 'there are only 1% that are succeeding in AI, the rest of the 99% are left behind and struggling to get all this big data technology.'
There is, unfortunately, no one-size-fits-all solution to failing data projects. The causes are many and complex, dependent as they are on a large number of variables that are often impossible to foresee. In his recent presentation at the Chief Data Officer Summit in New York, Sashi Marella, Senior Data Scientist at Viacom, talked us through some of the main misconceptions held about implementing an enterprise-wide data science initiative, and what his team had done at the media giant to ensure that they were successful.
Through a lengthy process of trial and error, Viacom has amassed a tremendous understanding of how to approach a data science/machine learning initiative. Marella believes that the most common misconception around implementing an enterprise-wide data science initiative is that we can simply throw money at infrastructure and data and it will provide us with everything we need. You can't take any business, buy some stack on AWS, hire a couple of data engineers and data scientists, and say you're done. This is a dangerous assumption because it tells you that your problem is solved: you've built an architecture to manage data, so now everything will flow smoothly. A data science project is not an end in and of itself, argues Marella. The end goal is putting those insights into play to actually create value. Stakeholders and business decision makers need to translate the insights coming from your data science teams into business wins.
Viacom is on a six- to seven-year sprint with the eventual goal of developing a complete in-house data platform, combining data science, machine learning, and AI. The company conducted a six-month study into why certain insights failed to translate into results, and why others were immediately activated and became a huge success. It found that teams where the business stakeholder had actually invested time in what the data science team was doing, including going through the motions of the data science team's work, had a far higher success rate.
In Marella's experience, ceding control of the decision-making process to your algorithms is very uncomfortable for most people, especially C-suite executives. A data science initiative comes with certain risks, but so does not using the insights. Unfortunately, the risks of using the insights are more intangible than the known risks most C-suite executives have worked with over the course of their careers. If your business decision makers are not fully calibrated to understand the risk that they are taking - to validate the risk, to calculate what that risk is, and to understand exactly what it is the machine learning algorithm is telling them - it is very difficult for them to correctly weigh the benefits of any particular insight relative to what has worked for them in the past.
To do this, stakeholders have to be able to trust and understand machine learning to transform insights into business decisions. This needs to be a healthy trust, argues Marella. It does not mean abandoning their brains and blindly following the machine. Rather, they must recalibrate. They must self-assess their understanding of the risk of going the usual route versus the risk they will incur if they go with what the machine learning algorithm says.
The onus is not, however, all on them. A data science team also needs to develop a deep understanding of the business and of the levers that can or cannot be pulled to achieve a certain goal, tactically or strategically. Otherwise, they may produce some flashy-looking graphics that are ultimately irrelevant.
Neither of these is magically obtained. They require a structured approach. To provide one, Viacom developed a cross-discipline training and cross-team learning program. This allowed stakeholders and data teams to come together in sandbox projects, where they could not only learn how each team can complement the other, but also use internal datasets to carry out projects that are not necessarily going out to market. This allowed the stakeholders to actually see what is possible when a machine learning team looks at a dataset, and the impact it can have.
Viacom's cross-functional team gave data scientists access to the business knowledge of thought leaders within the company, and a greater understanding of the logic behind business decisions, which they carried into their work. Marella found that data scientists were also interested in giving stakeholders basic exposure to data science - the math and tech behind it. They essentially provided a statistical bootcamp, and the response from Viacom employees was positive, with employees across the organization looking to implement something similar. Key to this, Viacom found, was to look for solutions that worked within the business user's usual workflow. For example, in most businesses, people use a lot of Excel. Marella and his team found some amazing Excel packages that will do all the basic machine learning that you want - regression, classification, NLP, and so forth. What allowed Excel to become the de facto tool for the initiative was that everybody had it and they didn't have to learn a new program or a new language.
None of this happened overnight. Data scientists and decision makers had to be trained to work together to come up with an overview of any given project or goal that their particular team had. They had to come up with feasible, measurable, specific, and time-bound goals with specific linked KPIs. However, even just by going through the motions, asking questions of each other, and finding common ground, Marella found that they were able to write models against KPIs and build a data initiative that truly created value.
You can hear more from industry leaders like Sashi at the Big Data Innovation Summit in London, taking place this March 21-22.