I recently exchanged views with a colleague about Gartner’s new survey of 284 respondents on the growing pains of Hadoop. The report pointed out that over half of the respondents have no plans to invest in Hadoop, and only 18% plan to invest in it over the next two years. A few news outlets, including Fortune Magazine and ZDNet, cited the Gartner survey and discussed the hype around Hadoop and Big Data.
My colleague suggested that Hadoop is going through the trough of disillusionment. Having worked on both the vendor and customer sides, I think that’s a fair assessment. Adoption has been slower than anticipated, but the business drivers for making informed, proactive, data-driven decisions have not gone away, and the digital/software revolution across industries is generating more data than ever before.
So in this post, I want to share my views on why Hadoop in particular, and Big Data in general, are in the trough of disillusionment and, more importantly, what it takes to climb to the “slope of enlightenment”.
It is important to recognize that Hadoop is a technology and not a solution. Like any new technology, mass-market adoption in enterprise settings requires two things:
- The technology needs to mature beyond functional capabilities to “addressing business needs”.
- Enterprises need to understand what the technology actually delivers, whether and when they need it, and how to use it.
Let’s look at these in turn.
What exactly does it mean to “address business needs”? If you are a vendor, it’s natural to view this through the lens of your own strengths. But from an enterprise user’s perspective, there are primarily three aspects to consider:
Leveraging existing skillsets and tools without a lot of retraining: There are two types of skillsets and tools. The first is functional, such as SQL, OLAP and visualization. The second, equally important and (in my view) underserved, is operational: the skillsets and tools for configuring, managing and maintaining Hadoop clusters. The Big Data vendor ecosystem needs to ensure that enterprises can reuse both their functional and operational assets as they migrate to Big Data technologies.
Security and business continuity: Innovative vendors and startups, not to mention Apache Software Foundation (ASF) projects, are rapidly bringing products to market that address parts of this. However, the ecosystem of these products and projects has not yet reached the tipping point at which the average enterprise is willing to entrust the bulk of its data to the Hadoop stack. We are talking about a long list of “enterprise-grade” capabilities: authentication, authorization, audit and governance, encryption, geo-redundancy, backup and restore, and more.
General agreement on the right tool for the right job: The ecosystem of open source projects and proprietary products is moving too fast for the average enterprise to absorb, and choosing the right tool for the right job has become a constantly moving target. An extreme example is the significant overlap among open source tools for SQL on Hadoop that all belong to the same software foundation: Hive, Drill and Spark SQL. Enterprises are used to picking among products from competing vendors, but not among multiple rapidly evolving, overlapping open source projects under the same foundation. Some level of consistency is needed within the ASF to help enterprises choose the right open source tool for the right job.
Now, let’s look at industry awareness of what Hadoop actually delivers, when enterprises need it and how to use it.
Over the past two decades, I have seen the new-technology adoption fallacy of “if you build it, they will come” play out several times. Translated to Big Data, it reads: build a data lake, proactively store all the data, and the use cases will follow.
I think this fallacy plays an important role in the disillusionment. I have seen some enterprises (a minority, I must admit) successfully get past it by taking a line-of-business, use-case-driven approach. For this to happen, some tough business-level questions need to be asked and answered:
- What are the business problems or opportunities that could be addressed by taking a data-driven approach?
- Are there proactive decisions we can make or personalization we can deliver to customers that have quantifiable business value?
- What internal and external data assets are needed to start driving this approach? How can they be acquired?
- Does data owned by multiple teams need to be consolidated as a prerequisite to analytics?
- Once insights are acquired, when and how do we take action on them?
- How do we inform intuition-driven decision making with data-driven insights?
- How do we handle changes in the environmental assumptions that were used to generate the insights?
As with any other investment decision, technology due diligence should follow business due diligence. Now, let’s look at the questions to ask during technology due diligence for Hadoop:
- Is the volume of data large enough to justify investing in Big Data technologies?
- Do we need to store untransformed data and post-process it later as required by the use case?
- Do we have data and analytics use-cases that don’t fit into pre-existing data warehouse data models or analytical solutions?
- Do we run into cases where the level of detail needed to implement a use case is missing because we aggregated the data before storing it?
- Do we have use cases that require ingesting and processing massive streams of data in real time? How are we implementing these use cases today?
- What “enterprise grade” requirements need to be satisfied?
While progressive enterprises have gone through these motions, the majority are still struggling to connect the dots. That is because it takes more than the availability of new data sources and new technologies to crunch the data they generate. It also takes cultural and organizational evolution: enterprises must put in place the processes and best practices to carry out this due diligence and to work across organizational boundaries, and that takes longer than delivering, say, the next parallel stream-processing engine on Hadoop.
In the meantime, the vendor ecosystem needs to mature, and vendors need to shape their offerings around business needs. The Gartner report makes it clear that this is a prerequisite for reaching the tipping point of mass-market enterprise adoption of Hadoop in particular and Big Data technologies in general.