The Clouds Are Forming Around Big Data

Big data is moving to the cloud and it's far from grey skies ahead


When we thought about huge amounts of data 10 years ago, the image that popped into most people’s heads was a huge server room filled with server stacks all flashing in unison. When we think about data today, we visualize something the size of a file on a computer that we can access from anywhere in the world.

It may be true that many companies still have on-premise systems, but as the security and other benefits of the cloud become clearer, we are seeing a huge surge towards cloud storage as a more practical way to store data. In a survey from research firm Clutch, 90% of enterprises asked said they planned to maintain or increase their cloud spend for 2016. Of these, 42% were planning to increase spend by 11-30%, 14% by 31-50%, and 7% by more than 50%.

However, until recently this changed little in the day-to-day running of most data analytics departments. After all, they simply access their data through a slightly different path on the computer. But the cloud is useful for far more than storage alone, and several companies are forging ahead with the Big Data as a Service (BDaaS) model.

The idea behind this is simple - get other people to analyze your data to find what you need. It is a response to both the increased popularity of cloud computing and the difficulty that many companies have in finding data scientists. Companies such as BlueData, Qubole, CSC, and most of the big players offer the service to varying degrees, from self-service through to fully managed packages. It has been successful too - Qubole closed a $30 million Series C funding round in January 2016, which put its valuation at around the $150 million mark - not bad for a company founded only 5 years previously. In fact, cloud-based ‘software-as-a-service’ spend, which made up 15% of total spend in 2015, is expected to hit 35% by 2021, and given that many companies are now looking to adopt or grow their big data capabilities, it isn’t hard to see that this market offers huge potential for growth.

It is certainly an exciting area, but putting a specific definition on it is difficult.

There are several platforms where companies can upload their data, say what they want to know, and then have a third party analyze the data and come back with the findings. Others provide solutions that allow non-data scientists to perform relatively complex analysis through easy-to-use interfaces. Others still offer easily scalable and secure storage options. So many different tasks fall under the label that it becomes difficult to pin down exactly what BDaaS actually is.

Among the most useful offerings are those from companies that provide managed data tools alongside their own unique data. For instance, IBM offers its ‘Analytics for Twitter’ service, which allows companies to access data from Twitter and use IBM’s tools to analyze specific data within that dataset. This opens up otherwise inaccessible data to companies of all sizes, meaning that they can ‘do big data’ without necessarily having a huge amount of data themselves.

However, it may be hypocritical to claim that this lack of a clear definition is a bad thing in a growing industry. After all, the term big data is almost meaningless in itself and doesn’t come close to describing the tasks and disciplines within it. BDaaS may be something that many companies are aware of and many are actively using, but it is still a catch-all term for a wide-ranging set of tools.

The spread of BDaaS is going to increase significantly over the coming years as the scalability and open access it offers become better utilized. This is likely to see the disparate functions currently grouped under the BDaaS banner become more focused. In turn, this will make these services far more usable and marketable, as it will allow specific uses to become clear and competitors to emerge within each field. Companies will be able to easily compare storage-only cloud systems, platforms, or fully managed systems, rather than having them all encompassed under a single catch-all term.

We are also going to see more plugins and connectivity to allow layering of services, meaning storage from one provider, analysis software from another, and even the analysis itself carried out by another company completely. This has already begun to some extent, with Qubole already marketing the fact that its platform can be run on top of AWS, Microsoft Azure, Oracle, and Google Cloud Platform, allowing for layering to take place.

A significant move to BDaaS is also occurring because it offers access to cutting-edge technology and techniques for a fraction of what the same system would cost to implement on-premise. For example, Sinequa offers cognitive systems through the cloud, and IBM launched its quantum computing as a service platform in May 2016. The cost of implementing either, especially in the initial stages, is prohibitive for most companies. A quantum computer from D-Wave, for instance, costs a minimum of $10 million for the unit alone, which then needs to be housed and kept in specific conditions, adding at least another $1 million to that cost. However, access through the cloud allows any company that can afford a subscription to use this exciting new technology.

Work like this is where the true value of BDaaS is going to come from: offering resources that would be more or less impossible to get anywhere else. This could be technologies like quantum computing, data like that from Twitter, or even the skills a company needs to analyze its own data. One thing we know for sure is that this is an exciting time for BDaaS and the future is looking bright, even though the clouds are starting to gather.
