Using Hadoop In Small Companies

Is it the elephant in the room?


Hadoop is often the elephant in the room (excuse the pun). Many companies see it as something to aspire to: gather enough data to use it to its full potential, and then get the kind of insights that their relatively small amount of data currently cannot provide.

So they start off the process with more basic and less refined software. The idea of implementing Hadoop may exist, but the point at which they actually do so is never reached. They assume that Hadoop is for big companies and that it is not worth a smaller company getting involved.

However, this is not the right mindset.

The idea comes from blogs with titles like ‘Don't use Hadoop - your data isn't that big’, which preach the importance of other software that can do roughly the same thing for smaller data sets. The problem with looking at the adoption of Hadoop in this way is that it makes Hadoop seem a scary and elitist platform, only useful once you have reached a certain level. It is like playing your first ever round of golf with a $10,000 set of clubs: why would you use them when you cannot get the full benefit?

The truth is that an early adoption of Hadoop is not a bad thing, and it is not a company acting like it is too big for its boots. In fact, I would argue that an early adoption can be immensely beneficial. Some, such as Chris Stucchio, claim that it is inferior to other platforms due to the lack of coding, but it is far and away the most used platform across the biggest companies.

This argument also neglects the fact that several companies, such as Hortonworks and Cloudera, are building on Hadoop as the foundation, allowing more complex algorithms to be used in addition to the core functions. Moving from other platforms onto these may not be overly difficult, but it would certainly throw up considerably more challenges than if Hadoop had been used from the start.

Another argument that is often used is that smaller companies do not have the budget to employ data scientists who can effectively use Hadoop to find the more in-depth insights in their data. This may be true, but data scientists are not going to be needed initially. The biggest changes, which are the ones that will have the biggest effect, come first; only after they have been made does a company need to look into the most complicated aspects of its data. In-depth data science then allows for incremental changes that would be difficult to see with the data that is already present.

Juergen Urbanski from T-Systems puts it best in his post ‘Start Small, Grow Tall: Debunking Three Big Data Myths’ for Wired, where he says:

‘Cheaper storage and data processing are the low-hanging fruit of Big Data adoption in the enterprise. Only once an organization has started to realize a sizable return from Hadoop does it make sense to hire a data scientist to ask more sophisticated questions of the data. Very often, that will mean analyses across data from different sources, such as sentiment, clickstream, sensors, location-based, server logs and text data.’

An initial investment of $50,000 in Hadoop can pay huge rewards, making the case for further investment. At around the $100,000 mark a company can fund a full-blown rollout. This is far from the millions of dollars that enterprises spend every year on data initiatives ($13.8 million according to an IDG Enterprise study), although the average for SMEs that have implemented such initiatives is $1.6 million, which shows that they have faith in data as a business process.

It is also becoming increasingly complex to define the size of a company, as the size of the workforce certainly does not represent the success of a company or indeed the amount of data it will collect. For instance, Facebook has around 10,000 employees and has one of the world’s largest databases. Walmart has 2.2 million employees, but the amount of data it holds is considerably less. So at what point is a company big enough? There is no set threshold at which it can be deemed the right time.

SMEs could easily hold considerably more data than some of the world’s largest companies and, with the power of the internet, the amount of data held can increase significantly very quickly. Because of this, having easily scalable systems like Hadoop in place early means that companies are prepared.

So despite much of what is said about Hadoop around the internet, some of which is true, an early adoption of Hadoop can have significant business benefits. Don’t let the naysayers put you off.

Find out more about the uses of Hadoop at the Big Data Innovation Summit in Boston on September 9 & 10

