Could Hadoop Be Your Next Warehouse?

Recent Gigaom research suggests it could be


In research published by Gigaom before the company closed its doors in March 2015, George Gilbert identified another way for organizations to optimize their data warehousing capabilities.

At present, organizations tend to use two main types of data store: core data warehouses and data lakes.

The core data warehouse is the expensive technology that organizes data in highly functional and usable ways. From there, data can be fed through in fully structured formats. This allows for quick and relatively clean analysis, but it is limited by the very fact that its contents are curated.

Data lakes are far less structured. They bring together data from disparate sources, which can then be analyzed with Hadoop. This can include structured, unstructured and semi-structured data, taking much of the analytical burden off the core data warehouse. As a result, the core data warehouse only has to operate on the structured data it holds, rather than working through the complexities of the data lake.

Gilbert points to a third type of warehouse: the adjunct data warehouse.

Here, various data warehouse tasks are offloaded from the conventional data warehouse to Hadoop, which is then used to perform ETL (extract, transform, load) and reporting on data drawn from the data lake.

Essentially, it acts as a filter between the data lake, which collects data from various sources, and the core data warehouse, processing that data before it is incorporated into the core warehouse. Unlike the core data warehouse, it does not require specific questions to be asked up front, which makes it possible to find more correlations and identify the questions later.
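As a rough illustration of that filtering role, here is a minimal sketch in plain Python rather than on Hadoop itself. It assumes the lake holds raw JSON strings and that the core warehouse expects rows matching a fixed schema; the `SaleRow` schema, the `customer_id` and `amount` fields, and the `adjunct_filter` function are all hypothetical names chosen for the example. The extract and transform steps are shown, while the load into the core warehouse is left out.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical target schema for one table in the core warehouse.
@dataclass(frozen=True)
class SaleRow:
    customer_id: str
    amount_cents: int


def adjunct_filter(raw_records: Iterable[str]) -> Tuple[List[SaleRow], List[str]]:
    """Extract and transform raw lake records, keeping only rows that fit the schema.

    Records that cannot be parsed or mapped onto the schema are returned
    separately instead of being discarded, so they remain available for
    later exploratory analysis.
    """
    accepted: List[SaleRow] = []
    rejected: List[str] = []
    for raw in raw_records:
        try:
            record = json.loads(raw)  # extract: parse the semi-structured input
            row = SaleRow(
                customer_id=str(record["customer_id"]),
                # transform: normalize a decimal amount to integer cents
                amount_cents=round(float(record["amount"]) * 100),
            )
        except (ValueError, KeyError, TypeError, OverflowError):
            # Malformed JSON, missing fields, wrong types, NaN or infinite amounts
            rejected.append(raw)
            continue
        accepted.append(row)
    # In a real pipeline the accepted rows would now be loaded into the core warehouse.
    return accepted, rejected


if __name__ == "__main__":
    lake = ['{"customer_id": "c1", "amount": "19.99"}', "not json", '{"amount": 5}']
    rows, rejects = adjunct_filter(lake)
    print(rows)
    print(len(rejects), "records left for later exploration")
```

Keeping the rejected records, rather than dropping them, mirrors the idea that data which does not yet fit a predefined question can still be examined later.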

This kind of work allows data scientists to gain deeper insight from the data before it is fed into the more structured, and therefore somewhat more limited, core warehouse.

The approach is not simple to implement, but the potential cost savings are large: performing the same operations on traditional warehouse systems is considerably more expensive. Although few companies are doing this yet, given the benefits it could bring, it is likely only a matter of time before it becomes more widespread practice.

