Coordinating the Many Tools of Big Data in Hadoop
The big data revolution is about more than terabytes or petabytes of data. It is also about applying new paradigms, languages, and tools to these data sets. This variety is a great strength of big data, but also a liability. These tools have different data models, different utilities for reading and writing data, and different frameworks for including user code. How can users in the same organization who work with different tools share data? How can user-defined functions written for one tool be used by other tools? This talk will cover work being done in the Apache HCatalog, Apache Pig, and Apache Hive projects to address these issues.