Hadoop is an open source framework which is used to store and process big-data. The framework is developed in java for distributed storage and processing of high amount of data.
It is scalable as it is able to store large amounts of distributed data across a number of servers that operate in parallel. Traditional database systems cannot process such a large amount of data. But, Hadoop enables a business to run applications that requires thousands of Terabytes of data.
Conventional database management systems were extremely cost prohibitive for processing high volumes of data. Previously, to reduce the costs, many companies used to segregate the data according to certain assumptions and then process it. Many of the companies then stopped keeping track of raw data as it was too costly to keep up with, and hence they used to delete it. But this approach worked for a short time only. As the change in business priorities led them to keeping hold of their raw data.
And so, Hadoop is designed in such a way that it is able to store all the data, even for later use. Instead of costing huge amounts, Hadoop offers it at considerably less money.
Hadoop enables a business to access new data sources easily and switch from one data set to another. Businesses can store it as structured and unstructured data too. Hadoop derives valuable business insights from sources like email conversations, social media or click stream data. Hadoop can also be used to log processing, data warehousing, recommendation systems, fraud detection and market campaign analysis.
Hadoop's storage method is based on a distributed file system, that can map data located on other clusters. The tools used for processing data are on the same server wherever the data is located. This eventually leads to much faster data processing. If you are working on large amounts of unstructured data, Hadoop has the capability to process petabytes of data in hours and terabytes of data in minutes.
Having resilience to failure
The key advantage of Hadoop is that it is fault resistant. When data is sent to another node, the data is also duplicated to other nodes on the same cluster. It means that, in case of any failure, there is another copy of the data always available for use.
When handling large databases in a safe and cost effective manner, Hadoop has an advantage over RDBMS (Relational Database Management System) and values for all business sizes.