Data lakes are the latest big thing in data manipulation and accurate analysis. They have been sold as a revolution in the way that people access and utilize data, but there are also people who question their real value. If you want to know exactly what they are, we recently explained them.
Many blogs have been written claiming that data lakes are just another marketing term used to sell more business solutions, and that using them requires in-depth analytical knowledge in every department. Regardless of the validity of these claims, they do a disservice to the benefits and potential positives that data lakes hold for many companies.
So what are they?
The ability to keep huge amounts of data in a data lake for considerably less than in a managed enterprise data warehouse is one of the biggest benefits that companies cite. When looking at solutions, storage costs are one of the primary concerns that need to be considered, and because comparatively unstructured data of many kinds can be stored without prior processing, data lakes offer sound financial value.
However, it is worth noting that although a data lake is cheaper than a traditional managed enterprise data warehouse, its data still needs some form of formal organization when it is processed and analyzed. This means that a more traditional database is required to make sense of much of the data held in the data lake, but rather than being processed before it enters, the data is processed as it leaves. Therefore only the data that is going to be analyzed needs to go through this step.
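The idea of processing data as it leaves the lake, rather than as it enters, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: the records, the field names and the helper read_with_schema are all hypothetical.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no upfront schema).
RAW_LAKE = [
    '{"patient_id": "17", "dept": "dermatology", "note": "rash", "age": "54"}',
    '{"patient_id": "18", "dept": "cardiology", "bpm": 88}',
    '{"sensor": "door-3", "opened": true}',
]

# The structure is only imposed on the way out: field name -> type converter.
PATIENT_SCHEMA = {"patient_id": int, "dept": str}


def read_with_schema(raw_lines, schema):
    """Parse raw lines and return only the records that fit the schema."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if not all(field in record for field in schema):
            continue  # not needed for this analysis; it stays raw in the lake
        rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


patients = read_with_schema(RAW_LAKE, PATIENT_SCHEMA)
```

Only the two patient records are parsed into a structured form here. The sensor record is skipped for this analysis and remains in the lake untouched, ready for a different schema later.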
No Need To Know Exactly What You Need
The cheap storage costs and huge variation in data types that a data lake can hold give companies the chance to legitimately hoard their data. Even if a particular data set has no real impact now, it may well have a significant impact in the future, and it can therefore be stored almost indefinitely.
Because data lakes allow data to be fed through in its native format, it can be stored and added to over time, making datasets more complete and useful for future analysis. This means that down the line, when the data may have more significance or more relevance to another dataset, it can be found in a complete, unprocessed form.
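As a rough sketch of what storing data in its native format and adding to it over time can look like, the Python below writes incoming files into a folder tree byte for byte. The layout (source, then day), the dates, the file names and the helper land_raw are assumptions made for illustration, and a temporary directory stands in for real lake storage.

```python
import tempfile
from pathlib import Path


def land_raw(lake_root, source, day, filename, payload):
    """Write a payload to the lake unchanged, under source/day/."""
    target_dir = Path(lake_root) / source / day
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # no parsing, no conversion
    return target


lake_root = tempfile.mkdtemp()  # stand-in for real lake storage

# Two hypothetical deliveries in different formats, arriving on different days.
csv_bytes = b"id,reading\n1,0.42\n"
json_bytes = b'{"id": 2, "reading": 0.57, "unit": "mV"}'

land_raw(lake_root, "sensors", "2024-01-01", "batch.csv", csv_bytes)
land_raw(lake_root, "sensors", "2024-01-02", "batch.json", json_bytes)

# Everything the lake now holds, as paths relative to its root.
stored = sorted(
    p.relative_to(lake_root).as_posix()
    for p in Path(lake_root).rglob("*")
    if p.is_file()
)
```

Neither file is converted on arrival, so the second delivery can carry an extra field without anything already stored having to change.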
Allows For Future Tech Changes
Given the pace of change within data technology, the ability to store this information in its native format before it is imported into a more structured and controlled database means that it will be easy to use in future systems. As we have mentioned, a data lake moves the necessity of processing from before the data is stored to before it is analyzed, which makes it much quicker to pull the necessary data through to any required system in the future.
Transferring data to new formats, or making even slight changes to previously formatted data, can be both expensive and time consuming. Holding data in its original, unchanged format avoids this cost when moving away from legacy systems.
The scale and accessibility of the data held in data lakes give them a clear advantage for sharing across enterprises. This will be particularly useful in the future, as more teams and individuals will have the skills necessary to make in-depth analyses.
For instance, if a data lake were used in a hospital, certain types of dermatological conditions might turn out to be early indicators of other conditions. Having the ability to access this information and compare it across an entire organization can have profound effects on outcomes. When you also consider that many departments will use different analysis software, having the data in raw form makes it far easier to use on a multitude of systems.