Today more than ever, every business is focused on collecting data and applying analytics to stay competitive. Big Data Analytics has passed the hype stage and has become an essential part of business plans.
Data Lake is the latest buzzword for a place to dump every piece of data you can find, internally or externally. If you Google the term Data Lake, you will get almost 14 million results. Simply put, everyone is doing it.
The idea behind a Data Lake is to have one central platform to store and analyze every kind of data relevant to the enterprise. With digital transformation, the data generated every day has multiplied several times over, and businesses are collecting consumer data, Internet of Things data and other data for further analysis.
As storage has become cheaper, more data is being stored in its raw format in the hope of finding nuggets of information, but as the volume grows it becomes difficult to find anything useful. It is like using your smartphone to click photographs left, right and center: when you want to show one specific photograph to someone, it’s very difficult to find.
Data Lakes, if not maintained properly, have the potential to grow aimlessly and consume the entire budget. Some companies have Data Lakes overflowing from their on-premises systems into the cloud.
Most data lakes lack governance, lack the tools and skills to handle large volumes of disparate data, and many lack a compelling business case. But the water (the data) in your data lake has to be crystal clear and drinkable, or else the lake will become a swamp.
Before jumping on the bandwagon and building a data lake that may cost thousands of dollars and take months to implement, you should start by asking these questions:
What data do we want to store in the Data Lake?
How much data will be stored?
How will we access these massive amounts of data and get value from them easily?
Here are some guidelines to avoid drowning in your Data Lakes:
First and foremost, create one or more business use cases that lay out exactly what will be done with the data that gets collected. That exercise helps you avoid dumping data that serves no purpose.
Determine the returns you want to get out of the Data Lake. Developing a Data Lake is not a casual undertaking. You need clear business benefits coming out of it.
Make sure your overall big data and analytics initiatives are designed to exploit the data lake fully and help achieve business goals.
Instead of falling into vendor traps and their buzzwords, focus on your needs and determine the best way to meet them.
Deliver the data to a wide audience so they can review it and respond with feedback while creating value.
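One way to make these guidelines stick is to put a small checkpoint in front of the lake, so that nothing is ingested until the questions above have answers. Here is a minimal sketch in Python. The DatasetProposal record, its fields and the missing_answers function are all hypothetical names made up for illustration; they are not part of any vendor's product, and a real setup would more likely live in a data catalog or governance tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetProposal:
    """Hypothetical request to land a dataset in the lake."""
    name: str
    owner: str = ""                      # who answers for this data
    use_case: str = ""                   # what will be done with it
    expected_gb_per_month: float = 0.0   # how much data to be stored
    consumers: List[str] = field(default_factory=list)  # who will access it


def missing_answers(proposal: DatasetProposal) -> List[str]:
    """Return the questions the proposal leaves unanswered."""
    gaps = []
    if not proposal.use_case.strip():
        gaps.append("business use case")
    if not proposal.owner.strip():
        gaps.append("owner")
    if proposal.expected_gb_per_month <= 0:
        gaps.append("expected volume")
    if not proposal.consumers:
        gaps.append("consumers")
    return gaps


# A proposal with nothing but a name should not get into the lake.
vague = DatasetProposal(name="clickstream_raw")
print(missing_answers(vague))
```

A proposal that comes back with an empty list is ready to be ingested; anything else goes back to the requester with the list of gaps.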
There are many cloud vendors that can help you build data lakes: Microsoft Azure, Amazon Web Services (with Amazon S3 for storage) and others.
By making data available to Data Scientists and anyone who needs it, for as long as they need it, Data Lakes are a powerful lever for innovation and disruption across industries.