Data has become the fuel that runs the economy, and its security and storage are becoming an increasingly integral part of every enterprise's infrastructure. Understanding the pros and cons of each data storage option, and the benefits each one can deliver, is immensely important to a firm's future success.
To explore the industrial data storage options available and learn what firms can do to better position themselves for 2019, DATAx turned to Manoj Vig, head of the clinical data repository and clinical data lake at IQVIA and a speaker at DATAx New York. Below is the first of a two-part series of conversations about all things storage.
DATAx: What differences do people need to be aware of when it comes to data lakes and data repositories?
Manoj Vig: There are many variations on these terms: data lakes, data warehouses and data repositories, to name a few. Sometimes their functionalities overlap, and sometimes they complement each other.
Modern data lakes offer virtually unlimited storage and can hold a wide variety of data, both structured and unstructured, including text, images and genomic data. They store that data securely and in a compliant way, with sufficient replication and failover support.
Data lakes also let their consumers model the data according to their own needs, so the same data can serve each individual use case. For example, one decision-maker may want to process a dataset and visualize it in a dashboard, while another may want to run a machine-learning algorithm on the same dataset. A third may want to add a search function over that dataset using natural language processing (NLP). All of this is possible simultaneously because a data lake decouples the data from the model, allowing the same data to be reused in myriad ways.
A data repository, on the other hand, is more like a data product built on top of a data lake: it processes data in specific ways, integrates it into standard models and presents it to consumers through different channels. It hides the complexities of data processing, data quality and integration from the consumer, allowing decision-makers to use the processed data for their needs.
In my view, a data repository is a collection of data capabilities, many of them self-serve, that allow various kinds of users to draw on data to make impactful decisions. For human users, these capabilities take the form of reports, dashboards and alerts.
At the end of the day, the difference between these concepts should be measured by how they complement each other and how they help us make better and faster business decisions.
DATAx: What technical improvements do you see disrupting the way we store data in 2019?
MV: I think the more important factor will be how we use stored data: how reusable it is, how quickly the actual decision-makers can access it, how much value we can extract from it before that value diminishes and how data in motion can be used to improve the healthcare ecosystem.
In my view, 2019 will witness a massive shift in data storage and computing strategies, driven by cloud platforms such as AWS and Azure.
Not only do these platforms offer practically limitless storage capacity, which will remain a very important factor for many businesses and use cases, but they also provide several pre-engineered, turnkey capabilities that are hard to replicate in local data centers without significant investment.
A cloud platform's ability to store petabytes of data, replicated geographically across many data centers in a number of regions, is just amazing. On top of that, these platforms offer pre-engineered compliance capabilities tailored to the regulations of specific countries and industries.
This allows organizations to launch data and analytics systems in various geolocations quickly and cheaply, giving decision-makers across the planet early access to data and analytics. That fosters better collaboration, information sharing and collective decision-making, which will be a game changer for all kinds of businesses in the near future.
There are many ways to describe the benefits of cloud services, and it would take a long time to discuss them all. However, if I had to choose the one trend most likely to change how we store, distribute and process data, I would go with cloud platforms.
Manoj Vig will be on a panel on Day Two of the AI & Big Data for Pharma Summit, part of DATAx New York, taking place on December 12–13 at the Hilton Midtown.