Significant Changes Impacting Analytics

How Data Science and Data Lakes are changing the world of analytics


For those of us who like to say 'I was working in big data before it was called big data', the new world order of analytics has some nuances and peculiarities that set it apart from the old days of Business Intelligence (as it was called).

From a team perspective, there is now a 'Data Scientist' on the team. Lots of people these days call themselves data scientists. I will not endeavor to refute their claim, but the role itself is a change from the old days. This person, often with a PhD, was not really there in the old days of analytics. We used to have the data modeler: someone who worked with entity-relationship mapping and data modeling techniques to define how data should be structured. Snowflake or star schemas, dimension tables, fact tables and so on; this was the lingo the modelers spoke. The data model was an essential part of an analytic project. For your BI tool to cross-tab sales against customers, you needed a data model with enough links between its data structures to allow that traversal.
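To make the point about links concrete, here is a minimal sketch of a star-schema style cross-tab using Python's built-in sqlite3 module. The table names (fact_sales, dim_customer), the columns, and the sample rows are all made up for illustration; they are not from any real project.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        region      TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO dim_customer VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO fact_sales VALUES
        (10, 1, 100.0), (11, 2, 250.0), (12, 3, 50.0), (13, 2, 25.0);
""")


def sales_by_region(connection):
    """Total sales per customer region, reached through the foreign-key link."""
    rows = connection.execute("""
        SELECT c.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()
    return dict(rows)


totals = sales_by_region(conn)
print(totals)
```

Without the customer_id column on the fact table there would be no way to traverse from a sale to a customer attribute, which is the kind of gap a data modeler was there to prevent.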

Now a different paradigm is at play. The data model, I believe, has become less significant, and that in itself is a major shift! The data model was the crown jewel of analytic projects. Database platforms were optimized and tuned, and many years of effort were expended to make sure queries performed and that indexes and the like were defined. If the data model was bad, you were hosed and done. Your beautifully formatted report or dashboard was not worth much if you had not invested well in the data model.

The new world order in analytics, while still retaining nuances of the data warehouse, is different. Recently, in one of my engagements for my company, Inference Analytics, I was interpreting a table structure sent by a client for my Data Scientist, explaining the relationship between two files. I found the process interesting and asked myself: why was I interpreting table structures? So I asked my data scientist, 'Dude, you are the data scientist, not me, why am I interpreting table structures here for you?' He had an interesting response. Data Scientists do not really specialize in data modeling. They want the data in one large file that can be stored in HDFS or similar; the emphasis is on the science, on applying Machine Learning techniques (ensembles, associations, clustering etc.) to make sense of the data and find hidden patterns. This was quite revealing to me. So here is how this changes things:

Data Warehouse vs. Data Lake and Exploratory Environment

Data Warehouse: You had to define your end user requirements in detail, so the data model could be constructed around those requirements.
Data Lake and Exploratory Environment: The data itself reveals patterns through ML techniques, and users are alerted to or made aware of them; up-front data requirements are minimal.

Data Warehouse: Any change in the user requirements could end up impacting the data model.
Data Lake and Exploratory Environment: The focus is on analytics and discovery, so there is barely a data model to change.

Data Warehouse: Data models had to be tuned to perform well and return answers.
Data Lake and Exploratory Environment: Platforms like Spark run the analytics by holding the data sets in memory, so far less manual tuning of a data model is needed.
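As a rough illustration of the exploratory side of that comparison, here is a small sketch that reads one flat file and lets a clustering pass find the groups, with no data model in sight. The file contents, the column names (annual_spend, visits), and the naive first-k seeding are all assumptions made for the example; a real project would more likely use a library such as Spark MLlib or scikit-learn on far larger data.

```python
import csv
import io
import math

# Made-up flat file: one wide table, no joins, no schema design.
FLAT_FILE = """customer_id,annual_spend,visits
1,100,2
2,120,3
3,110,2
4,900,20
5,950,22
6,920,21
"""


def load_points(text):
    """Read the two numeric columns into (annual_spend, visits) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["annual_spend"]), float(r["visits"])) for r in reader]


def kmeans(points, k, iterations=10):
    """Plain k-means, seeded with the first k points so runs are repeatable.

    Features are left unscaled here, so annual_spend dominates the distance.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


points = load_points(FLAT_FILE)
labels, centroids = kmeans(points, k=2)
print(labels)
```

Nobody told the code which customers are low or high spenders; the grouping comes out of the data, which is the shift the comparison above describes.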

I will caveat the above again: what we are talking about is more of a Data Lake analytic environment. Data warehouses still exist, and they still require everything they did before, including data modelers. However, the exploratory analytic environment is becoming a very important part of the analytic ecosystem, and may in fact become as important as, if not more important than, the traditional data warehouse.
