The Art of (Big) Data Preparation

Preparation is key - even for your data


Data Preparation

If you decide to jump on the Big Data bandwagon, which you probably will soon in some form or fashion, your analysts will spend a considerable amount of time on data preparation, which can be both mundane and time-consuming. Indeed, some experts say that the data prep portion of a Big Data initiative takes about 50-80% of your analysts' time.

In recent years, a dedicated category of solutions has been emerging within the Big Data/Analytics realm. Their primary mission and value proposition is to apply best practices to data in its raw form and to transform that data step by step into a final product, which could, for example, be a dashboard on your smartphone.

However, there are quite a few steps in between (data ingestion, exploration, enrichment, cleaning, profiling, etc.) that you want to get right, considering your ultimate objectives: business insights and effective data sharing between stakeholders, which could ultimately lead to new product launches, newly identified areas for cost reduction, or a successful retail marketing campaign.
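To make a few of these intermediate steps a little more tangible, here is a minimal sketch in Python of ingestion, profiling and cleaning on a tiny sample. Everything in it is an assumption made for illustration: the sample data, the column names (customer_id, country, revenue) and the helper functions ingest, profile and clean are hypothetical and are not taken from any of the products mentioned in this article.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export; column names and values are invented for illustration.
RAW = """customer_id,country,revenue
1, Germany ,100.50
2,germany,
3,France,n/a
3,France,n/a
4,,75
"""

def ingest(text):
    """Ingestion: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def profile(rows):
    """Profiling: count empty values per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if not value.strip():
                missing[column] += 1
    return dict(missing)

def clean(rows):
    """Cleaning: trim whitespace, normalise case, parse numbers, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        country = row["country"].strip().title() or None
        try:
            revenue = float(row["revenue"])
        except ValueError:
            revenue = None  # unparseable values such as "" or "n/a" become missing
        record = (int(row["customer_id"]), country, revenue)
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

rows = ingest(RAW)
missing = profile(rows)
records = clean(rows)
```

Even on five rows, the sketch has to deal with inconsistent spelling, missing values, placeholder text and a duplicate, which hints at why this work eats up so much analyst time at scale.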

As suggested, there are some interesting solutions in the Big Data marketplace specifically tackling these challenges, going by names such as Data Wrangling (www.trifacta.com), Adaptive Data Preparation, or Data Unification. Oracle is also getting involved, with what it calls Big Data Preparation Cloud Service.

Famous data scientist DJ Patil came up with the related concept of Data Jujitsu, an interesting term that even carries some kinaesthetic elements.

Clearly, Data Preparation approaches can only gain in relevance considering the increasing complexity of Big Data environments, characterized by volume, variety, velocity, veracity, validity and even volatility.

In order to drive business and stakeholder value, these concepts have to be set, above all, in your specific business context: your company objectives, your challenges, and the resources that you already have or intend to invest in. We like to share best practices, and your opinion is valued.

Bruno Polach

