Business users want the power of analytics, but analytics can only be as good as the data. To perform data discovery and exploration, use analytics to define desired business outcomes, and derive insights that help attain those outcomes, users need good, relevant data. Executives, managers, and other professionals are reaching for self-service technologies so they can be less reliant on IT and move into advanced analytics formerly limited to data scientists and statisticians. However, the biggest challenge nontechnical users encounter is the same one that has long confronted data scientists: slow, difficult, and tedious data preparation.
Data preparation is a hot topic on both business and IT sides of organizations. It is also the focus of innovative software technology and methods aimed at accelerating, if not automating, processes necessary to support business analytics. Preparing, blending, integrating, cleansing, transforming, governing, and defining the metadata of multiple sources of data—including new, raw big data in Hadoop—has been primarily an IT job; however, broadening interest in data science and analytics has drawn non-IT personnel into the execution of these tasks. Non-IT users such as business and data analysts as well as developers are looking for smarter self-service tools that reduce difficulties and make data preparation processes faster. IT, meanwhile, is interested in tools that can streamline data preparation, improve productivity, and enable IT to serve users better.
This report examines experiences with data preparation, discusses goals and objectives, and looks at important technology trends reshaping data preparation processes.