Ahead of his presentation at the Data Visualization Summit in San Francisco on April 19 and 20, we spoke to Abon Chaudhuri, Sr. Applied Researcher at Walmart Labs.
Abon develops machine-learning-driven solutions for large-scale classification and prediction problems in the e-commerce domain. He moved into machine learning from a strong background in visualization and data analysis. He received his PhD in Computer Science and Engineering from The Ohio State University, USA, in 2013. His doctoral research focused on feature-driven summarization of big data to enable rapid querying and visualization.
What has been the single biggest change in data visualization between five years ago and today?
The last five years mark data visualization's journey from a niche, research-oriented field to a widespread technology available to data scientists and users in every field. When I was a graduate student, visualizing a dataset (especially a 3D, higher-dimensional, or domain-specific one) often required me to write lengthy code implementing the rendering algorithms from scratch. I felt lucky if I found a library that fit my needs right out of the box. Thankfully, that is no longer the case. Following the trendsetting release of d3.js, most data visualization APIs have become web-based and publicly available. Most complex visualization algorithms are now packaged as services or open-source libraries, new techniques can be coded up quickly on top of these libraries, and support is available for almost every data format. This change matters because it happened at a time when everybody owns data and is looking for tools to understand it.
Which piece of emerging technology do you think is going to most impact data visualization?
Recent progress in human-computer interaction (HCI), supported by computer vision and virtual reality, suggests it is time to transform the way a user experiences a visualization. Multi-touch gestures and touchless interactions are almost commonplace now. We are probably not far from a time when a person can walk into the projection of a 3D visualization (a hologram) and modify it.
We must also not forget the role AI can play in transforming this field. Wouldn't it be awesome if I only had to give the data and my question(s) to a program and let it choose the most effective visualization for my goal? Early work in this direction is already being published.
Do you think you need to understand data to effectively visualize it?
Absolutely. An effective visualization results from a great deal of curiosity and exploration. The data we come across is often noisy, sparse, biased, incomplete, or irrelevant, and each of these problems should be investigated before finalizing the visualization. For example, if we have to visualize a 2D matrix, we may want to use different techniques depending on whether it is sparse or dense. When visualizing a graph of Facebook messages exchanged among users, we may want to annotate each node with additional information, such as the average number of messages that user sends per day. Such information may not be readily available in the data, but if we explore thoroughly, we may find a way to compute it from the data.
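To make the two examples in this answer concrete, here is a minimal Python sketch. It assumes a matrix given as a list of lists and messages given as `(sender, recipient, timestamp)` tuples. The function names, the 10% sparsity cutoff, the view labels, and the choice to average over the full date range are all illustrative assumptions, not a description of how Abon or any particular tool does it.

```python
from collections import Counter
from datetime import datetime

# Hypothetical cutoff: below this fraction of non-zero cells we treat a matrix as sparse.
SPARSITY_THRESHOLD = 0.1


def density(matrix):
    """Return the fraction of non-zero cells in a 2D matrix given as a list of lists."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total if total else 0.0


def choose_matrix_view(matrix, threshold=SPARSITY_THRESHOLD):
    """Pick a (hypothetical) view label: a full heatmap for dense data,
    or a plot of only the non-zero cells for sparse data."""
    return "heatmap" if density(matrix) >= threshold else "nonzero-scatter"


def avg_messages_per_day(messages):
    """Derive a per-user node attribute from raw (sender, recipient, timestamp) records.

    The average is taken over every calendar day from the first to the last
    message in the data (an assumption; averaging per active day is another option).
    Users who never send a message do not appear in the result.
    """
    if not messages:
        return {}
    days = [ts.date() for _, _, ts in messages]
    span_days = (max(days) - min(days)).days + 1
    sent = Counter(sender for sender, _, _ in messages)
    return {user: count / span_days for user, count in sent.items()}


if __name__ == "__main__":
    sparse = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(choose_matrix_view(sparse))

    messages = [
        ("alice", "bob", datetime(2017, 3, 1, 9)),
        ("alice", "carol", datetime(2017, 3, 1, 12)),
        ("bob", "alice", datetime(2017, 3, 2, 8)),
        ("alice", "bob", datetime(2017, 3, 3, 18)),
    ]
    print(avg_messages_per_day(messages))
```

The derived per-user values could then be mapped to node size or color in whatever graph library draws the network; the point is that the attribute comes from exploring and aggregating the raw records rather than from a ready-made field.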
How do you see data visualization developing from a business perspective in the next 5 years?
No matter how the tech industry and its consumers evolve in the next five years, their collective fondness for generating and exploring data is not going to wither, so opportunities for data visualization will continue to grow. Business demand for visualization is felt most strongly when new technologies in related fields gain popularity. The advent of medical imaging technologies (CT, MRI, etc.) in the early 90s gave birth to several R&D companies offering medical and scientific visualization solutions. Such opportunities are abundant in today's world. For example, in the next few years the Internet of Things will make it possible to connect to and collect data from any device at any time, giving rise to an unprecedented deluge of data. A solution that can visualize such data quickly and effectively will attract many takers. Meanwhile, AI and deep learning will spread to new domains, and users adopting them will embrace tools capable of visualizing these complex deep learning engines and the data they churn out.
What can the audience expect to take away from your presentation in San Francisco?
My presentation will offer a unique view of the symbiosis between machine learning and visualization. I believe visualization can play a huge role in making the process of modeling more transparent and meaningful than it is currently. Using a real application from work, I will highlight how not-so-obvious insights can be mined by applying visual analytics to the right problems.