DATAx insights: The role of unstructured data in AI

Jayant Lakshmikanthan, CEO and founder of CLARA analytics, dives into the importance of unstructured data in training AI systems

14Mar

The process of making AI systems interact more like humans makes some people uncomfortable, but AI is not about replacing humans. In reality, it is much more about removing the robot from humans. A big part of AI's value lies in automating manual processes and analyzing vast amounts of data quickly so that humans are free to accomplish higher-order tasks that require reason and judgment. To get to this point, however, AI systems must be able to communicate with users and analyze natural forms of data, also known as unstructured data: all of the free-flowing material that cannot be packaged neatly, such as voice, images and text.

Unstructured data is vital to the development of an AI system. The better an AI system communicates with users, the more it can learn on its own and, therefore, the more efficient it will be. This is important because if an AI system requires a user to interact only in a structured format, its capabilities are dramatically limited. For AI to be successful, it has to make sense of messy information.

In this context, let's dive deeper into how unstructured data comes into play.

The challenges of unstructured data

In the human world, you and I do not speak by protocol when we carry on a conversation. We say whatever pops into our heads, in some configuration that may or may not follow convention. We use slang, incorporate sarcasm and crack jokes. It is not natural for us to organize our everyday language and the information we wish to convey into neat columns and rows. Speech is natively unstructured.

If you've ever interacted with Amazon's Alexa, you know that, while the Echo system has generally become quite proficient at understanding free-form commands, the lack of a defined protocol can sometimes cause problems — or at least humorous responses when Alexa attempts to answer queries that don't fit the mold. Amazon has poured massive resources and millions of dollars into creating and perpetually refining the algorithms that enable this humanlike voice to respond to commands, but as adept as Echo has become at deciphering free-flowing language, Alexa still has flaws.

The Alexa example highlights the complexity of one type of unstructured data: speech. Processing text and creating a numerical equivalent of it is also a tall order for an AI system, especially when you consider nuance and the importance of context. And imagine a machine trying to 'understand' what is happening in that picture from your family vacation or an image in an art history textbook covering Impressionism.
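To make the idea of a "numerical equivalent" of text concrete, here is a minimal sketch of one of the simplest approaches, a bag-of-words count vector. It is an illustration only, not how Alexa or any particular product works, and the function names (tokenize, build_vocabulary, vectorize) are chosen here for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the documents to a fixed vector position."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn text into a list of word counts, one slot per vocabulary word.

    Words that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = count
    return vector


# Two sentences with opposite meanings but the same words.
vocabulary = build_vocabulary(["dog bites man"])
first = vectorize("dog bites man", vocabulary)
second = vectorize("man bites dog", vocabulary)
print(first == second)
```

The two sentences produce identical vectors because word counts discard word order. That is exactly the kind of nuance and context that makes text hard to represent numerically, and why more sophisticated representations are needed in practice.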

The complications associated with processing unstructured data are perhaps the biggest obstacles for AI in the enterprise. Yet, they are not insurmountable.

The importance of expertise

Unstructured data is inherently noisy. As such, it takes substantial expertise to cut through the noise, tease out and detect patterns, and then develop models that recognize those patterns. Data scientists are pushing aggressively to improve AI systems, and the biggest successes underscore that human instinct and experience are required. Those successes usually come when a team is focused on a very narrow application of AI.

Let's take the workers' compensation claims process as an example. Teams of data scientists with a deep knowledge of claims can create predictive models based on key indicators they spot. They incorporate unstructured data such as diagnostics, drug information, claim notes and more. In doing so, the AI system assesses early indicators and determines that a certain claim might be denied. It can then provide an alert to users. A claims representative can figure out how to intervene and give a particular claim more care to prevent the claimant's attorney from getting involved (typically denied claims wind up involving an attorney, which gets very expensive and takes a long time to resolve).

In this case, it is easy to see how the AI system provides assistance to its users, and there is also a tremendous boost in accuracy when that unstructured data is incorporated versus relying on structured data alone. There is a gold mine of information and insight in the unstructured data (e.g., information about comorbidities) that just doesn't find its way into structured data consistently. With each additional piece of information, the AI system gets smarter and results improve. This translates to greater efficiency and lower claims costs.
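The claims example can be sketched in a few lines of code. What follows is a deliberately simplified, rule-based stand-in for the predictive models described above, not an actual claims model: the phrases, weights, threshold and field names are all hypothetical and chosen only to show how free-text claim notes can add signal that a structured field alone does not carry.

```python
from dataclasses import dataclass

# Hypothetical phrases and weights, invented for illustration. A real system
# would learn such indicators from data with the help of claims experts.
RISK_PHRASES = {"diabetes": 2, "obesity": 2, "opioid": 3, "prior injury": 1}
ALERT_THRESHOLD = 4  # illustrative cut-off, not a calibrated value


@dataclass
class Claim:
    claim_id: str
    days_open: int  # structured field
    notes: str      # unstructured free-text claim notes


def note_signals(notes):
    """Return the risk phrases that appear in the free-text notes."""
    text = notes.lower()
    return [phrase for phrase in RISK_PHRASES if phrase in text]


def risk_score(claim):
    """Combine one structured indicator with signals found in the notes."""
    score = 1 if claim.days_open > 30 else 0
    score += sum(RISK_PHRASES[phrase] for phrase in note_signals(claim.notes))
    return score


def needs_alert(claim):
    """Flag the claim for a claims representative to review early."""
    return risk_score(claim) >= ALERT_THRESHOLD
```

Two claims that have both been open for 45 days look identical on the structured field alone. If the notes on one of them mention diabetes and an opioid prescription, only that claim crosses the alert threshold, which is the kind of extra insight the unstructured data contributes.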

This is just one example of one benefit from incorporating unstructured data into an enterprise AI system. It takes time and diligence to crack the code, but the payoff is gaining a level of insight that has never been possible before – and getting it in a matter of minutes or hours compared to days or weeks.

Unstructured data is the key

Moving forward, it's plain to see that every AI system needs to interact with users in a natural way. Organizations must have a sharp focus on this. In fact, there is a huge gap in a company's offering if unstructured data analysis is not part of the roadmap.

While unstructured data is challenging, Amazon, Google, Apple and others have opened a lot of opportunities for AI applications. We can take these advances and apply them to enterprise applications where they have an enormous business impact.

By taking the time to apply expertise and sound data science, we can make big breakthroughs. We will not only improve accuracy in data analysis through unstructured data but also achieve fundamentally new ways of thinking, communicating, and utilizing information in the future.
