"Google Duplex Is The Most Incredible, Terrifying Thing Out Of #Io18 So Far"

Google unveils its hyper-realistic new AI - Duplex


At their I/O developer conference this May, Sundar Pichai, Google's CEO, unveiled the company's latest project, Duplex. And it didn't disappoint.

Duplex is Google's new AI system, designed to carry out "real world" tasks. It differs from the existing Google Assistant and from similar virtual assistants offered by Amazon and Apple. The video below illuminates in a way mere words cannot.

Suffice it to say, the internet reacted:

Yaniv Leviathan, Principal Engineer, and Yossi Matias, Vice President of Engineering at Google, explain the tech in a post on Google AI Blog. Leviathan points out that "a long-standing goal of human-computer interaction has been to enable people to have a natural conversation with computers, as they would with each other." Complete with 'ums' and 'ahs', tonal differences in its responses and a seemingly seamless understanding of requests, Duplex makes it appear that Google has sailed past another milestone on its way to finally conquering the Turing test.

However, the recordings shared at the conference weren't live and were no doubt cherry-picked from a selection of trial phone calls. This is because, as Leviathan explains, "Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations". According to a release, Duplex is being trained via "anonymized phone conversations". This is where Google's overwhelming access to data comes in handy.

Nonetheless, a number of issues come along with verbal conversations with people. These issues are further compounded if the person is unaware that they are talking to a machine. "Natural language is hard to understand," Matias explains. "Natural behavior is tricky to model, latency expectations require fast processing, and generating natural sounding speech, with the appropriate intonations, is difficult."

This is why Duplex is only capable of conducting phone calls after it has been deeply trained on that specific task. On top of the difficulties that usually come with "natural spontaneous speech", such as fast and unclear speech, phone calls add challenges of their own, like background noise.

To get around this, the blog explains that Duplex's core is a "recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX). To obtain its high precision, we trained Duplex’s RNN on a corpus of anonymized phone conversation data. The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more."
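To make that description a little more concrete, here is a toy sketch in Python of how several input sources might be combined into one vector and fed, turn by turn, into a recurrent network. It is an illustration only, not Google's implementation: the names `build_features` and `TinyRNN`, the four-word vocabulary, the feature encodings and the random weights are all assumptions made up for this example, and nothing here is trained.

```python
import math
import random


def build_features(asr_text, audio_feats, history_len, params):
    """Concatenate the input sources named in the quote into one vector.

    Every encoding here is a toy placeholder (assumption), not Duplex's.
    """
    vocab = ["table", "book", "time", "people"]  # hypothetical tiny vocabulary
    words = asr_text.lower().split()
    text_feats = [float(words.count(w)) for w in vocab]  # stands in for ASR output
    param_feats = [params["hour_of_day"] / 24.0, float(params["party_size"])]
    return text_feats + list(audio_feats) + [float(history_len)] + param_feats


class TinyRNN:
    """A minimal Elman-style recurrent cell with random, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
                     for _ in range(hidden_size)]
        self.w_h = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                    for _ in range(hidden_size)]
        self.hidden = [0.0] * hidden_size

    def step(self, x):
        # New state depends on the current input and the previous state,
        # which is how earlier turns influence later ones.
        new_hidden = []
        for i in range(len(self.hidden)):
            total = sum(w * v for w, v in zip(self.w_in[i], x))
            total += sum(w * h for w, h in zip(self.w_h[i], self.hidden))
            new_hidden.append(math.tanh(total))
        self.hidden = new_hidden
        return self.hidden


# Two made-up conversation turns: (recognized text, two toy audio features).
turns = [("I'd like to book a table", [0.2, 0.5]),
         ("for four people", [0.1, 0.4])]
params = {"hour_of_day": 19, "party_size": 4}  # hypothetical call parameters

rnn = TinyRNN(input_size=9, hidden_size=3)
for turn_index, (text, audio) in enumerate(turns):
    state = rnn.step(build_features(text, audio, turn_index, params))
```

The point of the sketch is the shape of the data flow: each turn produces one combined feature vector, and the hidden state carries context from one turn to the next. A production system would learn its weights from data and use far richer features.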

Beyond simply being another way for Google to freak us out, this technology offers myriad benefits that go past the superficial. The AI has the potential to help people with both language and hearing difficulties deal with everyday phone-based tasks. For enterprise, it will be able to offer "delegated communication with service providers in an asynchronous way, e.g., requesting reservations during off-hours, or with limited connectivity."

However, there is still a lot of development and testing to be done. I don't think it's naive to say the tech is still years away from where Google ultimately wants it to be. Nonetheless, Google's end goal of integrating AI seamlessly into every facet of our lives just got undeniably closer.

