ProBeat: We can’t get over how human Google Duplex sounds

Watch the above video. Then watch it again, but close your eyes. Listen carefully to the voice making a restaurant reservation.

Duplex – Google’s artificially intelligent chat agent that can arrange appointments over the phone – has started rolling out to a “small group” of Google Pixel phone owners in select cities (Atlanta, New York City, Phoenix, and San Francisco). For now, the feature only works in English, with some restaurants, and cannot handle any other businesses that take appointments.

As news of the feature becoming slowly available has spread, a lot of debate has focused on whether it’s worth the effort. As many have pointed out, it seems faster to just call the restaurant yourself than to input everything Google Assistant requires and wait for a confirmation. There are plenty of scenarios where this is useful, though – if you have a speech impediment or social anxiety when making phone calls, if you are in a location where you cannot place a call, if the restaurant is closed when you want to make the reservation, and so on.

I want to focus on the other hotly discussed part of the news: the Google Duplex voice.

You cannot tell me it does not sound human. Still, I’ve watched the video so many times that I’ve convinced myself that if you listen very closely, you will notice “mistakes” in how the Duplex AI speaks. I put mistakes in quotes because I’m not completely sure Google wants the technology to perfectly mimic how a human assistant would conduct the conversation.

What Duplex actually says sounds extremely believable – especially the multiple thank-yous and the "ba-bye" at the end. But you can tell something is off if you pay attention to the pauses. They are a bit too long, especially at the beginning and at the end. At the start, a human might fill a gap like that with an umm or ahhh, out of respect for the person on the other side. At the end, it's clear Duplex is not going to hang up first (until it gets some sort of confirmation, anyway).

That’s what I’m calling “mistakes.” But I do not know if Google is striving for perfection. And frankly, I do not think it should be.

Getting a conversational AI's voice to not sound robotic makes sense – it's simply more comfortable to talk to. But having it perfectly replicate what a human would do? That's simply too much of a good thing.

Disclosure and transparency

In this Duplex ad from earlier this year, here is how the voice introduced itself:

Hi! I’m the Google Assistant calling to make a reservation for a client. This automated call will be recorded.

In the call we recorded, the wording has changed slightly, removing the part that makes it crystal clear this is not a human calling:

Hi, I’m calling to make a reservation for a client. I’m calling from Google, so the call may be recorded.

I’m sure Google is still iterating here – the wording will probably change a few more times. The team could actually be A/B testing multiple versions.

Google got tons of criticism after its initial Duplex demo in May – many were not amused that Google Assistant mimicked a human so well. In June, the company promised that Google Assistant with Duplex would first introduce itself.

This is a double-edged sword. If Duplex gets things wrong and screws up the conversation, it makes Google look bad. If Duplex tries too hard to act human, it comes off as creepy and … makes Google look bad. The trick is to strike a perfect balance: accurate and intelligent.

While Duplex is a user-facing feature, currently exclusive to Pixel phones, it is ultimately businesses that interface with the conversational AI. That’s the part it cannot screw up.

More videos to come

We may have recorded the first video of Duplex in action, but I suspect this is going to birth a whole genre of new content.

Duplex is going to mess up, and it will be hilarious. Duplex is going to make serious mistakes, and it will be about. Duplex is going to get things too right, and it will be scary.

But hey, at least the internet will document it with lots of videos.

ProBeat is a column in which Emil rants about whatever crosses him that week.

