By John Gruber
Let me just reiterate up front that my suspicions surrounding Google’s Duplex recordings are not suspicions regarding the idea of Duplex itself. If I had to bet on who will be the first to create an AI voice system that passes for human, even within the limited constraints of a single well-defined task like booking reservations, it would be Google. If Vegas had a betting line on this, Amazon would probably have decent odds too, but surely Google would be the favorite.
We can all hear for ourselves how well Google Assistant works today. I’m not alleging that these recordings are complete fabrications, or betting against Google being further ahead in this effort than anyone else.
But everything about the way Google announced this — the curious details of the calls released so far, the fact that no one in the media has been allowed to see an actual call happen live — makes me suspect that for one or more reasons, the current state of Duplex is less than what Sundar Pichai implied on stage. His words before the first recording was played: “What you’re going to hear is the Google Assistant actually calling a real salon to schedule an appointment for you. Let’s listen.” And after the second recording: “Again, that was a real call.”
You can parse those words precisely and argue that Pichai never said they were unscripted or un-coached, or that the recordings are unedited. But that’s like saying Bill Clinton was technically truthful with his “I did not have sexual relations with that woman” statement. The implication of Clinton’s statement was that he wasn’t involved sexually with his intern, and that wasn’t true. The implication of Pichai’s statement was that right now, today, Google has a version of Duplex in its lab that can call a real restaurant or hair salon and book a reservation and sound truly human while doing so. Not soon, today. Look at the news coverage from the announcement — Mashable, The Guardian, The Verge, The Evening Standard — all of those reports on Duplex’s announcement are written in the present tense, as though it’s something Google has working, as heard, with no or very minimal editing, today.
If a few months or more from now Google can demonstrate a real Duplex call, live, that wouldn’t disprove my suspicion that they can’t do it right now in May 2018 — even though Sundar Pichai clearly implied last week that they can. If I’m wrong — if stories come out in the next week or two from journalists granted behind-the-scenes access to listen to Duplex make live calls (and watch them be parsed correctly, creating calendar events and notifications of the reservation dates and times), and those calls sound every bit as realistically human as the recordings Google has released so far — my suspicion will be proven false. And I’d be delighted by that. Part of the reason I’m so focused on Duplex is that if it really works like it does in these recordings, it’s one of the most amazing advances in technology in years.
But Google hasn’t done that, and the more I think about it, and the longer Google stonewalls on press inquiries about Duplex, the more suspicious I get that they can’t. Even if Duplex still has a low success rate, it would be amazing if, say, half its calls worked as well and sounded as good as these recordings. That would be perfectly understandable for a technology still in development.
But Pichai also said “This will be rolling out in the coming weeks as an experiment.” On the one hand, that makes me feel like maybe I am off my rocker for being so skeptical. Why in the world would Pichai say that if they weren’t at a stage in internal testing where Duplex works as the recordings suggest? But on the other hand, if they are that close, why haven’t they invited anyone from the media to see Duplex in action?
They did invite Richard Nieva from CNet to a behind-the-scenes preview before I/O, but all he got to hear were recordings, too:
In a building called the Partnerplex on Google’s sprawling campus in Mountain View, California, I’ve been invited to hear a 51-second phone recording of someone making a dinner reservation. […]
As I listen to what sounds like a man and a woman talking, Google’s top executives for Assistant, the search giant’s digital helper, watch closely to gauge my reaction. They’re showing off the Assistant’s new tricks a few days before Google I/O, the company’s annual developer conference that starts Tuesday.
Turns out this particular trick is pretty wild.
That’s because Person 2, the one who sounds like a man, isn’t a person at all. It’s the Google Assistant.
Why not let Nieva hear it live? Why not let Nieva answer the phone and book the reservation himself, as though he works at the restaurant? If it’s “weeks” away from rolling out in a limited beta to the public, that should be possible.
The job of journalists is to verify these things, not just to take a company’s word for it. Here’s Om Malik, linking to Dan Primack’s Axios story on Google’s stonewalling:
“Google may well have created a lifelike voice assistant…Or it was partially staged. Or something else entirely. We just don’t know, because Google won’t answer the questions.” @danprimack doing what journalists are supposed to do. Verify and dig deeper!
Finally journalism starts asking obvious questions of tech.
Tech journalism has never asked basic questions like “how did you do this?”
Apple once used my software to demo their tech, which wasn’t ready.
Reporters refused to ask about this.
“How did you do this?” is a necessary question. But even broader, when you’re only shown a recording, the question is “How do we know this is real?”
Maybe Duplex, today, works just as well and sounds just as human as these recordings suggest. But maybe it doesn’t work as well as they claimed, or doesn’t sound so human,1 or takes pauses that were edited out of the clips they’ve released. We don’t know, because Google hasn’t allowed anyone to verify anything about it. It’s like a card trick where the magician, rather than an audience member, picks the card and shuffles the deck.
It’s the difference between, say, watching video of a purported self-driving car versus watching — or even better, riding as a passenger in — an actual self-driving car.
The headlines last week should have been along the lines of “Google Claims Assistant Can Make Human-Sounding Phone Calls”, not “Google Assistant Can Make Human-Sounding Phone Calls”. There’s a difference.
A recording is not a demo. You can demo hardware and software that isn’t shipping yet — most companies do, because that’s when the products are still under wraps and can make for a surprise. But there’s an obligation to be clear about the current state of the product, and to demo what you currently have working “for real”. Showing it privately to select members of the media is another acceptable strategy. Just to cite one famous example from Apple: in January 2007 the original iPhone was six months away from shipping and still needed a lot of work. But what Steve Jobs showed on stage was real — early stage software running on prototype hardware. Everything demoed was live, not a recording. And then to further prove that, after the keynote, select members of the media, including Jason Snell, Andy Ihnatko, and David Pogue, got up to 45 minutes of actual hands-on time with a prototype, even though the software was at such an early stage that some of the default apps only showed screenshots of what they were supposed to look like.
That’s how you prove to the world that a demo was what you said it was. It is damn curious that Google won’t do that with Duplex. ★
Google now claims their plan all along has been to have Duplex identify itself to humans. I don’t understand how that squares with the efforts they clearly went through to make Duplex sound convincingly human. It seems clear that they only started thinking about disclosing Duplex as a bot to humans in response to the ethical outcry after the keynote. Ethics aside though, what makes the promise of Duplex so tantalizing as a technology is its seeming humanness. ↩︎
At the bottom of Google’s AI Blog announcement of Duplex (“An AI System for Accomplishing Real World Tasks Over the Phone”), they included a photo of two Duplex engineers eating a meal, with the following caption:
Yaniv Leviathan, Google Duplex lead, and Matan Kalman, engineering manager on the project, enjoying a meal booked through a call from Duplex.
As suspicions around this announcement deepen, I got to wondering if we could identify this restaurant. If we could identify the restaurant, we could ask them if they had been told in advance they would be speaking to Google Duplex, among other interesting questions.
The image is cropped somewhat tightly, but they’re clearly eating Chinese food, the bench style and wall color are distinctive, and there’s a large picture hanging over their heads. So, I did the laziest thing I could possibly do: I asked my Twitter followers if any of them recognized it.
22 minutes later, we had the answer from DF reader Jay P: Hong’s Gourmet, in Saratoga, CA. This image on Yelp shows the same bench, same wall, and same picture on the wall. Next door to Hong’s Gourmet is Masu Sushi, whose sign is legibly reflected in the glass of the picture behind the Google engineers.1
My thanks to Jay P and everyone else who contributed to the thread on Twitter. Jay deserves the credit for cracking this, by going backwards from the Masu Sushi sign in the reflection.2 All I did was ask. The fact that I had an answer to my question in just 22 minutes shows that having a large follower count on Twitter is a bit of a super power. I honestly can’t think of another way to answer this question without Google PR’s help. I suppose, without Twitter, I could have just posted the question on Daring Fireball, and I might have gotten the same answer. But the threaded, public, instant nature of Twitter allowed for multiple people to contribute — we went from “this might be the place” to “this is definitely the place” in just a handful of minutes. Remarkable, really. ★
One weird detail is that the image from Google of the engineers has been flipped horizontally, so the reflection of the neighboring restaurant’s sign isn’t mirrored. My only guess as to why Google flipped this image is that they wanted Leviathan, the project lead, to have his name listed first in the caption. ↩︎
After August 16th, 2018, “streaming services” at Twitter will be removed. This means two things for third-party apps:
- Push notifications will no longer arrive
- Timelines won’t refresh automatically
We are incredibly eager to update our apps. However, despite many requests for clarification and guidance, Twitter has not provided a way for us to recreate the lost functionality. We’ve been waiting for more than a year and have had one reprieve.
This antipathy to third-party clients is especially confounding considering that Twitter recently dropped support for their own native Mac client. As far as I’m aware, once this comes to pass next month, there will be no way to receive notifications of Twitter DMs on a Mac. None. (Twitter’s website doesn’t even support Safari’s desktop notification feature.) That’s just wacky.
Twitter management obviously wants to steer people to their first-party mobile app and desktop website. I get that. But they already have that: the overwhelming majority of Twitter users use exactly those products to access the service. What Twitter management seems to be missing is that many of its most influential users — including yours truly, yes — have been on the platform a long time and are disproportionately likely not just to use, but to depend upon, third-party clients.
To me this is like finding out you’re now required to access email entirely through a web browser. Sure, lots of people already do it that way and either prefer it or think it’s eh, just fine, who cares — but a lot of others hate it and find it completely disruptive to longstanding workflows.
Twitter isn’t explicitly saying that they’re shutting down third-party clients, but I don’t know that it’s feasible for them to exist if they don’t have access to these APIs. It’s like breaking up with someone by being a jerk to them rather than telling them you’re breaking up.
I urge Twitter to reconsider this decision. Third-party clients account for a relatively small part of the Twitter ecosystem, but it’s an important one. Twitter may not care about a native Mac client, but the users of these apps, and the developers who make them, certainly do. ★