It’s Game Over on Vocal Deepfakes

You may recall back in October I linked to an AI-generated simulated interview between Joe Rogan and Steve Jobs. I wrote:

I also don’t buy their claim that these voices are completely generated. Most of Jobs’s lines have auditorium echo — they sound like clips copy-and-pasted. If they can really generate these voices, why doesn’t their virtual Rogan actually say Steve Jobs’s name? Send me a clip of virtual Steve Jobs saying “John Gruber is a bozo, and I tell people not to waste their time reading Daring Fireball.” Then I’ll believe it.

I neglected to follow up until now, but Ignaz Kowalczuk from ElevenLabs (the company behind Prime Voice AI) took me up on the challenge and sent me this clip:

That clip sounds noticeably stilted, but it does sound like Steve Jobs.

Now comes this: a Twitter thread from John Meyer, who trained a clone of Jobs’s voice and then hooked it up to ChatGPT to generate the words. The clips he posted to Twitter are freakishly uncanny. It really sounds like Jobs. The only hitch is that it sounds like Jobs reading from a script, not speaking extemporaneously. But damned if it doesn’t sound like him.

It’s all fun and games in these demos, but this is inevitably going to be put to use by ratfuckers to create fake scandals in political campaigns. Recall the infamous “When you’re a star, they let you do it. Grab them by the pussy” Access Hollywood tape that The Washington Post published in October 2016. That tape obviously didn’t prevent Donald Trump from winning the election, but it did hurt him by a few percent in the polls. There was no question at the time that the tape was legitimate. But if it came out now?

And it feels inevitable that a Roger Stone or Steve Bannon type will use this technology to commission, say, a recording of Joe Biden forgetting his own name or what year it is, or Kamala Harris claiming to be running an abortion clinic in the Eisenhower Executive Office Building or admitting to the existence of a Democrat-run sex-trafficking pedophile ring. A dangerous chunk of wingnuts bought into such a conspiracy in 2016 without compelling deepfake forgeries.

Real recordings will be called fake and fake recordings will be leaked as purportedly real. I don’t think the general population is prepared for this, and I worry that news media organizations aren’t either.