A voice synthesis company based in Dubai published a fictional podcast interview between Joe Rogan and Steve Jobs using realistic voices digitally cloned from both men. It takes place during the “first episode” of a purported podcast series called “Podcast.ai,” created by Play.ht, which sells voice synthesis services.
In the interview, you first hear a replication of Rogan’s voice created by voice cloning technology similar to that which we’ve covered before on Ars. Deep learning technology has allowed AI models to replicate distinctive voices with a high degree of accuracy, such as in the case of Darth Vader in Disney’s Obi-Wan Kenobi TV series.
To achieve the effect, someone must first train the AI model on existing samples of the voice that will be cloned. Rogan is a prime target for AI voice training by deep learning models because ample quantities of his isolated voice exist on his podcasts. In fact, The Verge covered a PR stunt by an AI company called Dessa synthesizing Rogan in 2019.
Where this instance of AI tomfoolery becomes more interesting is that Play.ht additionally roped in the voice of deceased Apple CEO Steve Jobs. His voice, while robotically choppy at times, recalls his Apple keynotes and All Things Digital interviews from the late 2000s. And Play.ht claims that the text of the interview was generated by AI as well, possibly from a large language model (LLM) similar to GPT-3.
“Transcripts are generated with fine-tuned language models,” writes Play.ht on the Podcast.ai website. “For example, the Steve Jobs episode was trained on his biography and all recordings of him we could find online so the AI could accurately bring him back to life.”
In keeping with its LLM roots, the 19-minute interview doesn’t make much sense. After a while, parts of the fictional interview begin to sound like conceptual mashups of common Jobs talking points, including aesthetics, revolutionary products, competitors such as Google, Microsoft, and Adobe, and the triumphs of the original Macintosh.
For example, during a section of the interview, fake Jobs delves into criticism of Microsoft that is very similar to what the real Jobs said in a famous 1995 interview for Triumph of the Nerds, but it’s not a carbon copy, and you can tell the voice is synthesized if you compare the two. “That’s the problem I’ve always had with Microsoft,” fake Jobs says. “In many ways they’re smart people and they’ve done good work, but they’ve never had any taste. They’ve never had any aesthetic sense.”
Whether it’s legal to use Jobs’ or Rogan’s vocal likenesses in this manner—particularly to promote a commercial product—remains to be seen. And despite the PR-stunt nature of the podcast, the concept of entirely fictional celebrity podcasts got our attention. As voice synthesis becomes more widespread and potentially undetectable, we’re looking at a future where media artifacts from any era will likely be completely fluid and malleable, shapeable to fit any narrative. In this particular fictional world, Jobs is a huge Rogan fan.
“It’s nice to sit back in the car and listen to you rant,” he says.