9 Comments

I have to say, I am a bit surprised by this strong statement. There are a few reasons why I think voice actors should not be worried that much. On one hand, yes, we already see companies replacing voice actors with AI. On the other, though:

1. This AI hype seems to be overrated. Companies are sharing carefully curated samples that do not always reflect real usage of said voices.

2. I work with such voices, provided by one of the companies mentioned. They sound amazing, yes. But they hallucinate. Yes, just like GenAI because these voices are also less rule-based, and more LLM/AI-based. I had a few instances of a text being read out correctly four times, and sounding like a possessed monkey the fifth time.

3. There is not much room for manipulation. At least the solution I work with is not subject to SSML, meaning I cannot even change the speed or pitch. Introducing this feature isn't a big deal, but there is so much more. I need to adjust not only the voice altogether, but way more often - specific words. Some need to sound more serious, some cheerful. I think Azure provides some of these features for EN, but I haven't seen this working for any other language, and it seems it's even less a thing with these AI-based voices.

Altogether, I wouldn't worry if I were a voice actor. I mean, one can spend ages trying to tweak a text and all possible params to have the AI read it out nicely, but ultimately the easy solution is to just hire a voice actor, especially if it's not a neutral text but one that needs to convey emotions, and even more so if they keep changing throughout a text.

Thanks for your lengthy response, Finka! Greatly appreciate you sharing your thoughts. You make some good points. While it's true that not all tools offer the option to manipulate generated samples after the fact, these tools do exist. On top of that, some synthetic voice providers offer voices that are so good that I believe they remove the need for any post-generation manipulation. Strong examples of that would be the voices of ChatGPT (the app) and Pi (from Inflection.ai).

While ENG is the dominant language with the highest performance right now, the most spoken languages in the world will soon follow, as there is more than enough data available to train such voices.

As someone who follows the audiobook and podcasting industries closely, I've noticed a bit of a misunderstanding among many AI enthusiasts about what voice acting and audiobook narration actually are. Actors' and narrators' primary job is to vocally interpret texts, not to convey information.

Actors make choices that are grounded in their understanding of the story and the author's/characters' motivations. Then they use their creativity and nuanced control of their vocal instrument to record a unique performance. AI's current ability to do this is--depending on your perspective--either zero or very primitive.

If verisimilitude were the only barrier to entry to VO, literally any human could be a successful voice actor. If conveying the information in the text fluidly were the sole purpose of audiobook narration, Audible would not have separate ratings for the story and the performance.

Yes, for many utilitarian functions, AI will be able to do the job, but that's long been the case with text-to-speech. When it comes to replacing talented professionals in high-quality podcasts, audiobooks, TV shows and commercials, and video games, we are nowhere near that.

It's telling that those who proclaim the death of VO tend to be very knowledgeable about AI, but know very little about the industries they claim it will kill off.

Thanks for your response, Mack. I appreciate you sharing your views. I'm aware that voice acting is a serious profession that requires skill, talent and a lot of practice. I'm also in full agreement with you that AI cannot yet match the creativity and nuanced control of a real voice actor... however, I invite you to watch the following fragment of a speech Stephen Fry gave at CogX last year: https://youtu.be/zZfS8uk70Zc?feature=shared&t=749

The question that I'd like everyone to reflect on is: at what point will this small decline in quality be so negligible that cost will be the defining factor? I think we're not far away from that future, given the rate of progress, which means a lot of people stand to lose their jobs as voice actors and only extremely high-end productions will go the extra mile and pay the premium of working with real human beings. It's sad, but pretty much inevitable, if you ask me.

Hi Jurgen, really enjoying your Substack! I completely agree that the technology is fascinating and incredible. I do think it will cut into things like transportation announcements and game NPCs--and we all know it's widespread in online ads and videos.

But actors and narrators make interesting and unexpected choices and they inspire parasocial relationships. Think of how many differentiated character voices Fry had to invent to narrate the Potter series--and how dull that AI VO (trained on those performances) was in comparison. People are going to still want the talent and charisma of their favorite human performers. Read through the Audible reviews and see how passionate consumers are about their favorite (and most hated) narrators. That's not going away.

People said Vocaloid would replace human singers and "synthespians" would replace actors. It hasn't happened and it won't. Look at all the crappy and amateurish TikTok creators that people love because they love the human connection. I agree that AI will create lots of content that people will happily consume, but it won't replace entire industries of human performers.

I agree that OpenAI's ChatGPT voices are pretty darn good, they are perhaps my favorite. However, the Read Aloud feature of ChatGPT is very unreliable. It will work great one time, and then crash 8 times in a row.

Just yesterday I opened an ElevenLabs account because using ChatGPT's read aloud feature was becoming a huge time waster. I'll be paying a little more for ElevenLabs than I pay for ChatGPT. If OpenAI could make their Read Aloud feature reliable, they would add considerable value to the product and could certainly raise the price.

I'm not sure exactly what the point is of adding clearly broken features to such a popular product as ChatGPT. Perhaps they reasoned something is better than nothing? The price tag is a hit to their brand though.

Of course I feel sad for the people that are losing their gigs and/or even their jobs. And at the same time, they might have replaced others as well (so we shouldn't feel too sad about it). It would be interesting to see to what extent AI will replace real actors... maybe we get to keep people who really stick out. Those who have nothing remarkable will be replaced.

I think actors, in movies and series, are much safer than those who just lend their voice. As people we like to watch people. We just do. So unless AI-generated movies become so good and compelling that they outperform world-class performances by real actors, there will continue to be a long and vibrant future for movie actors.

Well, what about video games? I'm not a gamer, but my vague understanding is that games are pretty huge.
