Discussion about this post

Finka

I have to say, I am a bit surprised by this strong statement. There are a few reasons why I think voice actors should not be worried that much. On one hand, yes, we already see companies replacing voice actors with AI. On the other, though:

1. The AI hype seems overblown. Companies share carefully curated samples that do not always reflect how these voices perform in real use.

2. I work with such voices, provided by one of the companies mentioned. They sound amazing, yes. But they hallucinate, just like GenAI, because these voices are also less rule-based and more LLM/AI-based. I have had a few instances of a text being read out correctly four times and sounding like a possessed monkey the fifth time.

3. There is not much room for manipulation. At least the solution I work with does not support SSML, meaning I cannot even change the speed or pitch. Adding that feature isn't a big deal, but there is so much more. I need to adjust not only the voice as a whole but, far more often, specific words. Some need to sound more serious, some more cheerful. I think Azure provides some of these features for English, but I haven't seen them working for any other language, and they seem even less available with these AI-based voices.

Altogether, I wouldn't worry if I were a voice actor. I mean, one can spend ages trying to tweak a text and all possible params to have the AI read it out nicely, but ultimately the easy solution is to just hire a voice actor, especially if it's not a neutral text but one that needs to convey emotions, and even more so if they keep changing throughout a text.

Mack Hagood

As someone who follows the audiobook and podcasting industries closely, I've noticed a bit of a misunderstanding among many AI enthusiasts about what voice acting and audiobook narration actually are. Actors' and narrators' primary job is to vocally interpret texts, not to convey information.

Actors make choices that are grounded in their understanding of the story and the author's/characters' motivations. Then they use their creativity and nuanced control of their vocal instrument to record a unique performance. AI's current ability to do this is, depending on your perspective, either zero or very primitive.

If verisimilitude were the only barrier to entry to VO, literally any human could be a successful voice actor. If conveying the information in the text fluidly were the sole purpose of audiobook narration, Audible would not have separate ratings for the story and the performance.

Yes, for many utilitarian functions, AI will be able to do the job, but that's long been the case with text-to-speech. When it comes to replacing talented professionals in high-quality podcasts, audiobooks, TV shows and commercials, and video games, we are nowhere near that.

It's telling that those who proclaim the death of VO tend to be very knowledgeable about AI, but know very little about the industries they claim it will kill off.
