Key insights of today’s newsletter:
ALDI, the supermarket chain, has developed its own AI voice, replacing the longtime voice actor behind the ‘voice of Aldi’ with immediate effect.
The voice-over industry as a whole presents a major opportunity for AI companies like ElevenLabs, Resemble AI and OpenAI, which are in the business of creating synthetic voices.
Given how good these AI voices have become and how cheap it is to generate them, voice acting is destined to become obsolete.
↓ Go deeper (5 min read)
ALDI, the German multinational family-owned supermarket chain, made a big public announcement last week. It has developed its own custom Dutch-speaking AI voice, built from the voices of 10 female Aldi employees. If you’re interested in what it sounds like, you can hear it here.
In doing so, the supermarket chain has automated away what used to be the job of Dutch actor Diederik Ebbinge, who had been ‘the voice of Aldi’ in the Netherlands for the past four years. According to an Aldi spokesperson, this saves costs because the retail chain no longer has to pay for the rights to use a real voice.
The news raises questions about the state of the voice acting industry: why would anyone hire a voice actor if the same work can be done by an AI for a fraction of the cost? I would even go as far as to say that if you’re a voice actor who relies on voice acting work for most of your income, it’s probably time to start thinking about a different career, like, right now.
How good are these AI voices really?
You might be thinking I’m coming off way too strong here, that I might be overreacting. To those I say, respectfully, you don’t realize how good AI voices have become.
Listen to this sample:
This voice belongs to one of OpenAI’s text-to-speech voices. I think it’s worth pointing out that it not only carries a certain warmth but also produces the ums and ahs that are so characteristic of human speech and really make the voice come alive. These are known as filled pauses, or disfluencies.
It turns out that much of what makes speech sound natural can be replicated using deep learning.
Unlike previous techniques for mimicking speech, deep learning models are trained on large amounts of speech data, similar to how LLMs are trained. These models learn the nuances of human speech, which allows them to generate human-like voice samples from text input: a process that takes no more than a few seconds.
Aldi has opted for a slightly different application of the same technology, known as voice conversion. A recording of someone speaking is converted into the AI’s voice, and the output keeps all the pauses and inflections of the original recording. This means you’ll still need a person behind the microphone to speak the words, but that person can be anyone, really.
What does this mean for voice actors globally?
So, what’s the impact going to be? It’s hard to say exactly how many people rely on voice acting work for part of their income. A quick Google search, though, turns up market research suggesting that the global voice-over market was valued at around USD 1.5 billion in 2021 and is expected to reach USD 2.3 billion by 2026.
This surge in demand is mainly due to the proliferation of online content, animation, video games, e-learning, and audiobooks.
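To get a rough sense of how fast that growth is, you can turn the two market-size figures into a compound annual growth rate. The snippet below is a minimal sketch that takes those figures at face value. It assumes smooth compounding over the five years from 2021 to 2026, and the function name `implied_cagr` is hypothetical, chosen only for illustration.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two market-size estimates."""
    return (end_value / start_value) ** (1 / years) - 1

# Market-size figures quoted above, in billions of USD: 1.5 (2021) -> 2.3 (2026)
rate = implied_cagr(1.5, 2.3, 2026 - 2021)
print(f"Implied annual growth: {rate:.1%}")  # prints: Implied annual growth: 8.9%
```

On those assumptions, the figures work out to roughly 9% growth per year.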
To no one’s surprise, the voice acting industry also presents a major opportunity for AI companies like ElevenLabs, Resemble AI and OpenAI, which have all been busy developing increasingly sophisticated AI voices. These voices are readily available, can be generated on the fly and customized to preference, and, lo and behold, they speak 20+ languages.
That’s like having a cheap, on-demand polyglot that never complains and never gets sick. There’s literally no world in which humans currently employed to do this work are not being outcompeted by AI.
Famous people known for their voices, like world-renowned singers and actors, may still have a future selling the rights to their voice. For everyone else, it’s game over. Voice acting as a career path is a dead-end street. My honest-to-God prediction: the profession will fizzle out within the next 3 to 5 years. Chances are it will go even quicker.
Join the conversation 🗣
Leave a comment or like this article if it resonated with you.
Get in touch 📥
Shoot me an email at jurgen@cdisglobal.com.
I have to say, I am a bit surprised by this strong statement. There are a few reasons why I think voice actors should not be that worried. On the one hand, yes, we already see companies replacing voice actors with AI. On the other, though:
1. This AI hype seems overblown. Companies share carefully curated samples that do not always reflect how these voices perform in real use.
2. I work with such voices, provided by one of the companies mentioned. They sound amazing, yes. But they hallucinate. Yes, just like other generative AI, because these voices are also less rule-based and more LLM/AI-based. I have had a few instances of a text being read out correctly four times and sounding like a possessed monkey the fifth time.
3. There is not much room for control. The solution I work with, at least, does not support SSML, meaning I cannot even change the speed or pitch. Introducing that feature wouldn't be a big deal, but there is so much more. I need to adjust not only the voice as a whole but, far more often, specific words. Some need to sound more serious, some more cheerful. I think Azure provides some of these features for English, but I haven't seen them working for any other language, and they seem to be even less common with these AI-based voices.
All in all, I wouldn't worry if I were a voice actor. One can spend ages tweaking a text and every possible parameter to get the AI to read it out nicely, but ultimately the easy solution is to just hire a voice actor. That holds especially when the text is not neutral but needs to convey emotions, and even more so when those emotions keep shifting throughout the text.
As someone who follows the audiobook and podcasting industries closely, I've noticed a bit of a misunderstanding among many AI enthusiasts about what voice acting and audiobook narration actually are. The primary job of actors and narrators is to vocally interpret texts, not to convey information.
Actors make choices that are grounded in their understanding of the story and the motivations of the author and characters. Then they use their creativity and nuanced control of their vocal instrument to record a unique performance. AI's current ability to do this is, depending on your perspective, either zero or very primitive.
If verisimilitude were the only barrier to entry in VO, literally any human could be a successful voice actor. If fluidly conveying the information in the text were the sole purpose of audiobook narration, Audible would not have separate ratings for the story and the performance.
Yes, for many utilitarian functions, AI will be able to do the job, but that's long been the case with text-to-speech. When it comes to replacing talented professionals in high-quality podcasts, audiobooks, TV shows and commercials, and video games, we are nowhere near that.
It's telling that those who proclaim the death of VO tend to be very knowledgeable about AI, but know very little about the industries they claim it will kill off.