As a professional voice actor, I’m not exactly cheering when computers get better and better at mimicking the human voice. But running away from it or dismissing it as inferior by definition is not really wise either. So, I’ve delved into it. Let’s take a look at the latest state of affairs.
This article was translated from the original Dutch by A.I.
Below you will hear some fragments that have been generated via Elevenlabs.io, which currently leads the field of artificial voices, especially voice cloning. Based on just a few minutes of material, a copy of my voice was created.
I tested voice cloning two years ago with Descript.com, but the results were not nearly as good. There I had to read a story of over 30 minutes aloud, and my virtual voice was only ready after a night of computation. This was the result:
Well, we’re not losing sleep over that. But that was at the end of 2021 and developments are moving fast! Let’s examine that.
I had ChatGPT write the kind of text you would hear in a documentary about the manufacture of microprocessors. I’ll paste it below so you can read along:
Welcome to the fascinating world of microprocessor production. Today, we’ll take you on a journey through the intricate process of creating these powerful electronic brains. In the heart of a state-of-the-art fabrication facility lies the cleanroom, where cutting-edge technology converges with precision engineering. This is where the magic happens.
Here, silicon wafers are meticulously crafted, layer by layer, using advanced lithography techniques. These wafers will become the foundation of our microprocessors. The etching process sculpts intricate patterns onto the silicon wafer, defining the transistors and interconnections that make up a microprocessor.
Next, deposition adds various materials onto the wafer, creating vital components such as gates, conductors, and insulators.
Photolithography transfers complex designs onto the wafer, enabling the creation of microscopic structures with incredible accuracy.
After extensive testing, individual microprocessors are carefully packaged, ready to power a vast array of technological marvels.
Every step in the production of microprocessors is a testament to human ingenuity and precision engineering. These tiny wonders continue to revolutionize our world, driving innovation and powering the devices we rely on every day.
Then I had this text spoken by a cloned version of my voice. By the way, it doesn’t speak Dutch yet.
You can play with various settings, but you can’t direct this voice sentence by sentence yet, and you can hear that. In the first example, ‘my’ voice starts out very expressive and then for some reason becomes very monotonous:
That’s not acceptable, so I made a version that was less expressive. Expressiveness is governed by the stability setting: the lower the stability, the more ‘random’ expressiveness is added to the voice.
Is that better? At least it’s more consistent. What would happen if I set the expressiveness very high and went for 100% realism right away? Attempt 3, with a shortened text:
Well, it really is my voice. By the way, I had to intervene here to get the ‘dot nl’ pronounced correctly at the end. But this is not how I would do it. I would adjust the speed and use better intonation. All this expressiveness appears at the weirdest moments. And that first sentence, ‘Today, we’ll take you on a journey through the intricate process of creating these powerful electronic brains’, never really comes out sounding as if we’re going to look at something interesting.
However, it’s not all doom and gloom with those voices. With the same samples, I suddenly speak fluent Italian!
I don’t speak that language myself, but even I can hear that the intonation is not good enough here. Yet I could use this, for example, to voice a telephone system in multiple languages, including a few that I don’t speak. And there are plenty of YouTubers who have been content with Siri’s voice for years and who would happily use this too. But they were never my clients anyway.
By the way, I bet you’re curious what it sounds like when I voice this text myself? Well, this is my real, human voice:
And just because I could, I also made a recording of an American-sounding artificial voice:
So, it’s getting better and better, but we’re not quite there yet. Still, this is only the first loop of the A.I. rollercoaster!
What are legitimate uses for such a virtual voice?
They do exist! Think of:
- Giving a voice to speaking computer systems, which will be commonplace in the near future. This could be a virtual customer service representative, but also a teacher.
- Giving a voice to people who, like Stephen Hawking, have lost theirs due to illness. They can even get their own voice back.
- Making audiobooks from books for which this process is not commercially viable if done by human actors.
- Reading newspaper articles aloud, for example, which is something not only blind people appreciate.
- Preserving a voice that would otherwise have disappeared with its owner, so that certain fictional characters will not lose their voice. I’m thinking of the magnificent voice of Jerome Reehuis, but also those of Frits Lambrechts and Sacco van der Made. For example, Frits Lambrechts is really the best Dutch ‘Mater’ from Cars and there was never a better voice for (Dutch) Uncle Scrooge than Sacco van der Made.
I must admit that as a voice actor I prefer not to do long audiobooks. The price per hour is not attractive and it is a huge strain on your voice. But if an A.I. does the heavy lifting with my voice and I only have to intervene when the A.I. doesn’t get the intonation or pronunciation right, it could still be interesting.
I fear it will all become a bit more difficult for me. In the end, human labour is always more expensive and everyone makes their own price/quality trade-off. But voice actors are certainly not the only ones who have something to fear from artificial intelligence. I predict that call center employees, translators, and copywriters will also face stiff competition, closely followed by graphic designers and video makers. Only people who really work with their hands, like my brother-in-law the plumber, have nothing to fear for the time being. So be it. Times change and we change with the times.