By: Nick Gambino
Microsoft’s new AI, VALL-E, can clone or mimic any human voice after listening to just a three-second audio sample. That’s all it takes to generate entire passages of dialogue that sound like they’re coming from the same person.
Researchers at Microsoft said they trained VALL-E on 60,000 hours of English speech from more than 7,000 speakers. As long as the voice it’s attempting to mimic is close to one of those 7,000-plus speakers, the AI can clone it with accuracy. That cloned voice can then be made to say just about anything. It can even replicate the room tone of the original recording.
The results are pretty impressive. VALL-E doesn’t hit the mark every time; some outputs sound off or robotic, especially when the dialogue runs longer. But other samples mirror the original human speaker almost exactly. You can’t tell the difference. And that’s after processing a single three-second snippet of the voice.
Now, whether it can hold a sustained and believable human conversation is another thing entirely. While AI is advancing at a mad rate, it’s still not quite convincing enough to pass for human. However, give it a few years and we might just get there.
Still, in its current form, VALL-E is yet another step in a scary direction. Deepfakes are already a concern. They can be entertaining in TikTok and YouTube videos, but when they enter the realm of politics or scams, they can be put to nefarious use. Add in an AI tool that lets you quickly replicate someone’s voice and you’ve got a recipe for disaster.
Of course, Microsoft didn’t develop VALL-E with this intent. The company created this “neural codec language model” for text-to-speech tools and similar applications. It just so happens we live in a world where good intentions aren’t enough to stop bad actors from perverting a technology.
In the famous words of Jurassic Park’s chaos theorist Ian Malcolm, “Your scientists were so preoccupied with whether or not they could that they didn’t stop to think if they should.”
Do you think this is taking AI too far? Let us know in the comments below!