Meta — known not too long ago by its more sinister name, Facebook — has created a kickass AI that turns text into audio. But it proved to be a little too kickass, so much so that Meta says it won’t release it publicly, because rotten criminal minds will definitely abuse it. In a bajillion ways.

The AI is called Voicebox. But here’s the scary part: all it needs is a two-second audio clip of any person’s voice to learn it and start speaking in that voice, copying everything from their pitch to their signature tonality.

In a nutshell, you can make your aging, innocent grandma rap Cardi B’s WAP or Nicki Minaj’s Anaconda, and traumatise yourself for an entire lifetime. And then the next. All you need to do is feed the AI an audio clip, write whatever text comes to your mind, and Voicebox will read it out for your twisted mental gratification.

A villainous AI peeking through a screen
Credit: Bing Image Creator / Microsoft Edge

But there’s more. Meta’s AI can take whatever you write and return audio in six languages. Heck, if you’ve recorded a shitty sad song for your ex and your pet barked in the middle of it, the AI can take that clip, strip out the barking, and return a pristine recording without any unwanted noise.

Sounds terrific, right? But just expand the scope of your angelic, innocent mind and imagine all the ways criminals are going to exploit it. Scammers are already using AI tools to pose as your relatives and phish out sensitive deets like your banking details.

In March this year, an elderly Indian couple lost around ₹18 lakh because a shithead mimicked their grandson’s voice using one such AI tool. In China, a man lost about ₹5 crore to a tech-savvy conman who used a voice-deepfaking AI to impersonate the victim’s friend.

A robotic AI trying to intimidate a person
Credit: Bing Image Creator / Microsoft Edge

With all those risks in mind, Meta says it isn’t releasing Voicebox publicly, at least for now. Thankfully, in a separate research paper, the company has detailed a tool that can distinguish between a real human voice and an audio clip generated by its overly dangerous AI.

By the way, Microsoft researchers created a similar tool called VALL-E that needed only a three-second sound clip to start talking like the speaker. Once again, the researchers didn’t release the tool into the wild because of the abuse risks.

If you want to get a taste of the AI doomsday awaiting us, check out these brief explainers detailing Meta’s Voicebox AI thingamajig.
