Meta’s ‘Voicebox’ AI is a text-to-speech tool that learns similarly to ChatGPT.

Meta AI has recently introduced a text-to-speech (TTS) generator named Voicebox, which the company claims produces results up to 20 times faster than state-of-the-art artificial intelligence models with comparable performance. The new system uses a model more similar to OpenAI’s ChatGPT or Google’s Bard than to traditional TTS architectures.

One of the main differences between Voicebox and similar TTS models is that Meta’s offering can generalize through in-context learning. This capability comes from training on large-scale datasets, much as ChatGPT and other transformer models are trained.

Voicebox uses a novel training scheme that ditches labels and curation in favor of an architecture capable of “in-filling” audio information. As a result, it can translate text to speech, remove unwanted noise by synthesizing replacement speech, and even apply a speaker’s voice to output in a different language.
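To make the “in-filling” idea concrete, here is a minimal sketch of how a single training example for such a task could be built: a contiguous span of audio frames is hidden, and a model would be asked to reconstruct it from the surrounding context. This is an illustration only, not Meta’s implementation. The function name `make_infill_example`, its parameters, and the use of a plain Python list to stand in for audio frames are all assumptions made for the example.

```python
import random


def make_infill_example(frames, span_len, rng=None, mask_value=None):
    """Hide a contiguous span of frames (hypothetical helper, not Voicebox code).

    Returns (context, target, start), where `context` is the sequence with
    the hidden span replaced by `mask_value`, `target` is the hidden span,
    and `start` is the index where the span begins.
    """
    frames = list(frames)
    rng = rng or random.Random()
    if not 0 < span_len <= len(frames):
        raise ValueError("span_len must be between 1 and len(frames)")

    # Pick where the hidden span starts so that it fits inside the sequence.
    start = rng.randrange(len(frames) - span_len + 1)
    end = start + span_len

    # The model would see everything except the hidden span...
    context = frames[:start] + [mask_value] * span_len + frames[end:]
    # ...and would be trained to predict what was hidden.
    target = frames[start:end]
    return context, target, start


if __name__ == "__main__":
    # Integers stand in for audio frames in this toy example.
    context, target, start = make_infill_example(range(10), 3, random.Random(0))
    print("context:", context)
    print("target: ", target)
```

Under this framing, the tasks described above look like the same problem with different spans hidden: replacing a noisy stretch of a recording means hiding that stretch and regenerating it from the audio around it.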

The arrival of robust speech generation comes at a particularly sensitive time: social media companies continue to struggle with moderation and, in the U.S., a looming presidential election threatens to once again test the limits of online misinformation detection. To mitigate the risks this technology may bring, Meta has developed a tool for determining whether speech was generated by Voicebox, which the company claims can “trivially detect” the difference between real and fake audio.

In the cryptocurrency world, AI has become as integral to day-to-day operations for most businesses as the internet or electricity. The advent of robust text-to-speech systems such as Voicebox, combined with automated trading, could help bridge a gap for would-be cryptocurrency traders who rely on TTS systems that currently may struggle with crypto jargon or multilingual support.