By Debra Kaufman
May 22, 2020
Facebook introduced an AI text-to-speech (TTS) system that produces a second of audio in 500 milliseconds, or twice as fast as real time. According to Facebook, the system, paired with a new approach to data collection, made it possible to create a British-accented voice in six months, versus the more than a year required for other voices. The TTS system now powers Facebook’s Portal smart displays. It can run in real time on ordinary processors and is also available as a service for other apps, including Facebook’s VR.
Continue reading: Facebook Reveals New AI-Powered Text-to-Speech System
In another example of the significant advances we have been following in artificial intelligence and deep learning, Chinese search giant Baidu has introduced Deep Voice 2, the second iteration of its compelling text-to-speech system. The company introduced the original Deep Voice just three months ago, with the ability to produce speech “in near real time” that was “nearly indistinguishable from an actual human voice,” according to The Verge. While the first system was limited to learning one voice at a time, “and required many hours of audio or more from which to build a sample,” the updated version “can learn the nuances of a person’s voice with just half an hour of audio, and a single system can learn to imitate hundreds of different speakers.”
Continue reading: Text-to-Speech System Quickly Mimics Hundreds of Accents