Bertelsmann and ElevenLabs Team Up to Foster AI Production

German media company Bertelsmann has partnered with AI startup ElevenLabs to drive tech innovation and streamline workflows across Bertelsmann's production, marketing and distribution. Bertelsmann's operations span roughly 50 countries, with businesses including the publisher Penguin Random House, the record label BMG and the television unit RTL Group. The goal is for ElevenLabs' voice and audio generation tools to help Bertelsmann boost productivity and extend its reach. In August, New York-based ElevenLabs opened a European headquarters in London, expanding the international footprint of its text-to-speech and other audio apps.

ESPN Readies a Data-Filled Sports Talk Host Generated by AI

A digital avatar may soon join the talent lineup on ESPN’s college football show “SEC Nation.” Called FACTS, the AI-generated character was developed at the ESPN Edge Innovation Center as “a way to help foster engagement and educate fans on complex sports analytics,” according to ESPN. The avatar was unveiled last week at the 4th annual ESPN Edge Conference. Built on Nvidia’s Omniverse platform using the company’s ACE microservices, FACTS relies on Azure OpenAI for natural language processing and on ElevenLabs for text-to-speech.

ElevenLabs Reader App Is Available Globally in 32 Languages

New York-based ElevenLabs is going global with its generative AI text-to-speech reader app, which can narrate written material in 32 languages, with thousands of voices to choose from. The audio startup promises “high quality, human-like” AI voices that are “emotionally and contextually aware,” adapting their delivery to cues in the text “to achieve a high emotional range.” ElevenLabs has focused on “creative workflow,” with tools such as a voice isolator and an audio effects generator. Its catalog includes the voices of celebrities Judy Garland, Laurence Olivier, James Dean and Burt Reynolds. Custom models for translation and voiceover work using contemporary actors are a future possibility.

Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech model designed for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.

Pika Taps ElevenLabs Audio App to Add Lip Sync to AI Video

ElevenLabs recently unveiled a text-to-sound app in a demo that used clips generated by Sora, OpenAI’s text-to-video artificial intelligence platform. On the heels of that demo, Pika Labs is releasing a feature called Lip Sync that lets its paid subscribers use the ElevenLabs app to add AI-generated voices and dialogue to Pika-generated videos, with the characters’ lips moving in sync with the speech. Pika Lip Sync supports both uploaded audio files and text-to-audio AI, allowing users to type or record dialogue, or use pre-existing sound files, and then apply AI to change the voicing style.

ElevenLabs Promotes Its Latest Advances in AI Audio Effects

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that and says it has succeeded. The two-year-old company explains that it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has yet to make its new text-to-sound model publicly available, but began teasing it by releasing online demos this week. Some see the technology as a natural complement to the latest wave of image generators.

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

Researchers at Amazon have trained what they call the largest text-to-speech model ever created, which they claim exhibits “emergent” qualities: the ability to speak complex sentences naturally without having been explicitly trained to do so. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English, as well as in some German, Dutch and Spanish. Text-to-speech models are used in voice assistants for smart devices and apps, and in accessibility tools.

Meta AI Seamless Translator Converts Nearly 100 Languages

Meta’s AI research division has developed Seamless Communication, a suite of artificial intelligence models that generate what the company says is natural and authentic communication across languages, amounting to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges capabilities from three models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a single system that can translate between almost 100 spoken and written languages while preserving idioms, emotion and the speaker’s vocal style, Meta says.

Meta’s Multimodal AI Model Translates Nearly 100 Languages

Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.”

QuickVid Uses AI to Create Short Videos from Text Prompts

QuickVid is a new AI-driven text-to-video platform aimed at a mass-market user base. The tool draws on various generative AI systems to automatically create short-form videos for YouTube, Instagram, TikTok and other platforms. Created by former Meta Platforms programmer Daniel Habib “in a matter of weeks,” QuickVid is still quite rudimentary, though Habib says he plans to keep fine-tuning it and adding features. Unlike Google and Meta, which introduced their nascent text-to-video systems through research papers and industry previews, QuickVid has bypassed those formalities and jumped directly to a public-facing website.

Google Brings Personalization Features to Your News Update

Google is adding new features to Your News Update, its news aggregation service, to personalize 90-minute news feeds drawn from each user’s preferred sources. The goal is to create a seamless listening experience akin to a customized song playlist. Each news playlist, similar to those on public radio, will begin with short clips about the major headlines before moving into longer stories. The end product, available only in the U.S., will compile radio and podcast clips along with text-to-speech audio, tailored to the individual user.

Facebook Reveals New AI-Powered Text-to-Speech System

Facebook has introduced an AI text-to-speech (TTS) system that produces one second of audio in 500 milliseconds. According to Facebook, the system, combined with a new approach to data collection, enabled the creation of a British-accented voice in six months, versus more than a year for other voices. The system now powers Facebook’s Portal smart displays. It can run in real time on ordinary processors and is also available as a service for other apps, including Facebook’s VR.
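Producing one second of audio in 500 milliseconds corresponds to what speech engineers commonly call a real-time factor (RTF) of 0.5: synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The minimal Python sketch below shows that arithmetic; the function name `real_time_factor` is a hypothetical helper chosen for illustration, not part of any Facebook tooling.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Hypothetical helper: synthesis time divided by audio duration.

    Values below 1.0 mean audio is generated faster than it plays back.
    """
    return synthesis_seconds / audio_seconds


# Facebook's reported figure: 500 ms of compute per 1 s of audio.
rtf = real_time_factor(0.5, 1.0)
speedup = 1 / rtf  # how many times faster than real time
print(f"RTF = {rtf}, i.e. {speedup:.0f}x faster than real time")
```

An RTF of 0.5 leaves headroom for streaming: the system can emit audio twice as fast as it is played back.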

Amazon Licenses Original Interactive Audio Series for Alexa

Amazon has inked an exclusive license for “Tala’s World,” a seven-episode young adult adventure series produced by audio startup Xandra, which has produced Alexa skills for HBO, Sesame Workshop and Ubisoft. In the series, listeners help the elf-like character Blobby find his missing best friend Tala by making decisions, collecting clues and interrogating suspects. The series is available exclusively on Alexa; Amazon recently released the first episode and plans to release the second on December 13.

Google and IBM Create Advanced Text-to-Speech Systems

Both IBM and Google recently advanced the development of text-to-speech (TTS) systems that create high-quality digital speech. OpenAI found that since 2012, the compute used to train the largest AI models has grown more than 300,000-fold. IBM created a much less compute-intensive model for speech synthesis, which it says can run in real time and adapt to new speaking styles with little data. Google and Imperial College London created a generative adversarial network (GAN) that produces high-quality synthetic speech.
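To put the 300,000-fold figure in perspective, it can be converted into a number of doublings by taking its base-2 logarithm. The short Python calculation below uses only the standard `math` module; the variable names are illustrative.

```python
import math

growth_factor = 300_000  # reported increase in training compute since 2012
doublings = math.log2(growth_factor)  # number of times compute doubled
print(f"{growth_factor:,}x is about {doublings:.1f} doublings")
```

In other words, the growth corresponds to a little over 18 successive doublings, since 2**18 is 262,144 and 2**19 is 524,288.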

Publishers and Authors Guild Oppose Audible Text Feature

Audible, the audiobook app owned by Amazon, is using machine learning to transcribe audio recordings, so listeners can also read along with the narrator. Audible is promoting it as an educational feature, but some publishers are up in arms, demanding their books be excluded because captions are “unauthorized and brazen infringements of the rights of authors and publishers.” Publishers are concerned that this will lead to fewer people buying physical or e-books if they can get the text with an Audible audiobook.