ESPN Readies a Data-Filled Sports Talk Host Generated by AI

A digital avatar may soon join the talent lineup on ESPN’s college football show “SEC Nation.” Called FACTS, the AI-generated character was developed at the ESPN Edge Innovation Center as “a way to help foster engagement and educate fans on complex sports analytics,” according to ESPN. The avatar was unveiled last week at the 4th Annual ESPN Edge Conference. Built on Nvidia’s Omniverse platform, using the company’s ACE microservices, FACTS integrates with Azure OpenAI for natural language processing and ElevenLabs for text-to-speech.

DeepL Voice Translates 33 Languages to Captions in Real Time

DeepL, a German company that gained a profile with online text translation, has released DeepL Voice, a B2B tool that translates to captions in real time. DeepL Voice debuts in two iterations: DeepL Voice for Meetings, which allows participants to speak in their preferred language while serving colleagues with captions, and DeepL Voice for Conversations, which works on mobile devices, facilitating in-person, one-on-one conversations “with customers, colleagues or anyone else, in the language that works best for them,” the company explains, noting that real-time voice translation offers specific challenges.

ElevenLabs Reader App Is Available Globally in 32 Languages

New York-based ElevenLabs is going global with its generative AI text-to-speech reader app, which can narrate writings in 32 languages with thousands of voices from which to choose. The audio startup promises “high quality, human-like” AI voices that are “emotionally and contextually aware,” adapting delivery of written cues “to achieve a high emotional range.” ElevenLabs has focused on “creative workflow,” with a voice isolator and audio effects generator tools. Its catalog includes the voices of celebrities Judy Garland, Laurence Olivier, James Dean and Burt Reynolds. Custom models for translation and voiceover work using contemporary actors are a future possibility.

D-ID Employs AI to Translate Videos into Multiple Languages

D-ID, a platform that uses AI to generate digital humans, has announced D-ID Video Translate in general availability. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” D-ID is making the Video Translate tool, which accommodates 30 different languages, free to D-ID subscribers for a limited time, available through the D-ID Studio or the company’s API. Languages include Arabic, Mandarin, Japanese, Hindi and Ukrainian, in addition to Spanish, German, French and Italian. Users can simultaneously translate content using bulk translation.

ElevenLabs Voice Isolator Audio Post Tool Released with API

New York-based speech synthesis software startup ElevenLabs has launched its latest AI development: Voice Isolator, along with an API to go with it. Voice Isolator is designed to remove background noise, leaving clear dialogue for film, podcast, and interview post-production. The Voice Isolator API lets developers integrate the new product into third-party applications. To use the technology, content is uploaded and processed by the Voice Isolator model, resulting in what the company claims is speech comparable in quality to that obtained in a recording studio. The app is described as “free, with some limitations.”

DeepMind’s V2A Generates Music, Sound Effects, Dialogue

Google DeepMind has unveiled new research on AI tech it calls V2A (“video-to-audio”) that can generate soundtracks for videos. The initiative complements the wave of AI video generators from companies ranging from biggies like OpenAI and Alibaba to startups such as Luma and Runway, all of which require a separate app to add sound. V2A technology “makes synchronized audiovisual generation possible” by combining video pixels with natural language text prompts “to generate rich soundscapes for the on-screen action,” DeepMind writes, explaining that it can “create shots with a dramatic score, realistic sound effects or dialogue.”

ElevenLabs Launches an AI Tool for Generating Sound Effects

ElevenLabs has launched its text-to-sound generator Sound Effects for all users, available now at the company’s website. The new AI tool can create audio effects, short instrumental tracks, soundscapes and even character voices. Sound Effects “has been designed to help creators — including film and television studios, video game developers, and social media content creators — generate rich and immersive soundscapes quickly, affordably and at scale,” according to the startup, which developed the tool in partnership with Shutterstock, using its library of licensed audio tracks.

Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premiere model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”

Pika Taps ElevenLabs Audio App to Add Lip Sync to AI Video

On the heels of ElevenLabs’ demo of a text-to-sound app unveiled using clips generated by OpenAI’s text-to-video artificial intelligence platform Sora, Pika Labs is releasing a feature called Lip Sync that lets its paid subscribers use the ElevenLabs app to add AI-generated voices and dialogue to Pika-generated videos and have the characters’ lips moving in sync with the speech. Pika Lip Sync supports both uploaded audio files and text-to-audio AI, allowing users to type or record dialogue, or use pre-existing sound files, then apply AI to change the voicing style.

Latest Disney Accelerator Backs AI, VR, Autonomous Vehicles

The Walt Disney Company has selected five companies to be in its annual Accelerator program, three of them AI startups, one in robotics and one developing VR. The program, now in its tenth year, identifies promising new tech companies to benefit from Disney funding and mentorship in exchange for an inside track on talent and acquisitions. The class of 2024 includes AudioShake, which leverages AI to aid in mixing and dubbing audio tracks; ElevenLabs, which has a text-to-speech app for GenAI voicing; and Promethean AI, a digital archives search platform that informs prototype design.

ElevenLabs Promotes Its Latest Advances in AI Audio Effects

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that, and says it has succeeded. The two-year-old company explains it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has yet to make publicly available its new text-to-sound model but began teasing it by releasing online demos this week. Some see the technology as a natural complement to the latest wave of image generators.

Captions Debuts AI Lipdub with Translation and Gen Z Slang

Captions, which leverages AI to help its customers produce “studio quality videos directly from their mobile devices,” has launched a new app called Lipdub that automatically translates and dubs content into 28 languages. The free download lets users dub anyone “and experience familiar voices and faces in a suite of new languages.” Lipdub’s translations not only duplicate what the company says is “the subject’s exact voice,” but also sync lip movements to match. The app also incorporates dialects and idioms, with options like Gen Z and Texas slang.

Auto-GPT Generates Social Sizzle, Ushers in Era of AI Agents

Auto-GPT, an open source app that uses OpenAI’s text-generating models, is currently generating a great deal of social media attention. The program can act somewhat autonomously in that it creates its own feedback loop, asking itself a series of questions to help build a more nuanced and complete response to a text prompt. In short, something that would take a user multiple prompts to produce the desired information using ChatGPT could be accomplished using a single request of Auto-GPT, which could independently explore a subject before spitting back a comprehensive response.