DeepL Voice Translates 33 Languages to Captions in Real Time

DeepL, a German company that made its name in online text translation, has released DeepL Voice, a B2B tool that translates speech into captions in real time. DeepL Voice debuts in two versions. DeepL Voice for Meetings lets participants speak in their preferred language while colleagues follow along through translated captions. DeepL Voice for Conversations runs on mobile devices and facilitates in-person, one-on-one conversations “with customers, colleagues or anyone else, in the language that works best for them,” the company explains. DeepL also notes that real-time voice translation poses specific challenges.

Runway’s Act-One Facial Capture Could Be a ‘Game Changer’

Runway is launching Act-One, a motion capture system that maps human facial expressions from video and voice recordings onto characters using the company’s latest model, Gen-3 Alpha. Runway calls it “a significant step forward in using generative models for expressive live action and animated content.” Unlike past facial capture techniques, which typically require complex rigging, Act-One is driven directly and solely by an actor’s performance and requires “no extra equipment.” According to the company, this makes it more likely to capture and preserve an authentic, nuanced performance.

Microsoft’s Copilot AI Assistant Update Adds Voice and Vision

Microsoft announced that its Copilot AI assistant has received a major overhaul, gaining voice and vision capabilities. Copilot now also has a virtual news reader mode that presents headlines, as well as the ability to see what you see and to interact in a more conversational manner. Before a general release, these tools will be trialed among a subset of Copilot Pro users “to gather feedback” and make them “better and safer.” Mustafa Suleyman, executive VP and CEO of Microsoft AI, says the changes herald “a calmer, more helpful and supportive era of technology, quite unlike anything we’ve seen before.”

Amazon Is Inviting Audible Narrators to Create AI Voice Clones

Amazon is aiming to speed up production of Audible audiobooks by inviting a small group of narrators to clone their voices using generative artificial intelligence. The U.S. beta test will roll out later this year, according to Amazon, which announced the move on Audible’s creator marketplace. “There is a vast catalog of books that does not yet exist in audio and as we explore ways to bring more books to life on Audible, we’re committed to thoughtfully balancing the interests of authors, narrators, publishers, and listeners,” Amazon explains.

ElevenLabs Reader App Is Available Globally in 32 Languages

New York-based ElevenLabs is going global with its generative AI text-to-speech reader app, which can narrate text in 32 languages with a choice of thousands of voices. The audio startup promises “high quality, human-like” AI voices that are “emotionally and contextually aware,” adapting their delivery to cues in the text “to achieve a high emotional range.” ElevenLabs has focused on “creative workflow,” offering tools such as a voice isolator and an audio effects generator. Its catalog includes the voices of celebrities Judy Garland, Laurence Olivier, James Dean and Burt Reynolds. Custom models for translation and voiceover work using contemporary actors are a future possibility.

SAG-AFTRA Strikes a Deal with Narrativ for AI Voice Replicas

SAG-AFTRA announced it is teaming up with online talent marketplace Narrativ to give the guild’s 160,000 members the option of working with the New York-based AI startup to license their voice replicas for use in digital audio advertising. According to SAG-AFTRA, the deal would make it easy for voice actors to be considered for replica work and to be compensated for it. The guild emphasizes that performers will control the particulars, including whether to make their voices available, brand approval and fees. Narrativ also represents visual likenesses, but the SAG-AFTRA announcement is limited to voice work.

D-ID Employs AI to Translate Videos into Multiple Languages

D-ID, a platform that uses AI to generate digital humans, has announced the general availability of D-ID Video Translate. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” Video Translate supports 30 languages, including Arabic, Mandarin, Japanese, Hindi and Ukrainian in addition to Spanish, German, French and Italian. It is available through D-ID Studio or the company’s API and is free to D-ID subscribers for a limited time. A bulk translation feature lets users translate content into several languages at once.

Google Rolls Out Its Gemini Live, Challenging ChatGPT Voice

Google has released its AI assistant, Gemini Live, and is positioning it to replace Google Assistant on mobile. Gemini Live is rolling out on Android to subscribers of Gemini Advanced, which is part of the $20-per-month Google One AI Premium plan. Consumers who purchase the new Pixel 9 Pro, which begins shipping this week, will get the assistant as part of a year of free access to Gemini Advanced, a $240 value, according to the company. Google claims that Gemini Live enables natural, flowing conversations with the AI assistant, putting “a sidekick in your pocket.”

OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus

OpenAI has released its new Advanced Voice Mode in a limited alpha rollout for select ChatGPT Plus users. The feature, which is being implemented in the ChatGPT mobile app on Android and iOS, aims for more natural dialogue with the AI chatbot. Powered by the multimodal GPT-4o model, Advanced Voice Mode is said to be able to pick up on vocal cues such as excitement, sadness or singing. According to an OpenAI post on X, the company plans to “continue to add more people on a rolling basis” so that all ChatGPT Plus users will have access to the feature in the fall.

Lifelike AI Avatars to Get New Features with Synthesia Update

Synthesia, which uses AI to create business avatars for content such as training, presentation and customer service videos, has announced a major platform update. “Coming soon” with Synthesia 2.0 are full-body avatars with hands capable of a wide range of motions. Users can animate motion using skeletal sequences, onto which a persona selected from the catalog is then automatically mapped. Starting next month, the Nvidia-backed UK company will offer the ability to incorporate brand identity (including typography, colors and logos) into templated videos. A new translation tool automatically applies updates to a video across all of its language versions.

Nokia Makes the First-Ever 3D Spatial Audio Cell Phone Call

Nokia has made what it claims is “the world’s first immersive voice and audio call” using cell phones. The call was enabled by the new 3GPP Immersive Voice and Audio Services (IVAS) codec, which lets consumers hear 3D spatial sound in real time. The codec, which Nokia helped develop, is a major leap from today’s standard monophonic smartphone voice calls and is part of the upcoming 5G Advanced standard. The innovation paves the way for enhanced immersive spatial communications, extended reality and metaverse applications, says Nokia, explaining that it works across “any connected device,” including smartphones, tablets and PCs.

OpenAI Unveils Faster AI Model, Desktop Version of ChatGPT

OpenAI CTO Mira Murati announced during a live-streamed event today that the company is launching an updated version of the GPT-4 model that powers ChatGPT, OpenAI’s popular chatbot. The new flagship model, GPT-4o, is reportedly “much faster” and offers improved text, voice and vision capabilities. Murati said GPT-4o will be free to all users, while Plus users will get “up to five times the capacity limits” available to free users. According to OpenAI, the new model “can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.”

Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

Microsoft has developed VASA, a framework for generating lifelike virtual characters that can speak and sing. The first model, VASA-1, can do this in real time from a single static image and an audio clip. The research demo showcases realistic, audio-driven faces that can be adjusted to look in different directions or change expression. The resulting video clips run up to one minute at 512 x 512 pixels and up to 40 fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”

Audio-First Social Platform Airchat Has Successful Relaunch

Airchat is the latest app to take Silicon Valley tech leaders by storm. Described as a “combination of voice notes and Twitter,” Airchat lets you follow other users and scroll through posts, adding replies, likes and shares. The twist is that posts are audio recordings, which the app then transcribes. Airchat ranked 27th on the App Store’s social networking chart, even though users must be invited to join. Launched last year by AngelList founder Naval Ravikant and former Tinder product exec Brian Norgard, Airchat was just relaunched on iOS and Android.

Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generated speech with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech model designed for voice AI agents, the company says. By pairing it with Deepgram’s Nova-2 speech-to-text API, developers can “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.