Language Archives - ETCentric

Cohere’s Multimodal Embed Model Organizes Enterprise Data

By Paula Parisi
April 17, 2025

As enterprises rely more heavily on AI integration to compile research and summarize things like meetings and email threads, the need for contextual search has become increasingly important. AI startup Cohere has released Embed 4 to make the task easier. Embed 4 is a multimodal embedding model that transforms text, images and mixed data (like PDFs, slides or tables) into numerical representations (or “embeddings”) for tasks including semantic search, retrieval-augmented generation (RAG) and classification. Supporting over 100 languages, Embed 4 has an extremely large context window of up to 128,000 tokens.

Netflix Expands Dubbing and Subtitle Options to 30 Languages

By Paula Parisi
April 4, 2025

Netflix has gone multilingual, adding a feature that lets viewers choose from a list of more than 30 languages for dubbing or subtitles on any title. The option was previously available only via mobile and Web browsers, with TV options limited to a handful of choices deemed relevant based on geographic location. Referencing some of its most popular programming, such as South Korea’s “Squid Game,” Spain’s “Berlin” and France’s “Lupin,” Netflix explains, “we know that language availability is what helped these stories and characters find fans beyond their country of origin.”

OpenAI Pushes Conversational Agents with Three New Models

By Paula Parisi
March 24, 2025

OpenAI has debuted three new models for transcription and voice generation: gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts. The text-to-speech and speech-to-text AI models are designed to help developers create AI agents with highly customizable voices. OpenAI claims these models will power natural and responsive voice agents, moving AI out of the text-based communications stage and into intuitive spoken conversations. The suite outperforms existing solutions in accuracy and reliability, OpenAI says, especially with “accents, noisy environments, and varying speech speeds,” making the models well-suited for customer call centers and meeting notes.

Amazon Prime Video Tests AI Dubbing for Movies and Series

By Paula Parisi
March 12, 2025

Amazon is experimenting with AI dubbing so Prime Video customers globally can experience content from other territories, gaining access more quickly and efficiently to licensed films and TV series. The company is using a hybrid “AI-aided” system in which localization professionals oversee the AI output to ensure quality control. Currently limited to a dozen movies and series that will be AI-dubbed in English and Latin American Spanish, the pilot will expand if the results prove popular with audiences. In December, Netflix experienced backlash against AI-assisted dubbing, with viewers complaining generative mouth adjustments looked unnatural.

AI Startup Sesame Develops Next Stage of Voice Generation

By Paula Parisi
March 7, 2025

Sesame, an AI startup from Oculus co-founder Brendan Iribe, has created a conversational voice model that many feel has achieved uncanny levels of authenticity. Drawing comparisons to the charismatic vocal centerpiece of the 2013 Warner Bros. film “Her,” Sesame seems to have achieved a new level of engagement among AI voice assistants. While some are describing the tech as “amazing,” others have expressed concern over its capabilities. “Our goal is to achieve ‘voice presence’ — the magical quality that makes spoken interactions feel real, understood and valued,” explains a blog post by Iribe and others.

YouTube Expands Access to Improved AI-Powered Dubbing

By Paula Parisi
December 12, 2024

Hundreds of thousands more YouTube channels are gaining access to its AI-powered auto-dubbing feature, which generates audio translation tracks for YouTube videos, helping to make the platform’s content more accessible to viewers around the world. The expanded rollout targets informational channels in the Partner Program, such as tutorials on cooking, sewing, tourism and home improvement. Availability “will expand to other types of content soon,” according to the video streamer, which began testing the feature with select creators last year. Based on technology developed by Aloud, YouTube’s auto-dubbing emerged from the Area 120 internal incubator program.

Hume AI Introduces Voice Control and Claude Interoperability

By Paula Parisi
December 4, 2024

Artificial voice startup Hume AI has had a busy Q4, introducing Voice Control, a no-code artificial speech interface that gives users control over 10 voice dimensions ranging from “assertiveness” to “buoyancy” and “nasality.” The company also debuted an interface that “creates emotionally intelligent voice interactions” with Anthropic’s foundation model Claude, a development that has prompted one observer to ponder the possibility that keyboards will become a thing of the past when it comes to controlling computers. Both advances expand on Hume’s work with its own foundation model, Empathic Voice Interface 2 (EVI 2), which adds emotional timbre to AI voices.

Nvidia AI Model Fugatto a Breakthrough in Generative Sound

By Paula Parisi
November 27, 2024

Nvidia has unveiled an AI sound model research project called Fugatto that “can create any combination of music, voices and sounds” based on text and audio inputs. Nvidia describes it as “the world’s most flexible sound machine,” and many appear to agree that the new model represents an audio breakthrough, with the potential to generate a wide array of sounds that have not previously existed. While popular sound models from companies including Suno and ElevenLabs “can compose a song or modify a voice, none have the dexterity of the new offering,” Nvidia claims.

BodyTalk Dubs into 29 Languages with Facial Moves to Match

By Paula Parisi
November 12, 2024

Panjaya is an AI startup that aims to disrupt the world of video dubbing with a way to generate “hyperrealistic” recreations of a person’s voice speaking a new language. The system also automatically modifies the imagery so that lip and other physical movements match the new speech patterns. Called BodyTalk, the technique is the launch point for Panjaya as it emerges from the stealth in which it conducted its R&D the past three years, backed by $9.5 million from venture funds and angel backers. The startup describes BodyTalk as “AI dubbing that looks and feels as natural as the original.”

Google Unveils Gemini-Powered Ad Features and AI Image ID

By Paula Parisi
September 19, 2024

AI-powered ad campaigns “are continuing to deliver big results for businesses large and small,” according to Google, which has put Gemini to work for Google Ads. The company announced at the DMEXCO digital marketing event in Cologne a new suite of Gemini-powered tools aimed at making the experience even better by providing additional insights and more control over where and how marketing assets are deployed globally using Google Ads. For starters, Gemini’s “conversational experience” for search campaigns will expand its language palette, making auto-generated headlines and images available in German, French and Spanish in the months ahead.

OpenAI Voice Cloning Tool Needs Only a 15-Second Sample

By ETCentric Staff
April 2, 2024

OpenAI has debuted a new text-to-voice generation platform called Voice Engine, available in limited access. Voice Engine can generate a synthetic voice from a 15-second clip of someone’s voice. The synthetic voice can then read a provided text, even translating to other languages. For now, only a handful of companies are using the tech under a strict usage policy as OpenAI grapples with the potential for misuse. “These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI explained.

ElevenLabs Promotes Its Latest Advances in AI Audio Effects

By ETCentric Staff
February 22, 2024

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that, and says it has succeeded. The two-year-old company explains it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has yet to make publicly available its new text-to-sound model but began teasing it by releasing online demos this week. Some see the technology as a natural complement to the latest wave of image generators.

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

By ETCentric Staff
February 21, 2024

Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim is exhibiting “emergent” qualities: the ability to inherently improve itself at speaking complex sentences naturally. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, and for accessibility.

Newsom Report Examines Use of AI by California Government

By Paula Parisi
November 29, 2023

California Governor Gavin Newsom has released a report examining the beneficial uses and potential harms of artificial intelligence in state government. Potential pluses include improving access to government services by identifying groups that are hindered due to language barriers or other reasons, while dangers highlight the need to prepare citizens with next-generation skills so they don’t get left behind in the GenAI economy. “This is an important first step in our efforts to fully understand the scope of GenAI and the state’s role in deploying it,” Newsom said, calling California’s strategy “a nuanced, measured approach.”

Captions Debuts AI Lipdub with Translation and Gen Z Slang

By Paula Parisi
October 17, 2023

Captions, which leverages AI to help its customers produce “studio quality videos directly from their mobile devices,” has launched a new app called Lipdub that automatically translates and dubs content into 28 languages. The free download lets users dub anyone “and experience familiar voices and faces in a suite of new languages.” Lipdub’s translations not only duplicate what the company says is “the subject’s exact voice,” but also sync lip movements to match. The app also incorporates dialects and idioms, with options like Gen Z and Texas slang.