Speech Archives - ETCentric

Meta Adds Indigenous Languages to Speech and Translation AI

By Paula Parisi
February 11, 2025

Meta is seeking to make AI more inclusive with a program to support underserved languages “and help bring their speakers into the digital conversation.” Meta’s Fundamental AI Research (FAIR) unit has teamed with UNESCO to launch the Language Technology Partner Program, which is looking for people who can provide more than 10 hours of speech recordings (with transcriptions) and chunks of written text (200+ sentences, with translation) in diverse languages. “Partners will work with our teams to help integrate these languages into AI-driven speech recognition and machine translation models, which when released will be open sourced,” Meta said.

Hume AI Introduces Voice Control and Claude Interoperability

By Paula Parisi
December 4, 2024

Artificial voice startup Hume AI has had a busy Q4, introducing Voice Control, a no-code artificial speech interface that gives users control over 10 voice dimensions ranging from “assertiveness” to “buoyancy” and “nasality.” The company also debuted an interface that “creates emotionally intelligent voice interactions” with Anthropic’s foundation model Claude, a development that has prompted one observer to ponder whether keyboards will become a thing of the past when it comes to controlling computers. Both advances expand on Hume’s work with its own foundation model, Empathic Voice Interface 2 (EVI 2), which adds emotional timbre to AI voices.

Nvidia AI Model Fugatto a Breakthrough in Generative Sound

By Paula Parisi
November 27, 2024

Nvidia has unveiled an AI sound model research project called Fugatto that “can create any combination of music, voices and sounds” based on text and audio inputs. Nvidia describes it as “the world’s most flexible sound machine,” and many appear to agree that the new model represents an audio breakthrough, with the potential to generate a wide array of sounds that have not previously existed. While popular sound models from companies including Suno and ElevenLabs “can compose a song or modify a voice, none have the dexterity of the new offering,” Nvidia claims.

Humanoid Robot Figure 02 Touts Better Strength, Reasoning

By Paula Parisi
August 9, 2024

Robotics startup Figure AI (with investors including OpenAI, Nvidia and Microsoft) has released its next-gen humanoid, Figure 02. Its predecessor made a splash earlier this year with a demo that captured it conversing with an interlocutor as it organized household items and prepared a snack. Compared to the Figure 01 prototype, with exposed wiring and limited range of motion, Figure 02 is more polished. The latest iteration boasts skeletal improvements for heavier lifting as well as enhanced visual reasoning to assist with machine learning. The result is characterized as “a major leap” in AI-powered robotics, a category in which players include Tesla and 1X Technologies.

ElevenLabs Voice Isolator Audio Post Tool Released with API

By Paula Parisi
July 15, 2024

New York-based speech synthesis software startup ElevenLabs has launched its latest AI development: Voice Isolator and an API to go with it. Voice Isolator is designed to strip out background noise, leaving clear dialogue for film, podcast, and interview post-production. The Voice Isolator API lets developers integrate the new product into third-party applications. To use the technology, content is uploaded and processed by the Voice Isolator model, resulting in what the company claims is speech comparable in quality to that obtained in a recording studio. The app is described as “free, with some limitations.”

OpenAI Unveils Faster AI Model, Desktop Version of ChatGPT

By Rob Scott
May 13, 2024

OpenAI CTO Mira Murati announced during a live-streamed event today that the company is launching an updated version of its GPT-4 model that powers OpenAI’s popular chatbot. The new flagship AI model, GPT-4o, is reportedly “much faster” and offers improved text, voice and vision capabilities. Murati said GPT-4o will be free to all users, while Plus users will enjoy “up to five times the capacity limits” available to free users. According to OpenAI, the new AI model “can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.”

Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

By ETCentric Staff
April 22, 2024

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The first model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

By ETCentric Staff
February 21, 2024

Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim is exhibiting “emergent” qualities: an inherent ability to improve at speaking complex sentences naturally. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, as well as for accessibility.

Meta AI Seamless Translator Converts Nearly 100 Languages

By Paula Parisi
December 5, 2023

The research division of Meta AI has developed Seamless Communication, a suite of artificial intelligence models that generate what the company says is natural and authentic communication across languages, facilitating what amounts to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges capabilities from a trio of models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a single system that can translate between almost 100 spoken and written languages, preserving idioms, emotion and the speaker’s vocal style, Meta says.

Adobe Reveals Its New AI Tool for Editing Problematic Audio

By Paula Parisi
November 22, 2023

Adobe has unveiled Project Sound Lift, an AI-powered technology that separates speech recordings into discrete tracks of voices, non-speech sounds and other background noise in video. The company describes Project Sound Lift as “a one-click solution” that leverages AI to help users easily manipulate audio recordings “across a range of scenarios” to “enhance, transform, and control speech and sound independently.” Adobe’s existing Enhance Speech technology, available in the company’s Premiere Pro editing program, has been integrated within Project Sound Lift to aid creators in producing studio-quality audio content.

OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

By Paula Parisi
September 27, 2023

OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem.

Meta Creates Voicebox Generative AI Model for Audio Synth

By Paula Parisi
June 21, 2023

Meta Platforms has unveiled Voicebox, an AI model that can produce high-quality audio clips and edit pre-recorded audio. It also generates speech, relying on what Meta calls “in-context learning” to accomplish tasks it was not specifically trained for. The company says Voicebox is first in class with this type of generalized learning for audio. Untrained tasks include sampling, stylizing and editing. As an editor, it can isolate and remove sounds like car horns and background animal noise while preserving the content and style of the source audio. The multilingual model generates speech in six languages.

CES: Startup Leverages AI to Address Problematic Acoustics

By Phil Lelyveld
January 9, 2023

A growing number of companies are working on technologies that strive to make a person’s voice more intelligible to the listener over speakers, headphones, hearing aids and other consumer audio devices. Augmented Hearing, a Danish startup launched two years ago, is one of the more interesting companies at CES 2023 focusing on this space. The firm’s software-based solution runs on iOS, Windows and other CE operating systems. It could mitigate the current trend of people across all age groups turning on closed captioning because they often find video dialogue difficult to understand.

Microsoft Project Oxford Updates Could Bring AI to More Apps

By Rob Scott
November 12, 2015

Following announcements that Google is releasing its TensorFlow machine learning platform so developers can create their own artificial intelligence programs, and that Nvidia has made a significant update to its Jetson TX1 supercomputer-on-a-chip, Microsoft is the latest with major AI news. The company has updated its Project Oxford suite of AI tools with powerful new features and programs, designed for example to identify human emotions and voices, that could make their way into the apps we use on a daily basis.

Google Using RankBrain Artificial Intelligence Tech for Search

By Debra Kaufman
October 28, 2015

Google is now relying on artificial intelligence, with a system dubbed RankBrain, for a small but significant part of its search business. Since Google is identified with search, keeping on the bleeding edge of search technology is critical to its dominance, and Google has been researching artificial intelligence (software that learns about the world) for over five years. Prior to launching RankBrain for search, Google had been a big corporate sponsor of AI, investing in it for videos, speech and translation.