By
Paula Parisi, December 4, 2024
Artificial voice startup Hume AI has had a busy Q4, introducing Voice Control, a no-code artificial speech interface that gives users control over 10 voice dimensions ranging from “assertiveness” to “buoyancy” and “nasality.” The company also debuted an interface that “creates emotionally intelligent voice interactions” with Anthropic’s foundation model Claude, a development that has prompted one observer to ponder whether keyboards will become a thing of the past for controlling computers. Both advances expand on Hume’s work with its own foundation model, Empathic Voice Interface 2 (EVI 2), which adds emotional timbre to AI voices. Continue reading Hume AI Introduces Voice Control and Claude Interoperability
By
Paula Parisi, November 27, 2024
Nvidia has unveiled an AI sound model research project called Fugatto that “can create any combination of music, voices and sounds” based on text and audio inputs. Nvidia describes it as “the world’s most flexible sound machine,” and many appear to agree that the new model represents an audio breakthrough, with the potential to generate a wide array of sounds that have not previously existed. While popular sound models from companies including Suno and ElevenLabs “can compose a song or modify a voice, none have the dexterity of the new offering,” Nvidia claims. Continue reading Nvidia AI Model Fugatto a Breakthrough in Generative Sound
By
Paula Parisi, August 9, 2024
Robotics startup Figure AI — with investors including OpenAI, Nvidia and Microsoft — has released its next-gen humanoid, Figure 02. Its predecessor made a splash earlier this year with a demo that captured it conversing with an interlocutor as it organized household items and prepared a snack. Compared to the Figure 01 prototype, with exposed wiring and limited range of motion, Figure 02 is more polished. The latest iteration boasts skeletal improvements for heavier lifting as well as enhanced visual reasoning to assist with machine learning. The result is characterized as “a major leap” in AI-powered robotics, a category in which players include Tesla and 1X Technologies. Continue reading Humanoid Robot Figure 02 Touts Better Strength, Reasoning
By
Paula Parisi, July 15, 2024
New York-based speech synthesis software startup ElevenLabs has launched its latest AI development: Voice Isolator and an API to go with it. Voice Isolator is designed to remove background noise, leaving clear dialogue for film, podcast, and interview post-production. The Voice Isolator API lets developers integrate the new product into third-party applications. To use the technology, content is uploaded and processed by the Voice Isolator model, resulting in what the company claims is speech comparable in quality to that obtained in a recording studio. The app is described as “free, with some limitations.” Continue reading ElevenLabs Voice Isolator Audio Post Tool Released with API
OpenAI CTO Mira Murati announced during a live-streamed event today that the company is launching an updated version of its GPT-4 model that powers OpenAI’s popular chatbot. The new flagship AI model, GPT-4o, is reportedly “much faster” and offers improved text, voice and vision capabilities. Murati said GPT-4o will be free to all users, while Plus users will enjoy “up to five times the capacity limits” available to free users. According to OpenAI, the new AI model “can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.” Continue reading OpenAI Unveils Faster AI Model, Desktop Version of ChatGPT
By
ETCentric Staff, April 22, 2024
Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premiere model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.” Continue reading Microsoft’s VASA-1 Can Generate Talking Faces in Real Time
By
ETCentric Staff, February 21, 2024
Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim is exhibiting “emergent” qualities: the ability to inherently improve itself at speaking complex sentences naturally. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, and in accessibility tools. Continue reading Amazon Claims ’Emergent Abilities’ for Text-to-Speech Model
By
Paula Parisi, December 5, 2023
The research division of Meta AI has developed Seamless Communication, a suite of artificial intelligence models that generate what the company says is natural and authentic communication across languages, facilitating what amounts to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges capabilities from a trio of models — SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2 — into a single system that can translate between almost 100 spoken and written languages, preserving idioms, emotion and the speaker’s vocal style, Meta says. Continue reading Meta AI Seamless Translator Converts Nearly 100 Languages
By
Paula Parisi, November 22, 2023
Adobe has unveiled Project Sound Lift, an AI-powered technology that separates speech recordings into discrete tracks of voices, non-speech sounds and other background noise in video. The company describes Project Sound Lift as “a one-click solution” that leverages AI to help users easily manipulate audio recordings “across a range of scenarios” to “enhance, transform, and control speech and sound independently.” Adobe’s existing Enhance Speech technology, available in the company’s Premiere Pro editing program, has been integrated within Project Sound Lift to aid creators in producing studio-quality audio content. Continue reading Adobe Reveals Its New AI Tool for Editing Problematic Audio
By
Paula Parisi, September 27, 2023
OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem. Continue reading OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search
By
Paula Parisi, June 21, 2023
Meta Platforms has unveiled Voicebox, an AI model that can produce high-quality audio clips and edit pre-recorded audio. It also uses artificial intelligence for speech generation efforts, using what Meta calls “in-context learning” to accomplish tasks it was not specifically trained for. The company says Voicebox is first in class with this type of generalized learning for audio. Untrained tasks include sampling, stylizing and editing. As an editor, it can isolate and remove sounds like car horns and background animal noise while preserving the content and style of the source audio. The multilingual model generates speech in six languages. Continue reading Meta Creates Voicebox Generative AI Model for Audio Synth
By
Phil Lelyveld, January 9, 2023
A growing number of companies are working on technologies that strive to make a person’s voice more intelligible to the listener over speakers, headphones, hearing aids and other consumer audio devices. Augmented Hearing, a Danish startup launched two years ago, is one of the more interesting companies at CES 2023 focusing on this space. The firm’s software-based solution runs on iOS, Windows and other CE operating systems. It could mitigate the current trend of people across all age groups turning on closed captioning because they often find video dialogue difficult to understand. Continue reading CES: Startup Leverages AI to Address Problematic Acoustics
By
Rob Scott, November 12, 2015
Following announcements that Google is releasing its TensorFlow machine learning platform so developers can create their own artificial intelligence programs, and Nvidia has made a significant update to its Jetson TX1 supercomputer-on-a-chip, Microsoft is the latest with major AI news. The company has updated its Project Oxford suite of AI tools with powerful new features and programs designed to identify human emotions and voices, for example, that could make their way into the apps we use on a daily basis. Continue reading Microsoft Project Oxford Updates Could Bring AI to More Apps
By
Debra Kaufman, October 28, 2015
Google is now relying on artificial intelligence, with a system dubbed RankBrain, for a small but significant part of its search business. Since Google is identified with search, keeping on the bleeding edge of search technology is critical to its dominance, and Google has been researching artificial intelligence (software that learns about the world) for over five years. Prior to launching RankBrain for search, Google had been a big corporate sponsor of AI, investing in it for videos, speech and translation. Continue reading Google Using RankBrain Artificial Intelligence Tech for Search
By
Meghan Coyle, March 23, 2015
Researchers from Nanyang Technological University in Singapore have developed a microfiber technology that enables them to build brain-like computers. “Photonic synapses” are collections of microfibers that pass electronic signals. The optical fibers can send signals at the speed of light, much faster than the neurons in real brains. This breakthrough could provide a boost to both robotics and AI technology. Improved vehicle control, speech, and search are just some of the possible applications. Continue reading Breakthrough in AI Technology Mimics Synapses in the Brain