By ETCentric Staff, March 21, 2024
Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram. Continue reading Deepgram’s Speech Portfolio Now Includes Human-Like Aura
By Paula Parisi, September 27, 2023
OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem. Continue reading OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search
By Paula Parisi, August 24, 2023
Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.” Continue reading Meta’s Multimodal AI Model Translates Nearly 100 Languages
By Paula Parisi, March 6, 2023
OpenAI is now allowing third-party developers to integrate ChatGPT into their apps through an API, which the company says is more cost-effective than its existing GPT-3.5 models. The language model can be used for more than chat, says OpenAI, which is also offering an API for its Whisper speech-to-text model. The company is touting gpt-3.5-turbo, the model behind the ChatGPT API, as the “best model for many non-chat use cases.” With a major investment from Microsoft and the eyes of the industry on it, OpenAI seems to be feeling some pressure to turn its success as a thought leader into earnings. Continue reading OpenAI Targets Affordable AI with ChatGPT and Whisper APIs
By Debra Kaufman, April 13, 2018
Closed captioning isn’t just for the hard-of-hearing anymore. According to Digiday, 85 percent of Facebook video is viewed without sound. That signals a growing number of viewers who prefer to watch with closed captions, putting pressure on solution providers to come up with compliant systems that are also accurate and fast. With artificial intelligence, says IBM Watson Media senior offering manager David Kulczar, closed captioning can go beyond transcription and automatically identify and describe background audio. Continue reading NAB 2018: IBM Watson on Refining AI for Closed Captioning
By Debra Kaufman, December 5, 2017
Mozilla unveiled Project DeepSpeech and Project Common Voice, two speech recognition efforts from its Machine Learning Group, and says it has just reached “two important milestones” in them. Mozilla is releasing its open source speech recognition model, which it says is nearly as accurate as human listeners transcribing the same recordings, and is also unveiling the world’s second largest publicly available voice dataset, with contributions from almost 20,000 people around the world. Continue reading Mozilla Intros Open-Source Speech Recognition, Voice Dataset
By Debra Kaufman, April 10, 2017
Apple is debuting a standalone video app called Apple Clips that allows users to shoot, edit and share video clips on their mobile phones. Apple Clips, for iOS 10.3 or higher, features real-time captioning and facial recognition as well as giant emoji, cartoon filters and lively title screens, and the finished clips can be sent to iMessage contacts. Automatic captioning, dubbed Live Titles, lets the user choose a font and style; after the user hits record, the app transcribes speech to text. But critics say some less intuitive features mar the app. Continue reading Apple Clips Launches: Cool Features, But Not Always Intuitive
By Debra Kaufman, July 21, 2016
Microsoft introduced Stream, a service that will allow businesses to share internal video easily and securely. Now available as a free preview, Stream offers the same easy-to-use, flexible tools as YouTube, but with security features for enterprise content. Office 365 already has a Video tool, and Microsoft plans to eventually merge the two services seamlessly. Unlike Office 365 Video, Stream will use features such as likes, comments and recommendations that are found on consumer platforms like Vimeo and YouTube. Continue reading Microsoft Stream Offers Familiar Video Tools for Businesses
As the battle among tech companies over artificial intelligence and digital assistants heats up, SoundHound released an app this week called “Hound” that promises to enhance voice search by handling complex questions quickly and efficiently. According to Keyvan Mohajer, SoundHound founder and chief exec, Hound has a leg up on the competition because it performs voice recognition and natural-language processing in a single step, rather than converting speech to text and then running a search on that text. Continue reading New Hound App Could Prove Rival to Siri, Cortana, Google Now