Microsoft Cloud Buoys Quarterly Revenue to Nearly $62 Billion

Microsoft revenue was $61.9 billion in the quarter ending March 31, up 17 percent compared to the same period a year ago. Profit was up 20 percent, to $21.9 billion, despite an increase in capital expenditures to purchase Nvidia GPUs for training and running AI models. The performance smashed analyst predictions, sending the stock up 5 percent in after-hours trading. Revenue for the Microsoft Cloud division overall was $35.1 billion, up 23 percent year-over-year, fueled largely by customers using it to host resource-intensive AI services. Revenue in the Intelligent Cloud segment was $26.7 billion, a 21 percent uptick.

Otter Adds New Generative AI Features to Its Meeting Assistant

Web-based transcription service Otter.ai is expanding its toolkit with Meeting GenAI, aimed at corporate customers who want to increase meeting productivity while decreasing effort. Multi-meeting capabilities have been added via Otter AI Chat, which can respond to queries like “What did I miss in the meetings from the past two weeks?” Conversation Summary View summarizes meetings in real time, along with automatically identified action items that are assigned owners, deadlines and tracking. Otter is positioning itself as a David versus the Goliaths of AI meeting assistants: Microsoft Copilot, Zoom AI Companion and Google’s Gemini for Workspace.

Meta’s Multimodal AI Model Translates Nearly 100 Languages

Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.”
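
Meta has published the model weights and code openly. As a rough illustration of the on-demand translation described above, here is a minimal text-to-text sketch that assumes the Hugging Face transformers integration and the facebook/hf-seamless-m4t-medium checkpoint (neither is mentioned in the announcement); SeamlessM4T uses three-letter language codes such as “eng” and “fra”:

```python
# Illustrative sketch: assumes `pip install transformers sentencepiece torch`
# and the facebook/hf-seamless-m4t-medium checkpoint on the Hugging Face Hub.
from transformers import AutoProcessor, SeamlessM4TForTextToText

checkpoint = "facebook/hf-seamless-m4t-medium"
processor = AutoProcessor.from_pretrained(checkpoint)
model = SeamlessM4TForTextToText.from_pretrained(checkpoint)

# Translate English text to French using three-letter language codes.
inputs = processor(text="Where is the nearest train station?",
                   src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**inputs, tgt_lang="fra")
print(processor.decode(output_tokens[0], skip_special_tokens=True))
```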

OpenAI Rolls Out Open-Source Speech Recognition System

OpenAI has released a new open-source AI speech recognition model called Whisper that can recognize and translate audio with accuracy and robustness the company says approach human levels. Use cases include transcription of speeches, interviews, podcasts and conversations. “Moreover, it enables transcription in multiple languages, as well as translation from those languages into English,” says OpenAI, which is open-sourcing models and inference code on GitHub “to serve as a foundation for building useful applications and for further research on robust speech processing.”
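
Because the models and inference code are on GitHub, getting a transcript takes only a few lines of Python. The sketch below assumes the pip-installable openai-whisper package, ffmpeg on the system path, and a hypothetical audio file name:

```python
# Assumes `pip install -U openai-whisper` and ffmpeg installed on the system.
import whisper

# Load one of the released checkpoints; "base" trades some accuracy for speed.
model = whisper.load_model("base")

# Transcribe a (hypothetical) recording in its original language.
result = model.transcribe("interview.mp3")
print(result["text"])

# Or translate non-English speech directly into English text.
translated = model.transcribe("interview.mp3", task="translate")
print(translated["text"])
```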

Twitter Intros Ephemeral Tweets, Gathering Spaces for Audio

Twitter is launching Fleets, a feature that allows users to post photos or text that will disappear after 24 hours. Snapchat pioneered the ephemeral post, followed by Instagram and Facebook. Rollout of the Stories-like feature is moving forward, but has been scaled back as Twitter addresses “some performance and stability problems.” The platform’s “global town square” continues to be its “marquee product” but, said Twitter director of design Joshua Harris, the Fleets feature creates a space with less pressure for users who lurk but don’t post. The company is also testing Spaces, a new audio feature similar to Clubhouse, a startup that debuted earlier this year.

Google Open-Sources Technology for Real-Time Captions

Google is looking to help developers create real-time captioning for long-form conversations in multiple languages. The company recently open-sourced the speech engine used for Live Transcribe, its Android speech-to-text transcription app designed for those who are deaf or hard of hearing, and posted the source code on GitHub. Live Transcribe, launched in February, uses machine learning algorithms to convert speech in more than 70 languages and dialects into captions in real time.
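
As an illustration of the kind of streaming speech-to-text that powers live captions (not the Live Transcribe code itself, which is an Android project), here is a rough Python sketch using the google-cloud-speech client with a hypothetical source of audio chunks:

```python
# Illustrative sketch, not the Live Transcribe engine: assumes
# `pip install google-cloud-speech` and a project with the Speech-to-Text API enabled.
from google.cloud import speech

def stream_captions(audio_chunks, language_code="en-US"):
    """Print live captions for an iterator of 16 kHz, 16-bit PCM audio chunks."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config,
        interim_results=True,  # emit partial hypotheses so captions update as words arrive
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in audio_chunks
    )
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            print(result.alternatives[0].transcript)
```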

Publishers and Authors Guild Oppose Audible Text Feature

Audible, the audiobook app owned by Amazon, is using machine learning to transcribe audio recordings, so listeners can also read along with the narrator. Audible is promoting it as an educational feature, but some publishers are up in arms, demanding their books be excluded because the captions are “unauthorized and brazen infringements of the rights of authors and publishers.” Publishers are concerned that fewer people will buy physical books or e-books if they can get the text with an Audible audiobook.

Android Q Live Caption Feature Enables Real-Time Subtitles

During Google’s I/O 2019 developers conference this week, the company demonstrated an impressive new feature for mobile operating system Android Q. Called Live Caption, the feature enables real-time transcription for any video or audio that users play on their smartphones. Whether they’re listening or watching via YouTube, Skype, Instagram, Pocket Casts, or other applications, Live Caption overlays the text on top of whatever app is in use. Additionally, Live Caption will work with users’ own video and audio recordings stored on their phones.

AI Firm Shows Multilingual Translator That Fits in Your Pocket

The iFLYTEK Translator 2.0 is a handheld spoken language translator developed with Chinese AI technology and training. The size of a mobile phone, it can translate between any two of 63 languages and is trained in a number of “professional vocabularies.” The device touts a 5-hour battery life and, at $450, would be a useful and affordable business and personal tool. This Chinese tech also raises some interesting privacy and geopolitical issues. In addition to the upgraded Translator 2.0, the company also announced its iFLYREC Series voice-to-text products, AI Note for recording and transcription, and iFLYOS voice-interaction system at CES.

NAB 2018: IBM Watson on Refining AI for Closed Captioning

Closed captioning isn’t just for the hard-of-hearing anymore. According to Digiday, 85 percent of Facebook video is viewed without sound. That signals a trend of viewers who prefer to watch with closed captions, putting the heat on solutions providers to come up with compliant systems that are also accurate and speedy. With artificial intelligence, says IBM Watson Media senior offering manager David Kulczar, closed captioning can be enhanced to go beyond transcription and automatically identify and describe background audio.

Facebook Messenger Will Roll Out Voice-to-Text Capabilities

Facebook will continue to improve its Messenger app this year. The standalone app already has more than 500 million monthly users, but the company is hoping to reach a billion users by the end of the year. One attractive new feature will be voice-to-text transcription. A release date has yet to be announced, but the company is already testing it. Facebook will also experiment with ways to generate revenue and give people a way to communicate with businesses on the Messenger app.