Voice Archives - ETCentric

Midjourney Launches V7 Image Generator with Voice Prompts

By Paula Parisi
April 9, 2025

Generative AI program Midjourney has issued V7 in alpha, marking its first new model in almost a year. Notable updates include personalization turned on by default, which users must first set up (a process Midjourney says takes 5 minutes) and can then toggle on or off at any time. Another new flagship feature, Draft Mode, lets users render lower resolution images at “half the cost and 10 times the speed,” according to Midjourney, which emphasizes that “it’s so fast that we change the prompt bar to a ‘conversational mode’ when you’re using it on Web.” Draft Mode also supports voice prompts.

Sam Altman Reveals Plans to Simplify OpenAI’s Product Line

By Paula Parisi
February 14, 2025

OpenAI has decided to simplify its product offerings. A month after announcing the in-development GPT-o3 as its next frontier model, the company has canceled it as a standalone release, explaining that it would be integrated into the upcoming GPT-5 instead. “A top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks,” OpenAI co-founder and CEO Sam Altman wrote in a social media post this week. Expected to ship later this year, the GPT-5 models will incorporate voice, canvas, search, deep research and more, OpenAI says.

T-Mobile Launches Starlink-Based Mobile Service for Everyone

By Paula Parisi
February 11, 2025

T‑Mobile is acting to eliminate mobile dead zones by launching T-Mobile Starlink, which it says is “the first and only space‑based mobile network in the U.S. that automatically connects to your phone in areas no cellular network reaches.” For now, the service offers SMS text messaging, with “data and voice calls coming later,” according to T-Mobile. The beta is open to everyone, “even Verizon and AT&T customers,” with registration required for free access through July, at which point added fees will kick in for all but those on the T-Mobile Go5G Next plan, on sale now for $150 per month.

CES: Google TV Integrates Gemini AI for a Conversational Feel

By Paula Parisi
January 9, 2025

Google TV is incorporating Gemini AI to make it easier to converse with a voice assistant and to generate helpful onscreen information. New Google TV devices will also feature an upgraded, Gemini-powered voice experience capable of handling more complex voice commands. “You and your family will be able to gather together and have a natural conversation with your TV,” Google announced at CES 2025, where it shared a preview of the new capabilities. The Gemini model also lets Google TV users create customized artwork, control smart home devices and get an overview of the day’s news.

OpenAI Announces $200 Monthly Subscription for ChatGPT Pro

By Paula Parisi
December 9, 2024

OpenAI has launched ChatGPT Pro, a $200 per month subscription plan that provides unlimited access to the full version of o1, its new large reasoning model, and all other OpenAI models. The toolkit includes o1-mini, GPT-4o and Advanced Voice. It also includes the new o1 pro mode, “a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems,” OpenAI explains, describing the high-end subscription plan as a path to “research-grade intelligence” for scientists, engineers, enterprise, academics and others who use AI to accelerate productivity.

Hume AI Introduces Voice Control and Claude Interoperability

By Paula Parisi
December 4, 2024

Artificial voice startup Hume AI has had a busy Q4, introducing Voice Control, a no-code artificial speech interface that gives users control over 10 voice dimensions ranging from “assertiveness” to “buoyancy” and “nasality.” The company also debuted an interface that “creates emotionally intelligent voice interactions” with Anthropic’s foundation model Claude, a development that has prompted one observer to ponder whether keyboards will become a thing of the past when it comes to controlling computers. Both advances expand on Hume’s work with its own foundation model, Empathic Voice Interface 2 (EVI 2), which adds emotional timbre to AI voices.

Nvidia AI Model Fugatto a Breakthrough in Generative Sound

By Paula Parisi
November 27, 2024

Nvidia has unveiled an AI sound model research project called Fugatto that “can create any combination of music, voices and sounds” based on text and audio inputs. Nvidia describes it as “the world’s most flexible sound machine,” and many appear to agree that the new model represents an audio breakthrough, with the potential to generate a wide array of sounds that have not previously existed. While popular sound models from companies including Suno and ElevenLabs “can compose a song or modify a voice, none have the dexterity of the new offering,” Nvidia claims.

DeepL Voice Translates 33 Languages to Captions in Real Time

By Paula Parisi
November 15, 2024

DeepL, a German company that gained a profile with online text translation, has released DeepL Voice, a B2B tool that translates to captions in real time. DeepL Voice debuts in two iterations: DeepL Voice for Meetings, which allows participants to speak in their preferred language while serving colleagues with captions, and DeepL Voice for Conversations, which works on mobile devices, facilitating in-person, one-on-one conversations “with customers, colleagues or anyone else, in the language that works best for them,” the company explains, noting that real-time voice translation offers specific challenges.

Runway’s Act-One Facial Capture Could Be a ‘Game Changer’

By Paula Parisi
October 25, 2024

Runway is launching Act-One, a motion capture system that uses video and voice recordings to map human facial expressions onto characters using the company’s latest model, Gen-3 Alpha. Runway calls it “a significant step forward in using generative models for expressive live action and animated content.” Compared to past facial capture techniques, which typically require complex rigging, Act-One is driven directly and only by the performance of an actor, requiring “no extra equipment,” making it more likely to capture and preserve an authentic, nuanced performance, according to the company.

Microsoft’s Copilot AI Assistant Update Adds Voice and Vision

By Paula Parisi
October 4, 2024

Microsoft announced that its Copilot AI assistant has received a major overhaul, gaining voice and vision capabilities. Copilot also now has a virtual news reader mode to present headlines, as well as the ability to see what you see and to interact in a more conversational manner. Before a general release, these tools will be trialed among a subset of Copilot Pro users “to gather feedback” and make them “better and safer.” Microsoft AI Executive VP and CEO Mustafa Suleyman says the changes herald “a calmer, more helpful and supportive era of technology, quite unlike anything we’ve seen before.”

Amazon Is Inviting Audible Narrators to Create AI Voice Clones

By Paula Parisi
September 12, 2024

Amazon is aiming to speed up production of its Audible audiobooks by inviting a small group of narrators to clone their voices using generative artificial intelligence. The U.S. beta test will roll out later this year, according to Amazon, which announced the move on Audible’s creator marketplace. “There is a vast catalog of books that does not yet exist in audio and as we explore ways to bring more books to life on Audible, we’re committed to thoughtfully balancing the interests of authors, narrators, publishers, and listeners,” Amazon explains.

ElevenLabs Reader App Is Available Globally in 32 Languages

By Paula Parisi
August 29, 2024

New York-based ElevenLabs is going global with its generative AI text-to-speech reader app, which can narrate writings in 32 languages with thousands of voices from which to choose. The audio startup promises “high quality, human-like” AI voices that are “emotionally and contextually aware,” adapting delivery of written cues “to achieve a high emotional range.” ElevenLabs has focused on “creative workflow,” with a voice isolator and audio effects generator tools. Its catalog includes the voices of celebrities Judy Garland, Laurence Olivier, James Dean and Burt Reynolds. Custom models for translation and voiceover work using contemporary actors are a future possibility.

SAG-AFTRA Strikes a Deal with Narrativ for AI Voice Replicas

By Paula Parisi
August 23, 2024

SAG-AFTRA announced it is teaming with online talent marketplace Narrativ to provide the guild’s 160,000 members with the option of working with the New York-based AI startup to license their voice replicas for use in digital audio advertising. The deal would make it easy for voice actors to be considered for replicant work and get compensated, according to SAG-AFTRA, which emphasizes that performers will control the particulars, including whether to make their voices available, brand approval and fees. Narrativ also represents visual likenesses, but the SAG-AFTRA announcement is limited to voice work.

D-ID Employs AI to Translate Videos into Multiple Languages

By Paula Parisi
August 23, 2024

D-ID, a platform that uses AI to generate digital humans, has announced D-ID Video Translate in general availability. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” D-ID is making the Video Translate tool, which accommodates 30 different languages, free to D-ID subscribers for a limited time, available through the D-ID Studio or the company’s API. Languages include Arabic, Mandarin, Japanese, Hindi and Ukrainian, in addition to Spanish, German, French and Italian. Users can simultaneously translate content using bulk translation.

Google Rolls Out Its Gemini Live, Challenging ChatGPT Voice

By Paula Parisi
August 20, 2024

Google has released its AI assistant, Gemini Live, and is positioning it to replace Google Assistant on mobile. Gemini Live is rolling out on Android to subscribers of Gemini Advanced, which is part of the $20 monthly Google One AI Premium plan. Consumers who purchase the new Pixel 9 Pro, which begins shipping this week, will get the assistant as part of a year of free access to Gemini Advanced, a $240 value, according to the company. Google claims that Gemini Live technology enables natural, flowing conversations with the AI assistant, putting “a sidekick in your pocket.”