OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus
August 5, 2024
OpenAI has released its new Advanced Voice Mode in a limited alpha rollout for select ChatGPT Plus users. The feature, which is rolling out in the ChatGPT mobile app on Android and iOS, aims for more natural dialogue with the AI chatbot. Powered by the multimodal GPT-4o model, Advanced Voice Mode is said to be able to sense emotional inflections in a user’s voice, such as excitement or sadness, as well as singing. According to an OpenAI post on X, the company plans to “continue to add more people on a rolling basis” so that everyone using ChatGPT Plus will have access to the new feature in the fall.
“When OpenAI first showcased GPT-4o’s voice in May, the feature shocked audiences with quick responses and an uncanny resemblance to a real human’s voice,” TechCrunch reports, calling the new audio “hyperrealistic.”
It differs from the existing Voice Mode, TechCrunch notes: “ChatGPT’s old solution to audio used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and then a third to convert ChatGPT’s text into voice,” whereas the new mode uses GPT-4o alone to handle all three tasks “without the help of auxiliary models, creating significantly lower latency conversations.”
OpenAI is initiating the Advanced Voice Mode rollout “after previously delaying the release to work through potential safety issues,” Bloomberg explains. There was also a rights dispute with the actress Scarlett Johansson over the sound of one of the voices, as TechCrunch discusses.
According to VentureBeat, the voice Johansson objected to as too similar to her own has been “pulled from its library and it remains offline to this day.”
“The product will offer four preset voices, but won’t be able to impersonate how other people speak,” Bloomberg writes, adding that OpenAI says it “added new filters to ensure the software can spot and refuse some requests to generate music or other forms of copyrighted audio.”
OpenAI’s X post describes Advanced Voice Mode’s more natural conversational style, including its ability to handle interruptions.
Because the rollout is staggered, Advanced Voice Mode will launch with limited capabilities and gain more over time. For example, Bloomberg notes that “the chatbot won’t be able to access a computer vision feature that would let it offer spoken feedback on a person’s dance moves simply by using their smartphone’s camera.”
VentureBeat explains that ChatGPT Plus is “the $20 per month individual subscription service OpenAI offers for access to its signature large language model (LLM)-powered chatbot, alongside other tiers Free, Team, Enterprise.”
Related:
‘Uncanny’: ChatGPT’s Advanced Voice Mode Is Blowing Minds, VentureBeat, 8/1/24
ChatGPT Advanced Voice Mode Impresses Testers with Sound Effects, Catching Its Breath, Ars Technica, 7/31/24