OpenAI Showcases Latest Updates for Voice, Vision and More

OpenAI unveiled major updates at its DevDay conference with the focus largely on making AI more accessible, efficient and affordable. Included were four innovations: Vision Fine-Tuning in the API, Model Distillation, Prompt Caching and the public beta of Realtime API. The approach underscores OpenAI’s effort to empower its developer ecosystem even as it continues to compete for end-users in the enterprise space. The Realtime API gives developers the option of building “nearly real-time” speech-to-speech app experiences, selecting from among six OpenAI voices. Vision Fine-Tuning for GPT-4o enables customization of the model’s visual understanding of images and text.

VentureBeat calls Prompt Caching “a boon for developers” that could reduce both costs and latency. “By reusing recently seen input tokens, developers can get a 50 percent discount and faster prompt processing times,” OpenAI explains in a Prompt Caching blog post that includes a pricing chart.
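Because caching is applied automatically to repeated prompt prefixes (past roughly 1,024 tokens, per OpenAI’s documentation), developers mostly just observe the discount in the usage data. Below is a minimal sketch, assuming the openai Python SDK and the usage fields OpenAI exposed at launch; the support-agent prompt is purely illustrative.

```python
# Minimal sketch of observing Prompt Caching with the openai Python SDK.
# Caching is automatic for long, repeated prompt prefixes; the field names
# below reflect the API as documented at launch and may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A long, stable system prompt: identical leading tokens across requests
# are what make a prefix eligible for caching.
system_prompt = "You are a support agent for Acme Corp. " + "Policy text... " * 200

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},  # shared prefix
            {"role": "user", "content": question},         # varies per call
        ],
    )
    # cached_tokens reports how much of the prompt was served from cache
    # (billed at the discounted rate OpenAI describes).
    cached = response.usage.prompt_tokens_details.cached_tokens
    print(f"{question!r}: {cached} cached tokens")
    return response.choices[0].message.content

ask("What is the refund window?")    # first call: cache miss, 0 cached tokens
ask("Do you ship internationally?")  # repeated prefix: most prompt tokens cached
```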

OpenAI Head of Product Olivier Godement said at a press event reported on by VentureBeat that “just two years ago, GPT-3 was winning. Now, we’ve reduced [those] costs by almost 1000x. I was trying to come up with an example of technologies who reduced their costs by almost 1000x in two years — and I cannot come up with an example.”

Voice experiences have gotten a lot of attention lately, “from language apps and educational software to customer support experiences,” OpenAI points out in a blog post about Realtime API.

Thanks to the Realtime API, “developers no longer have to stitch together multiple models to power these experiences”; they can build natural conversational experiences with a single API call, and “soon with audio in the Chat Completions API,” the post explains.
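For a sense of what that single-call flow looks like, here is a hedged sketch of a text-only Realtime session over WebSocket. The endpoint, headers, and event names follow OpenAI’s beta documentation at launch and may have changed since; audio streaming is omitted for brevity.

```python
# Sketch of a Realtime API session: one WebSocket connection replaces a
# separate transcription/LLM/TTS pipeline. Event names per the beta docs.
import asyncio
import json
import os

import websockets  # pip install websockets (<14; newer versions use additional_headers)

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta opt-in header
    }
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Ask the model to produce one response for this turn.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user in one sentence.",
            },
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)  # stream tokens as they arrive
            elif event["type"] == "response.done":
                break  # turn finished

asyncio.run(main())
```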

The Realtime API voices “are distinct from those offered for ChatGPT, and developers can’t use third party voices, in order to prevent copyright issues,” TechCrunch notes.

The Model Distillation process promises the ability to fine-tune “smaller, cost-efficient models using outputs from more capable models, allowing them to match the performance of advanced models on specific tasks at a much lower cost,” OpenAI says.

VentureBeat calls Model Distillation “perhaps the most transformative announcement” of DevDay. “This integrated workflow allows developers to use outputs from advanced models like o1-preview and GPT-4o to improve the performance of more efficient models such as GPT-4o Mini,” VentureBeat reports, noting it “addresses a long-standing divide in the AI industry between cutting-edge, resource-intensive systems and their more accessible but less capable counterparts,” effectively putting high-end AI capabilities within reach of smaller companies.
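In outline, the distillation pattern is: collect a strong model’s answers, then fine-tune a smaller model on them. The sketch below assumes the openai Python SDK; the prompts and file names are hypothetical, and OpenAI’s integrated workflow can also capture teacher traffic server-side (via store=True) rather than through a local JSONL file.

```python
# Hedged sketch of distillation: GPT-4o as teacher, GPT-4o mini as student.
import json
from openai import OpenAI

client = OpenAI()

prompts = ["Summarize: ...", "Classify sentiment: ..."]  # your task inputs (placeholders)

# 1) Collect teacher outputs from the stronger model.
with open("distill.jsonl", "w") as f:
    for p in prompts:
        answer = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": p}],
            store=True,  # optionally persist completions for OpenAI's dashboard workflow
        ).choices[0].message.content
        # 2) Write chat-format training examples for fine-tuning.
        f.write(json.dumps({"messages": [
            {"role": "user", "content": p},
            {"role": "assistant", "content": answer},
        ]}) + "\n")

# 3) Fine-tune the smaller student model on the teacher's outputs.
training_file = client.files.create(file=open("distill.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",  # fine-tunable snapshot at the time
    training_file=training_file.id,
)
print(job.id)
```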

Adding vision fine-tuning to the API lets developers use images, in addition to text, when fine-tuning GPT-4o apps, which will prove useful for gaming and other interactive entertainment as well as visual search, autonomous vehicle navigation, smart cities and medical analysis. There are limits, however: TechCrunch writes that “developers will not be able to upload copyrighted imagery (such as a picture of Donald Duck), images that depict violence, or other imagery that violates OpenAI’s safety policies.”
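As for what a vision training example looks like, OpenAI’s fine-tuning flow accepts JSONL chat records in which images appear as image_url content parts. The sketch below assumes the openai Python SDK and the gpt-4o-2024-08-06 snapshot that was fine-tunable at launch; the image URL and labels are placeholders, and uploads must pass the safety checks noted above.

```python
# Hedged sketch of a vision fine-tuning example and job submission.
import json
from openai import OpenAI

client = OpenAI()

example = {"messages": [
    {"role": "user", "content": [
        {"type": "text", "text": "What street sign is shown?"},
        # Images ride along as image_url content parts (or data: URLs).
        {"type": "image_url", "image_url": {"url": "https://example.com/sign.jpg"}},
    ]},
    {"role": "assistant", "content": "A no-left-turn sign."},
]}

with open("vision_ft.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # one example per line; real datasets need many

training_file = client.files.create(file=open("vision_ft.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",  # vision fine-tunable snapshot at launch
    training_file=training_file.id,
)
print(job.status)
```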

Related:
OpenAI Speeds Ahead as It Shifts to For-Profit, Axios, 10/2/24
