OpenAI Delivers Native GPT-4o Image Generator to ChatGPT

OpenAI has activated the multimodal image generation capabilities of GPT-4o, making them available to ChatGPT users on the Plus, Pro, Team and Free tiers. GPT-4o replaces DALL-E 3 as the popular chatbot’s default image generator. According to OpenAI, the model’s accuracy with text, understanding of symbols and precision in following prompts, combined with multimodal capabilities that let it take cues from visual material, have transformed its image generation from largely unpredictable to “consistent and context-aware,” resulting in “a practical tool with precision and power.”

This updated version of GPT-4o “can generate more realistic images,” writes The Wall Street Journal. The improvement is “the result of a year-long effort with human trainers” who painstakingly labeled the training data, “pointing out where typos, errant hands and faces had been made in AI-generated images,” the project’s lead researcher Gabriel Goh told the outlet.

“We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other,” OpenAI wrote in a blog post, explaining this was combined with aggressive post-training that resulted in a model with “surprising visual fluency.”

“While previous models had difficulty correctly positioning many distinct objects in a scene, GPT-4o can now handle up to 10-20 objects at once,” reports VentureBeat, a capability known as “multi-object binding.” Other improvements over DALL-E 3 include accurate text integration, the ability to transform styles, and retention of chat history that informs multiple iterations.

Because the image generator taps everything GPT‑4o knows, its images can incorporate factual information, “creating images that are not only beautiful, but also useful,” OpenAI explains in a technical paper on native image generation in GPT-4o.

The new technology is “designed to generate images from detailed, complex and unusual instructions,” reports The New York Times. “If you describe a four-panel comic strip, including the characters who appear in each panel and what they are saying to one another, the technology can instantly generate an elaborate cartoon.”

DALL-E 3 would not have been able to reliably handle such a wide array of concepts, the Times points out.

OpenAI CEO Sam Altman discusses the improvements with his team in a video post that shows them generating images in real time.
