OpenAI Introduces New Models That Can Reason with Images

OpenAI has released two new AI models that use images as part of their reasoning process, an approach the company calls “thinking with images.” OpenAI o3 and o4-mini “are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers,” the company says. The new entries in the “o” series also have agentic capabilities and can independently “use and combine every tool within ChatGPT, including searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images.”

At a press event for the new models, OpenAI President Greg Brockman called them “a qualitative step into the future,” adding that they are “the first models where top scientists tell us they produce legitimately good and useful novel ideas,” reports VentureBeat, which calls the newcomers “groundbreaking.”

VentureBeat calls the ability to “‘think with images’ — not just see them, but manipulate and reason about them as part of their problem-solving process,” the most impactful feature of the new models.

VentureBeat cites an example in which “o3 could analyze a physics poster from a decade-old internship, navigate its complex diagrams independently, and even identify that the final result wasn’t present in the poster itself.”

OpenAI o3 and o4-mini “are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems,” OpenAI explains in an announcement. This allows them to tackle multi-faceted questions more effectively, a step toward “a more agentic ChatGPT that can independently execute tasks on your behalf.”

A separate post from OpenAI notes the new models can “crop, zoom in, and rotate” images, among other processing techniques, as part of their chain of thought.

The New York Times reports the technology builds on prior advances in AI reasoning by enabling AI to use “images, including sketches, posters, diagrams and graphs” to answer questions.

“The new models are part of OpenAI’s effort to beat out Google, Meta, xAI, Anthropic and DeepSeek in the cutthroat global AI race,” TechCrunch points out.

“While OpenAI was first to release an AI reasoning model, o1, competitors quickly followed with versions of their own that match or exceed the performance of OpenAI’s lineup,” TechCrunch adds. “Reasoning models have begun to dominate the field as AI labs look to eke more performance out of their systems.”
