OpenAI Previews Two New Reasoning Models: o3 and o3-Mini

OpenAI has unveiled a new frontier model, OpenAI o3, which it claims can “reason” through challenges involving math, science and computer programming. Currently limited to safety and research testers, it is expected to roll out to individuals and businesses this year. OpenAI o3 is said to be over 20 percent more efficient at common programming tasks than its predecessor OpenAI o1 and beat a company scientist on a programming test. Model o3 is part of a broader effort to create AI systems that can reason through complex problems. In late December Google debuted a similar platform, the experimental Gemini 2.0 Flash Thinking Mode.

“These two companies and others aim to build systems that can carefully and logically solve a problem through a series of steps, each one building on the last,” The New York Times reports, noting the models “could be useful to computer programmers who use AI systems to write code or to students seeking help from automated tutors in areas like math and science.”

“This model is incredible at programming,” OpenAI CEO Sam Altman said during an online presentation introducing OpenAI o3 and o3-mini. Following banter about how OpenAI is bad with naming protocols and “should have named it o2,” Altman calls o3 “a very, very smart model,” and says o3-mini is “an incredibly smart model, really good at performance and cost.”

VentureBeat describes how “o3 scored 75.7 percent on the ARC benchmark under standard compute conditions and 87.5 percent using high compute, significantly surpassing previous state-of-the-art results, such as the 53 percent scored by Claude 3.5.”

The ARC benchmark was created by renowned AI researcher François Chollet who designed it to measure the ability to handle novel, intelligent tasks. It is considered “a meaningful gauge of progress toward truly intelligent AI systems,” says VentureBeat, noting that Chollet himself considered the performance “a surprising advancement” that challenged his belief that large language models were unlikely to achieve reasoning intelligence.

“It highlights innovations that could accelerate progress toward superior intelligence, whether we call it artificial general intelligence (AGI) or not,” VentureBeat writes.

Bolstering that view, the o3 model scored “25 percent on a difficult math test that no other AI model scored more than 2 percent on,” explains TechCrunch. While some informed observers believed AI had hit a scaling wall, o3 may provide renewed hope, though its high performance comes at a cost.

“OpenAI’s newest model also uses a previously unseen level of compute, which means a higher price per answer,” reports TechCrunch.
