Reasoning Model Competes with Advanced AI at a Lower Cost
February 10, 2025
The cost of training AI models continues to hit new lows, part of a commoditization of AI that has rocked Wall Street. An AI reasoning model created for under $50 in cloud compute credits is reportedly performing comparably to established reasoning models such as OpenAI o1 and DeepSeek-R1 on tests of math and coding aptitude. Called s1-32B, the model was built by researchers at Stanford and the University of Washington, who customized Alibaba’s Qwen2.5-32B-Instruct by feeding it 1,000 prompts with responses sourced from Google’s new Gemini 2.0 Flash Thinking Experimental reasoning model.
“Instead of simply answering the user’s prompt, Gemini Thinking Experimental displays the thought process that led to its response,” SiliconANGLE says, explaining that the Gemini model “provides a natural-language summary of each step in its thought process. Those summaries were added into s1-32B’s training dataset alongside the 1,000 sample prompts and the corresponding AI-generated answers.”
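To make that distillation setup concrete, the sketch below shows one way such a dataset could be assembled, pairing each prompt with the teacher model’s step-by-step summary and final answer. The field names, example record and output file are illustrative assumptions, not artifacts from the s1 release.

```python
# Illustrative sketch (not the s1 researchers' code): build a small
# distillation dataset pairing prompts with a teacher model's reasoning
# summary and answer, written out as JSON Lines.
import json

def build_record(prompt: str, reasoning: str, answer: str) -> dict:
    """Pair a prompt with the teacher's step-by-step summary and answer."""
    return {"prompt": prompt, "reasoning": reasoning, "answer": answer}

records = [
    build_record(
        prompt="What is 17 * 24?",
        reasoning="17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        answer="408",
    ),
    # ...roughly 1,000 such examples in the actual s1 dataset
]

with open("distillation_dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```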
SiliconANGLE summarizes the processes detailed in the s1 researchers’ arXiv technical paper, including a new machine learning method called “budget forcing,” which helped the LLM achieve “the first publicly disclosed successful attempt at replicating ‘clear test-time scaling behavior.’”
Test-time scaling refers to techniques that improve a model’s performance by increasing the amount of computation it performs when generating a response — also known as inference compute — rather than solely relying on improvements made during the training phase. It lets the model benefit from the process of reasoning through its own response.
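As a rough illustration of how budget forcing can act as a test-time scaling knob, the sketch below bounds how long a model is allowed to “think” before answering: if the reasoning ends too early it is nudged to continue, and if it runs too long it is cut off. The toy decoder, the delimiter and the “Wait” nudge are illustrative assumptions, not the researchers’ implementation.

```python
# Rough sketch of the budget-forcing idea, not the s1 implementation.
# toy_decode stands in for a real LLM call; the "</think>" delimiter and
# the "Wait" nudge are assumptions used here for illustration.

END_OF_THINKING = "</think>"

def toy_decode(context: str, max_new_tokens: int) -> str:
    """Placeholder decoder: a real system would sample tokens from a model."""
    return "one more reasoning step " + END_OF_THINKING

def budget_forced_answer(prompt: str, min_think: int, max_think: int) -> str:
    """Generate with lower and upper bounds on reasoning length
    (word count is used as a crude stand-in for a token count)."""
    thought = ""
    while True:
        chunk = toy_decode(prompt + thought, max_think - len(thought.split()))
        thought += " " + chunk
        if len(thought.split()) >= max_think:
            # Budget exhausted: forcibly end the thinking phase.
            thought += " " + END_OF_THINKING
            break
        if END_OF_THINKING in chunk:
            if len(thought.split()) < min_think:
                # The model tried to stop early: strip the delimiter and
                # append "Wait" so it keeps reasoning (more inference compute).
                thought = thought.replace(END_OF_THINKING, "") + " Wait"
                continue
            break  # Reasoning finished within budget.
    # With the thinking phase closed, ask for the final answer.
    return toy_decode(prompt + thought + "\nFinal answer:", max_new_tokens=64)

print(budget_forced_answer("What is 17 * 24?", min_think=8, max_think=256))
```

The key point is that all of the extra work happens at inference time: the model’s weights do not change, only the amount of computation it spends on a given response.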
So far, OpenAI o1 is the only model publicly disclosed to have implemented test-time scaling, and it is that model’s achievement that s1-32B replicated. But s1-32B’s developers say that it “is the most sample-efficient reasoning model and outperforms closed-source models like OpenAI’s o1-preview.”
Supervised fine-tuning, or SFT, was another helpful technique, allowing reasoning models to be “distilled with a relatively small dataset” that is less expensive than the “large-scale reinforcement learning method” that DeepSeek employed to train R1, its competitor to OpenAI o1, TechCrunch explains.
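For a sense of what distillation via supervised fine-tuning on a small dataset can look like, here is a minimal sketch assuming the Hugging Face transformers and datasets libraries; the hyperparameters, sequence format and file name are placeholders, not the settings reported in the s1 paper.

```python
# Minimal SFT sketch (assumed setup, not the s1 training code): fine-tune a
# base model on a small JSONL dataset of prompts, teacher reasoning traces
# and answers, using standard causal language modeling.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-32B-Instruct"  # the base model the researchers customized
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Each JSONL record holds a prompt, the teacher's reasoning trace and an answer.
dataset = load_dataset("json", data_files="distillation_dataset.jsonl")["train"]

def to_text(example):
    # Concatenate prompt, reasoning and answer into one training sequence
    # (the delimiter format here is an illustrative assumption).
    return {"text": f"{example['prompt']}\n<think>{example['reasoning']}</think>\n{example['answer']}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=2048)

tokenized = dataset.map(to_text)
tokenized = tokenized.map(tokenize, remove_columns=tokenized.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="s1-sft", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because the dataset is only about 1,000 examples, a run like this finishes quickly, which is consistent with the under-30-minute training time reported for s1.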
Training s1 “took less than 30 minutes” using hardware that a Stanford researcher told TechCrunch would cost “about $20” to rent. The resulting s1 model is available to explore on GitHub, along with its training code and data.
Comparing that figure to the hundreds of billions of dollars Meta, Google and Microsoft plan to invest in AI this year, TechCrunch posits “that level of investment may still be necessary to push the envelope of AI innovation,” writing that while the processes employed for s1 have been “shown to be a good method for cheaply re-creating an AI model’s capabilities,” it hasn’t yet been demonstrated that they can “create new AI models vastly better than what’s available today.”