Hugging Face Has Developed Tiny Yet Powerful Vision Models

Most people know Hugging Face as a resource-sharing community, but it also builds open-source applications and tools for machine learning. Its recent release of vision-language models small enough to run on smartphones, yet capable of outperforming competitors that rely on massive data centers, is being hailed as “a remarkable breakthrough in AI.” The new models, SmolVLM-256M and SmolVLM-500M, are optimized for “constrained devices” with less than about 1GB of RAM. That makes them well suited to mobile devices and laptops, and attractive to anyone who wants to process large amounts of data cheaply and with a low energy footprint.

“The company’s new SmolVLM-256M model surpasses the performance of its Idefics 80B model from just 17 months ago — a system 300 times larger,” VentureBeat writes, calling the dramatic reduction in size, combined with improved performance, “a watershed moment for practical AI deployment.”

Hugging Face Machine Learning Research Engineer Andrés Marafioti tells VentureBeat that when the New York-based company released Idefics 80B in August 2023, it was the first to open-source a video language model, emphasizing that “by achieving a 300x size reduction while improving performance, SmolVLM marks a breakthrough in vision-language models.”

The breakthrough comes at a time when enterprises are struggling to implement efficient AI systems. Available in 256 million and 500 million parameter sizes, SmolVLM-256M and SmolVLM-500M punch above their weight class, processing images and analyzing visual content at speeds that previously required much larger models.

TechCrunch reports the small models “may be inexpensive and versatile, but can also contain flaws that aren’t as pronounced in larger models,” citing a recent study from Google DeepMind, Microsoft Research, and Quebec’s Mila research institute that found “many small models perform worse than expected on complex reasoning tasks.”

Hugging Face shares SmolVLM’s specs and benchmarks in a blog post. The company also announced that it has integrated four new serverless inference providers into its central platform, the Hub, noting: “They are also seamlessly integrated into our client SDKs, making it easier than ever to explore serverless inference of a wide variety of models that run on your favorite providers.”

On a related note, TechRadar reports that the open-source DeepSeek model from China that took the world by storm in recent weeks is now available in 3,374 variations on Hugging Face, which it calls a “collaborative AI-model development platform.”
