Meta Unveils New Open-Source Multimodal Model Llama 3.2

Meta’s Llama 3.2 release includes two new multimodal LLMs, one with 11 billion parameters and one with 90 billion — considered small- and medium-sized — and two lightweight, text-only models (1B and 3B) that fit onto edge and mobile devices. Included are pre-trained and instruction-tuned versions. In addition to text, the multimodal models can interpret images, supporting apps that require visual understanding. Meta says the models are free and open source. Alongside them, the company is releasing “the first official Llama Stack distributions,” enabling “turnkey deployment” with integrated safety.
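To make "interpret images" concrete, here is a minimal sketch of image-plus-text inference with the 11B vision model, assuming the Hugging Face transformers (4.45 or later) Mllama integration and access to the gated meta-llama checkpoint; the image URL is a placeholder:

```python
# A minimal sketch, not Meta's reference code: query the 11B vision model
# with one image and one question via Hugging Face transformers (4.45+).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image URL -- substitute any accessible image.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```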

Meta CEO Mark Zuckerberg unveiled the Llama 3.2 models at the Meta Connect conference, responding to feedback that many developers lacked the compute resources or expertise to run the Llama 3.1 models released two months earlier.

VentureBeat says the unified interface provided by Meta’s Llama Stack “for tasks such as fine-tuning, synthetic data generation and agentic application building” can make the 3.2 models an option for “organizations looking to leverage AI without extensive in-house expertise.”
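To give that "unified interface" claim some shape, here is a hedged sketch of calling a running Llama Stack distribution through the llama-stack-client Python package; the port, model name, and exact method signatures are assumptions that may vary across Stack versions:

```python
# A hedged sketch, assuming a Llama Stack distribution is already running
# locally and the llama-stack-client package's early inference API;
# method names and the model identifier may differ by version.
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

client = LlamaStackClient(base_url="http://localhost:5000")  # assumed local port

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",  # assumed registered model name
    messages=[UserMessage(role="user", content="Write one sentence about open models.")],
)
print(response.completion_message.content)
```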

“In most of the world, the multimodal Llama models can be downloaded from and used across a wide number of cloud platforms, including Hugging Face, Microsoft Azure, Google Cloud, and AWS,” reports TechCrunch, noting that Meta is not releasing them in the EU, where the company is embroiled in regulatory battles.

The new models are also hosted on Llama.com. “We continue to share our work because we believe openness drives innovation and is good for developers, Meta and the world,” according to the company.

In a blog post, Meta says it’s been “working closely” with partners including AWS, Databricks, Dell Technologies, Fireworks, Infosys and Together AI to build Llama Stack distributions for their downstream enterprise clients. Meta is itself using the Llama 3.2 models “to power its AI assistant, Meta AI, across WhatsApp, Instagram and Facebook,” TechCrunch notes.

The multimodal 11B and 90B models “can be deployed with or without a new safety tool, Llama Guard Vision, that’s designed to detect potentially harmful (i.e. biased or toxic) text and images fed to or generated by the models,” according to TechCrunch.
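The moderation flow TechCrunch describes can be sketched with the text-only Llama Guard 3 1B checkpoint released alongside these models (the vision variant, Llama-Guard-3-11B-Vision on Hugging Face, follows the same pattern with an image processor); the output convention ("safe", or "unsafe" plus a hazard code) follows the model card:

```python
# A hedged sketch of prompt screening with the text-only Llama Guard 3 1B
# checkpoint; the guard's chat template wraps the conversation in its
# safety prompt, and the model replies "safe" or "unsafe" plus a code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(guard_id)
model = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I pick a lock?"}]}
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=20, pad_token_id=0)
# Decode only the generated verdict, not the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```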

The made-for-mobile Llama 3.2 1B and 3B models support a context length of 128K tokens and “are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors,” according to Meta. Because the lightweight models run locally, “prompts and responses can feel instantaneous” while also being more private.
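For a sense of how little it takes to drive the small models, here is a minimal sketch of running the 3B instruct model with the Hugging Face transformers pipeline; this targets an ordinary GPU or CPU rather than a phone, where deployment would instead go through the vendor toolchains Meta mentions:

```python
# A minimal sketch of local text generation with the 3B instruct model
# via the Hugging Face transformers pipeline (desktop/server, not phone).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Summarize Llama 3.2 in one sentence."}]
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```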

“Open source is going to be — already is — the most cost-effective, customizable, trustworthy and performant option out there,” VentureBeat quotes Zuckerberg saying in his Connect keynote, calling this “an inflection point in the industry” where open source is “starting to become an industry standard, call it the Linux of AI.”
