Mistral, Nvidia Bring Enterprise AI to Desktop with NeMo 12B

Nvidia and French startup Mistral AI are jointly releasing a new language model called Mistral NeMo 12B that brings enterprise AI capabilities to the desktop without the need for major cloud resources. Developers can easily customize and deploy the new LLM for applications supporting chatbots, multilingual tasks, coding and summarization, according to Nvidia. “NeMo 12B offers a large context window of up to 128k tokens,” explains Mistral, adding that “its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category.” Available under the Apache 2.0 license, it can be dropped into existing systems as a replacement for Mistral 7B.

Mistral NeMo 12B has 12 billion parameters and a generous 128,000 token context window, making it a formidable enterprise tool with a light footprint. As such, it “represents a significant shift in the AI industry’s approach to enterprise solutions,” reports VentureBeat, adding that by focusing on “a more compact yet powerful model,” the collaboration between the U.S. chipmaker and Paris-based AI firm “aims to democratize access to advanced AI capabilities.”

According to Nvidia, the new LLM excels in “multi-turn conversations, math, common sense reasoning, world knowledge and coding” and is reliable across diverse tasks.

Mistral NeMo leverages Nvidia’s “top-tier hardware and software,” Mistral co-founder and Chief Scientist Guillaume Lample said of the partnership in an Nvidia blog post. “Together, we have developed a model with unprecedented accuracy, flexibility, high-efficiency and enterprise-grade support and security.”

Mistral NeMo was trained on the Nvidia DGX Cloud AI platform, which offers dedicated, scalable access to the latest Nvidia architecture. Nvidia’s NeMo generative AI development platform was also pivotal in building the new model. Mistral contributed “deep expertise in training data,” according to Forbes.

“The model is packaged as an Nvidia NIM inference microservice, offering performance-optimized inference with TensorRT-LLM engines, allowing for deployment anywhere in minutes,” Forbes adds, explaining that “this containerized format ensures enhanced flexibility and ease of use for various applications.”

The model uses the FP8 data format for inference, reducing memory requirements and speeding deployment “without any degradation to accuracy,” Nvidia claims.
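The article does not quantify the savings, but the benefit of FP8 is easy to illustrate with back-of-the-envelope arithmetic: halving the bytes stored per parameter halves the weight footprint. A minimal sketch (weights only; it deliberately ignores KV cache, activations, and runtime overhead, which add to the real memory bill):

```python
# Back-of-the-envelope estimate of model weight memory at different precisions.
# Weights only -- real deployments also need KV-cache and activation memory.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the raw weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 12e9  # Mistral NeMo has 12 billion parameters

fp16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit formats: 2 bytes per parameter
fp8_gb = weight_memory_gb(PARAMS, 1)   # FP8: 1 byte per parameter

print(f"FP16 weights: {fp16_gb:.0f} GB")  # 24 GB
print(f"FP8 weights:  {fp8_gb:.0f} GB")   # 12 GB
```

At roughly 12 GB for FP8 weights, the model starts to fit within the memory of high-end consumer GPUs, which is consistent with the RTX claim Nvidia makes.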

A small LLM such as Mistral NeMo can even run locally on personal computers, Nvidia VP of Applied Deep Learning Research Bryan Catanzaro told VentureBeat, adding that “this model can run on RTX GPUs that many people have already.”

A Mistral blog post calls it a “multilingual model for the masses,” pointing out that it is “particularly strong” in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic and Hindi.

Related:
AI’s New Frontier: Hugging Face, Nvidia, and OpenAI Lead Charge in Small Language Models, VentureBeat, 7/19/24
Mistral’s Large 2 Is Its Answer to Meta and OpenAI’s Latest Models, TechCrunch, 7/24/24
