Databricks DBRX Model Offers High Performance at Low Cost
March 29, 2024
Databricks, a San Francisco-based company focused on cloud data and artificial intelligence, has released a generative AI model called DBRX that it says sets new standards for performance and efficiency among open-source models. The mixture-of-experts (MoE) architecture contains 132 billion parameters and was pre-trained on 12 trillion tokens of text and code. Databricks says DBRX gives the open community and enterprises that want to build their own LLMs capabilities previously limited to closed model APIs, and claims it outperforms other open models, including Llama 2-70B and Mixtral, on certain benchmarks.
“While not matching the raw power of OpenAI’s GPT-4, company executives pitched DBRX as a significantly more capable alternative to GPT-3.5 at a small fraction of the cost,” writes VentureBeat.
Likewise, TechCrunch calls DBRX “akin to OpenAI’s GPT series and Google’s Gemini.”
“While foundation models like GPT-4 are great general-purpose tools, Databricks’ business is building custom models for each client that deeply understand their proprietary data. DBRX shows we can deliver on that,” Databricks CEO Ali Ghodsi said at a press event covered by VentureBeat. “We’re excited to share DBRX with the world and drive the industry towards more powerful and efficient open-source AI,” he added.
A distinctive aspect of DBRX is its innovative approach to MoE. Whereas competing models typically use all of their parameters to generate each word, DBRX contains 16 expert sub-models and a router that dynamically selects the four most relevant experts for each token. The result is high performance with a relatively modest 36 billion parameters active at any one time, which makes for faster, more cost-effective operation.
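To make the routing idea concrete, the following is a minimal sketch of top-4-of-16 mixture-of-experts routing in PyTorch. The layer sizes, class name and structure are illustrative assumptions for explanation only, not DBRX's actual implementation.

```python
# Illustrative sketch of top-4-of-16 MoE routing; dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=4):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                               # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)   # keep only the 4 best experts
        weights = F.softmax(top_vals, dim=-1)                 # mix just those experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.num_experts):
                mask = top_idx[:, slot] == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 512)        # 8 tokens with 512-dim embeddings
layer = SimpleMoELayer()
print(layer(tokens).shape)          # torch.Size([8, 512]); only 4 of 16 experts run per token
```

Because only the selected experts execute for a given token, the compute per token scales with the active parameters rather than the full parameter count, which is the efficiency argument Databricks makes for DBRX.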
“The Mosaic team, a research unit acquired by Databricks last year, developed this approach based on its earlier Mega-MoE work,” VentureBeat reports, quoting Ghodsi as saying the Mosaic team has improved rapidly over the years. “We can build these really good AI models fast — DBRX took about two months and cost around $10 million,” the executive said.
“Training mixture-of-experts models is hard,” Databricks explains in a blog post introducing the new model. “Now that we have done so, we have a one-of-a-kind training stack that allows any enterprise to train world-class MoE foundation models from scratch.”
The weights of the DBRX Base model and the fine-tuned DBRX Instruct are available on Hugging Face under an open license for research and commercial use. DBRX files are also downloadable on GitHub.
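For readers who want to try the published weights, the following is a minimal sketch of loading DBRX Instruct through the standard Hugging Face transformers interface. The repository name shown and the loading options (device placement, dtype, remote code) are assumptions based on common practice, not setup instructions from Databricks; running a model of this size also requires substantial GPU memory.

```python
# Minimal sketch of loading the DBRX Instruct weights from Hugging Face;
# the repo name and settings below are assumptions, not verified guidance.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",     # spread the 132B-parameter model across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

inputs = tokenizer("What is a mixture-of-experts model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```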
Databricks customers can access DBRX via APIs and “can pretrain their own DBRX-class models from scratch or continue training on top of one of our checkpoints using the same tools and science we used to build it,” Databricks says.