Yasa-1: Startup Reka Launches New Multimodal AI Assistant

Startup Reka AI is releasing in preview its first artificial intelligence assistant, Yasa-1. The multimodal AI is described as “a language assistant with visual and auditory sensors.” The year-old company says it “trained Yasa-1 from scratch,” including pretraining foundation models “from ground zero,” then aligning them and optimizing them for its training and serving infrastructure. “Yasa-1 is not just a text assistant, it also understands images, short videos and audio (yes, sounds too),” said Reka AI co-founder and Chief Scientist Yi Tay. Yasa-1 is available via Reka’s APIs and as Docker containers for on-site or virtual private cloud deployment.

In the coming weeks, Reka plans to expand access to more enterprise and organization partners, the company explains in a blog post that offers instructions for using Yasa-1 and contact information for requesting a demo.

Reka says “Yasa-1 has the core capabilities of a typical text-based AI assistant. It can be used as a general-purpose language model for a plethora of text-only tasks,” such as generating ideas for creative projects or summarizing documents. It also supports images, audio and short video clips as input prompts.
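As a rough illustration of what a multimodal prompt to such an assistant could look like, here is a minimal sketch. The endpoint URL, field names, and authentication scheme below are hypothetical placeholders for illustration only; they are not taken from Reka’s documentation.

```python
import base64
import requests

# Hypothetical endpoint and payload shape -- illustrative only, not Reka's documented API.
API_URL = "https://api.example.com/v1/assistant"  # placeholder URL
API_KEY = "YOUR_API_KEY"                          # placeholder credential

# Encode a local image so it can be sent alongside the text prompt.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "Summarize what this chart shows in two sentences.",
    "attachments": [
        {"type": "image", "data": image_b64}  # audio or short video could be attached the same way
    ],
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```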

The Reka.ai team “spent substantial time natively optimizing for retrieval, adding a code interpreter, investing in multilinguality and supporting live data via search engine integration,” Tay posted on the X social platform, noting the company also “spent significant time investing in our own internal evaluation framework.”

Marktechpost calls Yasa-1 “groundbreaking” in that it is “designed to bridge the gap between traditional text-based AI and the real world, where information is not confined to words alone.” Such multimodal capabilities are the new frontier for AI assistants, which have proliferated over the past year as companies rush to release their own models or fold existing ones into corporate workflows and consumer-facing products.

The challenge of creating “a genuinely multimodal AI that can seamlessly comprehend text and interact with visual and auditory inputs” has long been a priority for AI developers, Marktechpost writes. OpenAI also recently announced upgrades to make ChatGPT multimodal.

Yasa-1 “can be customized on private datasets of any modality, allowing enterprises to build new experiences for a myriad of use cases,” VentureBeat reports, noting it “supports 20 different languages and also brings the ability to provide answers with context from the Internet, process long context documents and execute code.”
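Again purely as a hedged sketch, a request asking for an internet-grounded answer over a long document might add options like those below. The `documents`, `use_web_search`, and `language` fields are hypothetical names invented for this example, not parameters from Reka’s API.

```python
import requests

API_URL = "https://api.example.com/v1/assistant"  # placeholder URL, not Reka's real endpoint
API_KEY = "YOUR_API_KEY"                          # placeholder credential

# Load a long document to pass as context.
with open("annual_report.txt", "r", encoding="utf-8") as f:
    long_document = f.read()

payload = {
    "prompt": "What are the main risks identified in this report? Mention any relevant recent news.",
    "documents": [long_document],   # hypothetical long-context field
    "use_web_search": True,         # hypothetical flag for search-engine grounding
    "language": "en",               # one of the 20 supported languages, per VentureBeat
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```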

The company emphasized that this is only the first version of Yasa, which it says will be improved with updates over the coming months. For now, its “ability to discern intricate details in multimodal media is limited,” and audio or video clips no longer than one minute are recommended “for the best experience.”

Founded by former researchers from DeepMind, Google and Meta Platforms, Reka came out of stealth mode in June with news of $58 million in Series A funding.
