Cohere’s Multimodal Embed Model Organizes Enterprise Data

As enterprises rely more heavily on AI to compile research and summarize meetings, email threads and similar material, contextual search has become increasingly important. AI startup Cohere has released Embed 4 to make that task easier. Embed 4 is a multimodal embedding model that transforms text, images and mixed data (like PDFs, slides or tables) into numerical representations (or “embeddings”) for tasks including semantic search, retrieval-augmented generation (RAG) and classification. It supports more than 100 languages and has a context window of up to 128,000 tokens.

That length, equivalent to “about a 200-page document,” SiliconANGLE reports, allows the model to “ingest a lengthy annual financial report, product manual or detailed legal contract.”
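The two figures quoted above imply a rough per-page token budget, which can help when judging whether a document is likely to fit in a single pass. The sketch below only does that arithmetic; the helper name `fits_in_context` is hypothetical, and real token counts vary with language, layout and tokenizer, so treat the result as a ballpark estimate.

```python
# Figures taken from the article: a 128,000-token window,
# described as roughly a 200-page document.
CONTEXT_WINDOW_TOKENS = 128_000
PAGES_EQUIVALENT = 200

# Implied average budget per page (an estimate, not a measured value).
TOKENS_PER_PAGE = CONTEXT_WINDOW_TOKENS / PAGES_EQUIVALENT


def fits_in_context(page_count, tokens_per_page=TOKENS_PER_PAGE):
    """Hypothetical helper: estimate whether a document of page_count pages fits."""
    return page_count * tokens_per_page <= CONTEXT_WINDOW_TOKENS


print(TOKENS_PER_PAGE)        # implied tokens per page
print(fits_in_context(150))   # a 150-page manual
print(fits_in_context(320))   # a 320-page report
```

A document that fails this check would need to be split into smaller chunks before embedding.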

Once documents or other data are transformed into embeddings for RAG and other contextual retrieval use cases, AI agents can then find and reference that data to respond to user prompts.
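To make the retrieval step concrete, here is a minimal sketch of how a system might rank stored documents against a query once both are embeddings. It is not Cohere's implementation: the three-dimensional vectors and document names are invented for illustration (real embeddings have hundreds or thousands of dimensions and would come from the model), and cosine similarity is assumed as the comparison metric because it is a common choice for this task.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors, made up for this example; a real system would obtain
# them by sending each document to an embedding model.
DOCS = {
    "annual_report": [0.9, 0.1, 0.0],
    "product_manual": [0.1, 0.9, 0.1],
    "legal_contract": [0.0, 0.2, 0.9],
}

# Pretend embedding of a finance-related question.
QUERY = [0.8, 0.2, 0.1]

ranking = rank_documents(QUERY, DOCS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a RAG pipeline, the top-ranked documents would then be passed to a language model as context for answering the user's prompt.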

“Embedding models help transform complex data — text, images, audio and video — into numerical representations that computers can understand. The embeddings capture the semantic meaning of the data, making them useful for tasks like search, recommendation systems, and natural language processing,” Computerworld explains.

Embed 4 is designed to handle complex, “noisy” data, such as scanned documents or handwritten notes with spelling mistakes, without heavy preprocessing. Such formats “are common in legal paperwork, insurance invoices, and expense receipts,” Cohere says in a blog post, adding that Embed 4’s performance “eliminates the need for complex data preparations or pre-processing pipelines, saving businesses time and operational costs.”

According to Cohere, “Embed 4 excels in regulated industries such as finance, healthcare, and manufacturing,” with strong general business knowledge that is “optimized with domain-specific understanding of these industries so that it can identify relevant insights within common documents.”

VentureBeat writes that Cohere believes Embed 4 will be an ideal search engine for “agents and AI assistants across an enterprise” that can also “cut high storage costs.”

In addition to running on Cohere’s cloud or on enterprise platforms such as Microsoft Azure AI Foundry and Amazon SageMaker, Embed 4 can be deployed in any virtual private cloud or on-premises tech stack, according to the startup.
