CES: Nvidia’s Cosmos Models Teach AI About Physical World

Nvidia Cosmos, a platform of generative world foundation models (WFMs) and related tools to advance the development of physical AI systems like autonomous vehicles and robots, was introduced at CES 2025. Cosmos WFMs are designed to provide developers a way to generate massive amounts of photo-real, physics-based synthetic data to train and evaluate their existing models. The goal is to reduce costs by streamlining real-world testing with a ready data pipeline. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos integrates Nvidia Omniverse, a physics simulation tool used for entertainment world-building. Continue reading CES: Nvidia’s Cosmos Models Teach AI About Physical World

Twelve Labs Creating AI That Can Search and Analyze Video

Twelve Labs has raised $30 million in funding for its efforts to train video-analyzing models. The San Francisco-based company has received strategic investments from notable enterprise infrastructure providers Databricks and SK Telecom as well as Snowflake Ventures and HubSpot Ventures. Twelve Labs targets customers using video across a variety of fields including media and entertainment, professional sports leagues, content creators and business users. The funding coincides with the release of Twelve Labs’ new video foundation model, Marengo 2.7, which applies a multi-vector approach to video understanding. Continue reading Twelve Labs Creating AI That Can Search and Analyze Video

Grok-2 Chatbot Is Now Available Free to All Users of X Social

Elon Musk’s xAI has been rolling out an updated Grok-2 model that is now available free to all users of the X social platform. Prior to last week, the “unfiltered” chatbot — which debuted in November 2023 — was available only by paid subscription. Now Grok is coming to X’s masses, but those on the free tier can only ask the chatbot 10 questions in two hours, while Premium and Premium+ users will “get higher usage limits and will be the first to access any new capabilities.” There is also now a Grok button featured on X that aims to encourage exploration. Continue reading Grok-2 Chatbot Is Now Available Free to All Users of X Social

Amazon Dives into Generative AI with Nova Foundation Models

After years of focusing on AI infrastructure, Amazon is plunging into the frontier model business with the Nova series. The new family of generative AI models includes the text-to-text model Amazon Nova Micro and Amazon Nova Lite for fast, mobile-friendly apps, and at the upper echelon the multimodal Amazon Nova Pro and Amazon Nova Premier for processing text, images and video. Amazon, which is heavy into production via Amazon Studios and MGM, is also launched two specialty models focused on “studio quality” output — Amazon Nova Canvas for images and Amazon Nova Reel for video. Continue reading Amazon Dives into Generative AI with Nova Foundation Models

Luma AI Upgrades Its Video Generator and Adds Image Model

Anticipating what one outlet calls “the likely imminent release of OpenAI’s Sora,” generative AI video competitors are compelled to step up their game. Luma AI has released a major upgrade to its Dream Machine, speeding its already quick video generation and enabling a chat function for natural language prompts, so you can talk to it as with OpenAI’s ChatGPT. In addition to the new interface, Dream Machine is going mobile and adding a new foundation image model, Luma AI Photon, which “has been purpose built to advance the power and capabilities of Dream Machine,” according to the company. Continue reading Luma AI Upgrades Its Video Generator and Adds Image Model

Anthropic Protocol Intends to Standardize AI Data Integration

Anthropic is releasing what it hopes will be a new standard in data integration for AI. Called the Model Context Protocol (MCP), its goal is to eliminate the need to customize each integration by having code written each time a company’s data is connected to a model. The open-source MCP tool could become a universal way to link data sources to AI. The aim is to have models querying databases directly. MCP is “a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments,” according to Anthropic. Continue reading Anthropic Protocol Intends to Standardize AI Data Integration

Particle Launches AI News App That Summarizes in Quick Hits

Particle, the AI-powered news aggregator created by a pair of Twitter alums, has launched after a year in beta. The iOS app summarizes current events in quick hits the startup says do not violate the copyrights of publishers whose news it shares. Instead of simply scraping publishers’ work for proprietary use, the startup seeks to compensate publishers and drive traffic to news sites with prominent links to sources accompanying each AI news summary. Developed by Sara Beykpour and Marcel Molina, Particle has raised more than $11 million in early funding led by Lightspeed. Continue reading Particle Launches AI News App That Summarizes in Quick Hits

Baidu’s Ernie AI Gets Improved Text-to-Image and App Builder

Ernie, the foundation model for Baidu’s generative AI, has been updated with iRAG technology to mitigate visual hallucinations and a no-code tool called Miaoda that creates apps using natural language. The company behind China’s largest search engine says Ernie now handles 1.5 billion daily user queries, up from 50 million circa its March 2023 launch (a 30x increase). Baidu also debuted Ernie-powered smart glasses from its Xiaodu Technology hardware unit. The Xiaodu AI Glasses features built-in voice activation and cameras for taking photos and video. The news was shared at this week’s Baidu World 2024 in Shanghai. Continue reading Baidu’s Ernie AI Gets Improved Text-to-Image and App Builder

Anthropic’s AI Agents for Claude Sonnet Increase Productivity

In its first week of public beta, Anthropic’s “Computer Use” feature is gaining immediate traction, helping people do research and complete coding tasks. Claude works autonomously in Computer Use mode, suggesting broad implications for future productivity and workforce goals. Coming on the heels of OpenAI’s Swarm framework, these early forays into independent AI assistants seem to indicate that implementing such systems will be an area of focus for businesses in 2025. Claude can “see” what’s onscreen and use its “judgment” to adapt to different tasks, segueing across workflows and software. Continue reading Anthropic’s AI Agents for Claude Sonnet Increase Productivity

OpenAI Showcases Latest Updates for Voice, Picture and More

OpenAI unveiled major updates at its DevDay conference with the focus largely on making AI more accessible, efficient and affordable. Included were four innovations: Vision Fine-Tuning in the API, Model Distillation, Prompt Caching and the public beta of Realtime API. The approach underscores OpenAI’s effort to empower its developer ecosystem even as it continues to compete for end-users in the enterprise space. The Realtime API gives developers the option of building “nearly real-time” speech-to-speech app experiences, selecting from among six OpenAI voices. Vision Fine-Tuning for GPT-4o enables customization of the model’s visual understanding of images and text. Continue reading OpenAI Showcases Latest Updates for Voice, Picture and More

Dolby to Expand Its Cloud-Based Live Streaming with THEO

Dolby Labs has introduced cloud-based solutions to support clients with real-time, interactive streaming capabilities. The announcement, made from IBC 2024 in Amsterdam, follows Dolby’s July acquisition of streaming tools provider THEO Technologies, which services top sports, media and entertainment companies worldwide. Dolby and THEO promise streaming that is “more interactive, personalized, and delivered with extremely low latency.” Dolby will also offer a new capability, THEOads, providing an advertising environment “that is optimized for the dynamic nature of live content.” Continue reading Dolby to Expand Its Cloud-Based Live Streaming with THEO

OpenAI Previews New LLMs Capable of Complex Reasoning

OpenAI is previewing a new series of AI models that can reason and correct complex coding mistakes, providing a more efficient solution for developers. Powered by OpenAI o1, the new models are “designed to spend more time thinking before they respond, much like a person would,” and as a result can “solve harder problems than previous models in science, coding, and math,” OpenAI claims, noting that “through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.” The first model in the series is being released in preview in OpenAI’s popular ChatGPT and in the company’s API. Continue reading OpenAI Previews New LLMs Capable of Complex Reasoning

Anthropic Publishes Claude Prompts, Sharing How AI ‘Thinks’

In a move toward increased transparency, San Francisco-based AI startup Anthropic has published the system prompts for three of its most recent large language models: Claude 3 Opus, Claude 3.5 Sonnet and Claude 3 Haiku. The information is now available on the web and in the Claude iOS and Android apps. The prompts are instruction sets that reveal what the models can and cannot do. Anthropic says it will regularly update the information, emphasizing that evolving system prompts do not affect the API. Examples of Claude’s prompts include “Claude cannot open URLs, links, or videos” and, when dealing with images, “avoid identifying or naming any humans.” Continue reading Anthropic Publishes Claude Prompts, Sharing How AI ‘Thinks’

OpenAI Pushes GPT-4o Customization with Free Token Offer

OpenAI announced its newest model, GPT-4o, can now be customized. The company said that the ability to fine-tune the multimodal GPT-4o has been “one of the most requested features from developers.” Customization can move the model toward more specific structure and tone of responses or allow it to follow specific instruction sets geared toward individual use cases. Developers can now implement custom datasets, aiming for better performance at a lower cost. The ChatGPT maker is rolling out the welcome mat by offering 1 million training tokens per day “for free for every organization” through September 23. Continue reading OpenAI Pushes GPT-4o Customization with Free Token Offer

D-ID Employs AI to Translate Videos into Multiple Languages

D-ID, a platform that uses AI to generate digital humans, has announced D-ID Video Translate in general availability. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” D-ID is making the Video Translate tool, which accommodates 30 different languages, free to D-ID subscribers for a limited time, available through the D-ID Studio or the company’s API. Languages include Arabic, Mandarin, Japanese, Hindi and Ukrainian, in addition to Spanish, German, French and Italian. Users can simultaneously translate content using bulk translation. Continue reading D-ID Employs AI to Translate Videos into Multiple Languages