CES: Nvidia’s Cosmos Models Teach AI About Physical World

Nvidia introduced Cosmos, a platform of generative world foundation models (WFMs) and related tools to advance the development of physical AI systems like autonomous vehicles and robots, at CES 2025. Cosmos WFMs are designed to give developers a way to generate massive amounts of photorealistic, physics-based synthetic data to train and evaluate their existing models. The goal is to reduce costs by streamlining real-world testing with a ready data pipeline. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos integrates with Nvidia Omniverse, a physics simulation tool used for entertainment world-building.

While purpose-built to generate a physics-aware framework (including instructional videos) for training robotic and AV intelligence, Nvidia Cosmos models could in theory also be used to train synthetic characters for entertainment use.

“Cosmos taps generative AI to fill the biggest gap that’s keeping robots from becoming more useful: training data,” writes ZDNet, which awarded Cosmos two Best of CES 2025 honors.

“The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own,” Nvidia founder and CEO Jensen Huang said during his CES 2025 keynote and in a news announcement. “We created Cosmos to democratize physical AI and put general robotics in reach of every developer.”

“Nvidia Cosmos’ suite of open models means developers can customize the WFMs with datasets, such as video recordings of AV trips or robots navigating a warehouse, according to the needs of their target application,” VentureBeat reports, quoting Huang as saying Cosmos was “trained on 20 million hours of video. It’s about teaching the AI to understand the physical world.”

Also at CES, Nvidia introduced the Llama Nemotron family of open LLMs, built using Meta Platforms’ Llama AI. “With new Nvidia Cosmos Nemotron vision language models (VLMs) and Nvidia NIM microservices for video search and summarization, developers can build agents that analyze and respond to images and video from autonomous machines, hospitals, stores and warehouses, as well as sports events, movies and news,” Nvidia explained in a separate announcement.

Nvidia is making Cosmos models available under an open model license to accelerate the work of the robotics and autonomous vehicle communities. Developers can preview the first models on the Nvidia API catalog, or download the family of models and the fine-tuning framework from the Nvidia GPU Cloud (NGC) catalog or Hugging Face.
