Stable Video 4D Adds Time Dimension to Generative Imagery

Stability AI has unveiled an experimental new model, Stable Video 4D, which generates photorealistic 3D video. Building on Stable Video Diffusion, released in November 2023, the latest model can take moving image data of an object and render it from multiple angles, generating up to eight different perspectives. Stable Video 4D can generate five frames across eight views in about 40 seconds in a single inference pass, according to the company, which says the model has “future applications in game development, video editing, and virtual reality.” Users begin by uploading a single video and specifying the desired 3D camera poses.

“Stable Video 4D then generates eight novel-view videos following the specified camera views, providing a comprehensive, multi-angle perspective of the subject,” Stability explains in an announcement. “The generated videos can then be used to efficiently optimize a dynamic 3D representation of the subject in the video,” a process that takes 20 to 25 minutes.
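As a rough sketch of that workflow, the pipeline Stability describes could look like the following. The function names, stub bodies, and CameraPose fields are hypothetical stand-ins for illustration only, not Stability AI’s actual API or released code:

```python
# A minimal sketch of the described workflow, assuming hypothetical helper
# functions: none of these names come from Stability AI's released code.
from dataclasses import dataclass


@dataclass
class CameraPose:
    azimuth_deg: float    # orbit angle around the subject (assumed parameterization)
    elevation_deg: float  # camera height angle


def generate_novel_views(video_path: str, poses: list[CameraPose]) -> list[str]:
    """Stub for the Stable Video 4D inference step, which the company says
    yields five frames across eight views in roughly 40 seconds."""
    return [f"view_{i:02d}.mp4" for i in range(len(poses))]


def optimize_dynamic_3d(view_videos: list[str]) -> str:
    """Stub for the follow-on optimization of a dynamic 3D representation,
    a process Stability says takes 20 to 25 minutes."""
    return "subject_4d_asset.ply"


# Eight evenly spaced orbital poses are one plausible choice of "specified
# camera views" for a comprehensive multi-angle result.
poses = [CameraPose(azimuth_deg=i * 45.0, elevation_deg=10.0) for i in range(8)]
novel_views = generate_novel_views("input_subject.mp4", poses)
asset = optimize_dynamic_3d(novel_views)
print(f"{len(novel_views)} novel-view videos -> {asset}")
```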

Available on Hugging Face, Stable Video 4D differs from well-known generative video offerings such as OpenAI’s Sora and those from Luma, Midjourney and Runway in that Stability designed it for “use cases where there is a need to view dynamically moving 3D objects from arbitrary camera angles,” including movie production, gaming and AR/VR, Stability AI 3D team lead Varun Jampani told VentureBeat.

Stability AI is perhaps best known for its first product, the image generator Stable Diffusion, launched in August 2022. It later branched into generative video.

In March, Stable Video 3D was announced, enabling users to generate short 3D videos from an image or text prompt. Stable Video 4D goes “a significant step further,” VentureBeat says, explaining that “the four dimensions include width (x), height (y), depth (z) and time (t),” meaning Stable Video 4D “is able to view a moving 3D object from various camera angles as well as at different timestamps.”
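To make the (x, y, z, t) framing concrete, the raw output can be pictured as an image grid indexed by camera view and timestamp. In the sketch below, the eight views and five frames come from the announcement, but the resolution and array layout are assumptions made for illustration:

```python
# Illustrative data layout only: the 8-view x 5-frame figures come from the
# announcement, but the resolution and array structure are assumptions.
import numpy as np

NUM_VIEWS, NUM_FRAMES = 8, 5   # eight camera angles, five timestamps
H, W, C = 576, 576, 3          # per-frame resolution (assumed)

# grid[v, t] holds the image of the subject seen from camera pose v at time t.
grid = np.zeros((NUM_VIEWS, NUM_FRAMES, H, W, C), dtype=np.uint8)


def frame_at(view: int, t: int) -> np.ndarray:
    """Moving along `view` changes the camera angle (spatial coverage);
    moving along `t` changes the timestamp -- the fourth dimension."""
    return grid[view, t]
```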

With Stable Video 4D, professionals “can significantly benefit from the ability to visualize objects from multiple perspectives, enhancing the realism and immersion of their products,” ZDNet writes, quoting the company, which issued a technical report on the development of the new model.
