Google Takes New Approach to Create Video with Lumiere AI

Google has come up with a new approach to high-resolution AI video generation with Lumiere. Most GenAI video models output individual high-resolution frames at various points in the sequence (called “distant keyframes”), fill in the missing frames with low-res images to create motion (known as “temporal super-resolution,” or TSR), then up-res that connective tissue in non-overlapping windows (“spatial super-resolution,” or SSR). Lumiere instead uses what Google calls a “Space-Time U-Net architecture,” which processes all frames at once, “without a cascade of TSR models, allowing us to learn globally coherent motion.”
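The idea behind that phrase can be sketched in a few lines: rather than running a 2D network frame by frame, a space-time block convolves across frames, height and width together, shrinking the clip along all three axes at once. Below is a minimal PyTorch sketch; the layer, channel counts and shapes are illustrative stand-ins, not Google’s actual STUNet.

```python
import torch
import torch.nn as nn

class SpaceTimeDownsample(nn.Module):
    """Toy space-time block: reduces a video in both space
    (height, width) and time (frames) in a single step, rather
    than treating each frame independently."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # stride (2, 2, 2) halves frames, height and width at once
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3,
                              stride=(2, 2, 2), padding=1)
        self.act = nn.SiLU()

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video shape: (batch, channels, frames, height, width)
        return self.act(self.conv(video))

# 80 frames of 128x128 RGB video -> 40 frames of 64x64 features
clip = torch.randn(1, 3, 80, 128, 128)
feats = SpaceTimeDownsample(3, 32)(clip)
print(feats.shape)  # torch.Size([1, 32, 40, 64, 64])
```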

To achieve its high-resolution result, Lumiere applies an SSR model on overlapping windows, using MultiDiffusion “to combine the predictions into a coherent result,” Google researchers explain in a scientific paper.
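That combination step can be pictured as weighted averaging: each overlapping window predicts its own pixels, and wherever windows overlap the predictions are averaged into a seamless result. The one-dimensional sketch below is a hypothetical illustration of that averaging, not the paper’s implementation, which blends video segments inside the diffusion sampling loop.

```python
import numpy as np

def blend_overlapping_windows(windows, starts, length):
    """MultiDiffusion-style blending, reduced to 1-D signals:
    accumulate each window's prediction, then divide by how many
    windows covered each position, so overlaps average out."""
    out = np.zeros(length)
    weight = np.zeros(length)
    for win, start in zip(windows, starts):
        out[start:start + len(win)] += win
        weight[start:start + len(win)] += 1.0
    return out / np.maximum(weight, 1.0)  # avoid divide-by-zero

# two 6-sample windows overlapping on positions 4-5
a = np.ones(6)        # first window predicts 1.0 everywhere
b = np.full(6, 3.0)   # second window predicts 3.0 everywhere
merged = blend_overlapping_windows([a, b], starts=[0, 4], length=10)
print(merged)  # overlap averages to 2.0: [1 1 1 1 2 2 3 3 3 3]
```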

“By handling both the placement of objects and their movement simultaneously,” Google claims, Lumiere “can create consistent, smooth and realistic movement across a full video clip,” reports Tom’s Guide.

For now, Lumiere’s Space-Time U-Net architecture, abbreviated as STUNet, can generate 80 frames at 16fps (five seconds of video), putting it on par with its competitors. But the paper describes “a new inflation scheme which includes learning to downsample the video in both space and time,” which Google says can pave the way to longer (suggesting even “full-length”) clips.

With GenAI video a brass ring as far as the entertainment industry is concerned, it’s no small matter that Lumiere, created by Google in conjunction with the Weizmann Institute of Science and Tel Aviv University, is being hailed as a breakthrough.

“An AI video generator that looks to be one of the most advanced text-to-video models yet,” says PetaPixel. “Pretty amazing,” assesses ZDNet. “Revolutionary,” opines TechTimes. “Can render cute animals in implausible situations,” writes Ars Technica, all but guaranteeing social media must-have status as soon as it is opened to the public.

For now, although the scientific paper compares Lumiere to competing generative video technologies including Stable Diffusion, Imagen, Runway and Pika, the model is not available for independent testing, and there is no word on Google’s timeline for potential deployment.

At its core, Lumiere “is a video diffusion model that provides users with the ability to generate realistic and stylized videos” along with “options to edit them on command,” writes VentureBeat. Google says the model supports both text-to-video and image-to-video generation.

“The model also supports additional features such as inpainting, which inserts specific objects to edit videos with text prompts; Cinemagraph to add motion to specific parts of a scene; and stylized generation to take reference style from one image and generate videos using that,” VentureBeat adds.
