Veo AI Video Generator and Imagen 3 Unveiled at Google I/O

By Paula Parisi
May 16, 2024

Google is launching two new AI models: the video generator Veo and Imagen 3, billed as the company’s “highest quality text-to-image model yet.” The products were introduced at Google I/O this week, where new demo recordings created using the Music AI Sandbox were also showcased. The 1080p Veo videos can be generated in “a wide range of cinematic and visual styles” and run “over a minute” in length, Google says. Veo is available in private preview in VideoFX to users who join a waitlist. At a future date, the company plans to bring some Veo capabilities to YouTube Shorts and other products.

“Veo builds upon years of our generative video model work, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere — combining architecture, scaling laws and other novel techniques to improve quality and output resolution,” Google explained in a blog post that features Veo samples, among them clips from filmmaker Donald Glover and his creative studio, Gilga.

Imagen 3 promises a better understanding of natural language and “the intent behind prompts” and incorporates small details from longer prompts, Google says, noting “an incredible level” of photorealistic detail and “far fewer distracting visual artifacts than our prior models.”

Veo’s “advanced understanding of natural language” allows the model to comprehend cinematic terms like “timelapse” and “aerial shots.”

“Users can direct their desired output using text, image, or video-based prompts, and Google says the resulting videos are ‘more consistent and coherent,’ depicting more realistic movement for people, animals, and objects throughout shots,” The Verge writes.

DeepMind CEO Demis Hassabis said in a press preview on Monday that “Google is exploring additional features to enable Veo to produce storyboards and longer scenes.”

Veo and Imagen 3 are described by Engadget as “a way for Google to keep up the fight against OpenAI’s Sora video model and DALL-E 3, a tool that has practically become synonymous with AI-generated images.” The Verge says Google’s Lumiere video generation model, showcased in January, “was one of the most impressive models we’d seen before Sora was announced in February.”

“Meanwhile, OpenAI is already pitching Sora to Hollywood and planning to release it to the public later this year,” The Verge reports, adding that the company is exploring incorporating audio into Sora “and may make the model available directly within video editing applications like Adobe’s Premiere Pro.”

“Google is also working with recording artists like Wyclef Jean and Bjorn to test out its Music AI Sandbox, a set of tools that can help with song and beat creation,” according to Engadget, suggesting the work resulted in “a few intriguing demos.”

Related:
Everything Google Announced at I/O 2024, Wired, 5/14/24
Google Takes the Next Step in Its AI Evolution, The New York Times, 5/14/24
Google Experiments with Using Video to Search, Thanks to Gemini AI, TechCrunch, 5/14/24
Google’s Generative AI Can Now Analyze Hours of Video, TechCrunch, 5/14/24
Google Expands Digital Watermarks to AI-Made Video and Text, Engadget, 5/14/24
Introducing VideoFX, Plus New Features for ImageFX and MusicFX, Google Blog, 5/14/24
Project IDX, Google’s Next-Gen IDE, Is Now in Open Beta, TechCrunch, 5/14/24
Google’s New Private Space Feature Is Like Incognito Mode for Android, TechCrunch, 5/15/24
Google Will Use Gemini to Detect Scams During Calls, TechCrunch, 5/14/24
Google’s Call-Scanning AI Could Dial Up Censorship by Default, TechCrunch, 5/15/24
