Google Shows Off Impressive Range of AI at NY Media Event
November 4, 2022
Google Research is touting new advances in artificial intelligence, which can now generate its own code and write fiction, along with improvements in text-to-video generation and language translation. At a New York media event at Google’s Pier 57 office (which opened earlier this year as the company’s third Manhattan outpost), roughly a dozen projects in various stages of development were on display, with robot learning, LaMDA (Language Model for Dialogue Applications) and text-generated 3D images sharing the spotlight with practical AI for things like disaster management, weather forecasts and healthcare.
“We see so much opportunity ahead and are committed to making sure the technology is built in service of helping people,” Google CEO Sundar Pichai told attendees via video, reports Axios, which details AI categories ranging from wildfire tracking and flood forecasting to AI ultrasounds and vision care, as well as creative applications.
Google announced the “1,000 Languages Initiative” to foster universal communication, “a many years undertaking — some may even call it a moonshot — but we are already making meaningful strides,” Google Research senior fellow and SVP Jeff Dean says in a blog post. “Our most advanced language models are multimodal, meaning they’re capable of unlocking information across these many different formats,” including images, videos and speech, Dean writes.
“Google shared the first rendering of a video that shares both of the company’s complementary text-to-video research approaches — Imagen Video and Phenaki,” reports VentureBeat, explaining “the result combines Phenaki’s ability to generate video with a sequence of text prompts with Imagen’s high-resolution detail.”
In what he describes as the premiere demo of “AI-generated super-resolution video,” Dean shares in his blog post a diffusion model that translates text prompts into “long, coherent videos.”
Google will soon bring text-to-image generation to Season 2 of the “AI Test Kitchen” app, where people will be able to play with it, building themed cities with “City Dreamer” and designing “friendly monster characters that can move, dance, and jump with ‘Wobble,’” Dean says.
In addition to 2D images, Google says text-to-3D is now a reality with DreamFusion, which produces a three-dimensional model that can be viewed from any angle and can be composited into any 3D environment (similar to Nvidia’s GET3D, which starts with 2D images, then allows detailing with text prompts using StyleGAN-NADA).
A big attention-getter was a Code as Policies demo letting robots “think” for themselves, using their machine learning to continually generate new code. “In a demonstration, Google’s Andy Zeng told a robot hovering over three plastic bowls (red, blue and green) and three pieces of candy (Skittles, M&M’s and Reese’s) that I liked M&M’s and that my bowl was blue. The robot placed the correct candy in the right bowl, even though it wasn’t directly told to ‘place M&M’s in the blue bowl,’” Axios writes.
At the live event, Google Research’s Douglas Eck demoed the Wordcraft Writers Workshop, which powers storytelling using the LaMDA narrative engine. Eck said Google is preparing a research paper on this application, which he described as more of a creativity booster than full-on story builder.
“It’s more useful to use LaMDA to add spice,” Eck said, describing it as a “text editor with a purpose,” according to VentureBeat. He also showcased AudioLM, which can extend musical snippets, potentially into a score, VentureBeat reports.
“AI is the most profound technology we are working on, yet these are still early days,” Pichai said, according to Axios, which concludes that “despite recent financial headwinds, AI is steamrolling forward — with companies such as Google positioned to serve as moral arbiters and standard-setters.”
Related:
From Generative Images and Video to Responsible AI: Here’s What Google Covered During Its AI Event, Gizmodo, 11/2/22
Google Plans Giant AI Language Model Supporting World’s 1,000 Most Spoken Languages, The Verge, 11/2/22
Google’s Text-to-Image AI Model Imagen Is Getting Its First (Very Limited) Public Outing, The Verge, 11/2/22
Google Is Integrating Lens Directly into Its Search Box, TechCrunch, 11/2/22