Google has released a research paper on a new text-to-image generator called Imagen, which combines the power of large transformer language models for text with the capabilities of diffusion models in high-fidelity image generation. “Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis,” the company said. Simultaneously, Google is introducing DrawBench, a benchmark for text-to-image models it says was used to compare Imagen with other recent technologies including VQGAN+CLIP, latent diffusion models, and OpenAI’s DALL-E 2. Continue reading Google’s Imagen AI Model Makes Advances in Text-to-Image