Stability AI Intros Real-Time Text-to-Image Generation Model

By Paul Bennun
December 4, 2023

Stability AI, developer of Stable Diffusion (one of the leading visual content generators, alongside Midjourney and DALL-E), has introduced SDXL Turbo, a new AI model that reveals more of the latent possibilities of the common diffusion approach to image generation: images that update in real time as the user's prompt changes. This was always possible in principle with earlier diffusion models, since a prompt is typed out over time and the image could simply be regenerated at each step. However, more efficient generation algorithms and the steady accretion of GPUs and TPUs in developers' data centers make the experience far more magical.

Write “A cat…” and see a cat. Add “… in a top hat,” and see the hat materialize. Finish the sentence “A catastrophic plane crash” instead and see the image change to that, too.

With SDXL Turbo, four sampling steps are enough to produce images that human evaluators judged a better match for the prompt than images the OpenMUSE model needed 26 steps to produce.

About 70 percent of human evaluators rated a 1-step image from SDXL Turbo as subjectively better than a 16-step image from OpenMUSE, and Stability AI's data shows similar advantages over the company's own prior models. The clear advantage is saved time and energy: Stability AI claims the model can produce a 512×512 image in just 207 ms on an A100 GPU.
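To put the quoted latency in perspective, here is a back-of-envelope conversion to throughput. It assumes images are generated one after another on a single A100 with no batching and no overhead beyond the 207 ms figure Stability AI quotes:

```latex
\frac{1\ \text{image}}{0.207\ \text{s}} \approx 4.8\ \text{images per second}
```

That is roughly five refreshes per second, comfortably faster than a typical typist adds words to a prompt, which is what makes per-word updates plausible.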

According to Stability AI CEO Emad Mostaque, the approach carries a penalty, but one with a significant upside: “Less diversity, but way faster & more variants to come which will be… interesting, particularly with upscales & more,” he posted on X. Hardcore prompt warriors may not mind this drop in diversity, given that the approach clearly enables a category of applications and workflows that was previously impossible.

The new model is based on a distillation technique that Stability AI calls Adversarial Diffusion Distillation, or ADD. The company notes that it shares similarities with GANs (Generative Adversarial Networks), which generate images in far fewer steps but typically encode less semantic information and, unlike today's generators, are not usually prompted in natural language.

Stability AI has published the model weights and code on Hugging Face for non-commercial use, and links to a beta tool on Clipdrop so interested users can try the model themselves.
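For readers who want to try the published weights locally, the loop below is a minimal sketch of the real-time idea: it regenerates the image each time the prompt grows by a word. It assumes the Hugging Face `diffusers` library, PyTorch, and a CUDA GPU. The `stabilityai/sdxl-turbo` repository id and the single-step, zero-guidance settings follow our reading of the model card, and the helper names `prompt_prefixes`, `load_turbo_pipeline`, and `run_demo` are hypothetical.

```python
# Minimal sketch: regenerate an SDXL Turbo image each time the prompt grows.
# Assumes the Hugging Face `diffusers` library, PyTorch, and a CUDA GPU.
# `prompt_prefixes`, `load_turbo_pipeline`, and `run_demo` are hypothetical helper names.


def prompt_prefixes(text: str) -> list[str]:
    """Simulate a user typing: return the prompt as it grows, word by word."""
    words = text.split()
    return [" ".join(words[:i]) for i in range(1, len(words) + 1)]


def load_turbo_pipeline():
    """Load the published SDXL Turbo weights (non-commercial license) onto the GPU."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    return pipe.to("cuda")


def run_demo(full_prompt: str = "A cat in a top hat") -> None:
    """Render one image per prompt prefix, as a real-time UI might on each edit."""
    pipe = load_turbo_pipeline()  # load once, reuse for every prefix
    for i, prompt in enumerate(prompt_prefixes(full_prompt)):
        # One denoising step; guidance_scale=0.0 disables classifier-free guidance,
        # which the model card says SDXL Turbo was trained without.
        image = pipe(
            prompt=prompt,
            num_inference_steps=1,
            guidance_scale=0.0,
            height=512,
            width=512,
        ).images[0]
        image.save(f"frame_{i:02d}.png")
```

On a machine with a CUDA GPU and the packages installed, `run_demo()` writes one PNG per prefix (`frame_00.png`, `frame_01.png`, and so on). An interactive tool would instead redraw a single canvas and drop stale requests while the user is still typing.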

