Stability AI Releases Stable Diffusion Text-to-Image Generator

Stability AI is in the first stage of releasing Stable Diffusion, a text-to-image generator similar in functionality to OpenAI’s DALL-E 2, with one important distinction: the open-source newcomer lacks the filters that prevent the earlier system from creating images of public figures or content deemed excessively toxic. Last week the Stable Diffusion code was made available to just over a thousand researchers, and the Los Altos-based startup anticipates a public release in the coming weeks. The unfettered release of such a powerful imaging system has stirred controversy in the AI community and raised ethical questions.

“Even if the results aren’t perfectly convincing yet, making fake images of public figures opens a large can of worms,” TechCrunch writes. Stable Diffusion is designed to run on consumer GPUs with under 10GB of VRAM, generating 512×512-pixel images in seconds from any text prompt.
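For context, a minimal sketch of what such a prompt-to-image run might look like with the publicly released weights follows, using Hugging Face’s diffusers library; the library, the CompVis/stable-diffusion-v1-4 checkpoint name, and the sample prompt are illustrative assumptions and are not mentioned in the article.

import torch
from diffusers import StableDiffusionPipeline

# Load the checkpoint in half precision so it fits within roughly 10GB of VRAM
# on a consumer GPU (checkpoint name assumed for illustration).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate a 512x512 image from a plain-text prompt; this takes on the order
# of seconds on a recent consumer GPU.
prompt = "a mountain landscape at sunrise, photorealistic"
image = pipe(prompt, height=512, width=512, num_inference_steps=50).images[0]
image.save("landscape.png")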

“Stable Diffusion will allow both researchers and soon the public to run this under a range of conditions, democratizing image generation,” said Stability AI founder and CEO Emad Mostaque in a blog post.

“We are delighted to release the first in a series of benchmark open source Stable Diffusion models that will enable billions to be more creative, happy and communicative,” Mostaque added, noting that the program’s creators “look forward to the positive effect of this and similar models on society and science in the coming years.”

While the company has its supporters, TechCrunch warns that “making the raw components of the system freely available leaves the door open to bad actors who could train them on subjectively inappropriate content.”

Stable Diffusion is generally considered the brainchild of Mostaque, who holds a master’s degree in mathematics from Oxford University and worked at hedge funds before segueing to AI. Mostaque says it “builds upon the work” pioneered by DALL-E 2 and Google Brain’s Imagen, as well as work from “the team at CompVis and Runway in their widely used latent diffusion model.” He also credits Stability AI’s lead generative AI developer Katherine Crowson.

While Stable Diffusion draws on learnings from DALL-E 2, it can do something its predecessor cannot: create convincing landscape images. Landscape photographer Aurel Manea tells PetaPixel he is blown away by Stable Diffusion’s environmental images. DALL-E 2 does “great images of people’s faces that look like photos,” he says, but fails at landscape photography.

VentureBeat writes that the burgeoning AI text-to-image sector is raising legal as well as moral questions, chief among them: who owns images generated by products like Stable Diffusion or DALL-E 2? The person delivering the prompts, or the owner of the AI model that generated them?
