Google’s Prompt to Prompt AI for Stable Diffusion Tutorial
Stable Diffusion is a text-to-image generative machine-learning model. It pairs a pretrained text encoder with a sampling process controlled by a seed and a classifier-free guidance (CFG) scale, which can even be set to a negative value. This flexibility lets it generate seamless textures as well as symmetrical and abstract imagery.
Stable Diffusion is based on a class of generative machine-learning models called Latent Diffusion
The diffusion model operates on latent variables that have the same dimensionality as the original input x0. During training, a forward process gradually adds noise to the data until it is destroyed, and the model learns to reverse this corruption. To generate an image, it starts from pure random noise and passes it through the learned denoising process, step by step, until a clean sample emerges.
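As a rough illustration, here is a minimal sketch of the forward noising step in PyTorch, using the standard closed form x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise. The schedule values below are generic DDPM-style assumptions for demonstration, not the exact ones Stable Diffusion ships with:

```python
import torch

def add_noise(x0: torch.Tensor, t: int, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Sample x_t from q(x_t | x0); the result keeps x0's dimensionality."""
    noise = torch.randn_like(x0)
    sqrt_a_bar = alphas_cumprod[t].sqrt()
    sqrt_one_minus_a_bar = (1.0 - alphas_cumprod[t]).sqrt()
    return sqrt_a_bar * x0 + sqrt_one_minus_a_bar * noise

# Illustrative linear noise schedule (assumed values, for demonstration only).
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

x0 = torch.randn(1, 4, 64, 64)   # stand-in for a clean latent input
x_t = add_noise(x0, t=500, alphas_cumprod=alphas_cumprod)  # same shape as x0
```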
Stable Diffusion uses hyperscale image-text datasets to inform its training process. These datasets include hundreds of thousands of celebrity images, along with more general material used to populate environments and render full bodies, which is what makes deepfake-style output possible. The model can reproduce celebrities who are underrepresented in the media, and users can add prompt adjuncts, such as age-related modifiers, to produce younger-looking versions of a subject.
It uses CLIP’s pretrained text encoder
In this tutorial, you will learn how Stable Diffusion uses CLIP’s pretrained text encoder. The model takes two inputs: a latent seed and a text prompt. The seed is used to generate a random latent image representation, while CLIP’s text encoder transforms the prompt into text embeddings. The model then runs a denoising loop, 50 steps by default, progressively refining the latent image representation to match the prompt. Finally, the variational autoencoder (VAE) decodes the latent representation into the output image.
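Here is a minimal sketch of that end-to-end flow using Hugging Face’s diffusers library. The checkpoint name and parameter values are assumptions for illustration; any Stable Diffusion v1.x checkpoint behaves similarly:

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumed model checkpoint, chosen for illustration.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The generator seed fixes the initial random latents (the "latent seed").
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,   # the denoising loop described above
    generator=generator,
).images[0]                   # the VAE decode happens inside the pipeline
image.save("astronaut.png")
```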
The components are trained separately rather than end to end. CLIP’s text encoder is pretrained on image-text pairs and then frozen; for each input text it produces an embedding that conditions the diffusion model. The VAE is likewise trained on its own: its encoder compresses images into latent representations, the denoising model learns in that latent space, and the VAE decoder inverts the latents back into images.
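To make the text-encoding step concrete, here is a sketch using the transformers library to produce the 77-token CLIP embeddings that Stable Diffusion v1 conditions on. The checkpoint name is the standard openai CLIP release, but treat the details as illustrative:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    ["a photograph of an astronaut riding a horse"],
    padding="max_length",
    max_length=tokenizer.model_max_length,  # 77 tokens for CLIP
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    # Shape: (batch, 77, 768) — one embedding per token position.
    embeddings = text_encoder(tokens.input_ids).last_hidden_state
```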
It supports a negative CFG scale value
If you are generating images with Stable Diffusion, you should be aware of the options the pipeline exposes. These options let you tweak how the seed and prompt pair are treated. Some are obvious, such as image height and width; others are more subtle, as shown in the sketch below.
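For example, the seed’s only job is to fix the initial latent noise, whose shape follows directly from the requested height and width. A minimal sketch, assuming Stable Diffusion v1’s 4-channel latents at 1/8 the pixel resolution (the dimension values are examples):

```python
import torch

height, width = 512, 768  # example output dimensions

# Stable Diffusion v1 latents: 4 channels at 1/8 the pixel resolution.
generator = torch.Generator().manual_seed(1234)
latents = torch.randn(
    (1, 4, height // 8, width // 8),
    generator=generator,
)
print(latents.shape)  # torch.Size([1, 4, 64, 96])
```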
The guidance_scale argument controls how strictly the model adheres to the prompt. A value around 7.5 is a common default. Pushing it up to 20 yields images that follow the prompt as closely as possible, often at the cost of artifacts. A value of 0 disables guidance entirely, producing an image that ignores the prompt, while a strongly negative value such as -10 steers the sampler away from the prompt, toward the opposite side of the latent space.
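This behavior falls out of the classifier-free guidance formula, sketched below. This is the standard combination step that samplers apply at every denoising iteration; the function and variable names are my own:

```python
import torch

def apply_cfg(noise_uncond: torch.Tensor,
              noise_text: torch.Tensor,
              guidance_scale: float) -> torch.Tensor:
    """Classifier-free guidance: blend the unconditional and the
    text-conditioned noise predictions.

    scale = 0  -> unconditional (prompt ignored)
    scale = 1  -> plain conditional prediction, no extra guidance
    scale > 1  -> pushed toward the prompt
    scale < 0  -> pushed away from the prompt
    """
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)
```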