Stable Diffusion Face Animations
Stable Diffusion is a text-to-image model that can also be used to create face animations from text prompts. It works best with a set of known faces: with random faces, Stable Diffusion gets some details right but not others, so prompts that name famous people tend to give the most consistent results. Another approach to this problem is text embedding transformation, in which text embeddings are created for only some of the faces and the remaining frames are produced by interpolating between them.
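The interpolation step can be illustrated with a minimal sketch. It assumes plain linear interpolation between two embedding vectors, which is only one possible choice; the function names and the toy 4-dimensional vectors are hypothetical stand-ins for real text embeddings.

```python
from typing import List, Sequence


def lerp_embeddings(start: Sequence[float], end: Sequence[float], t: float) -> List[float]:
    """Linearly interpolate between two equal-length vectors (t in [0, 1])."""
    if len(start) != len(end):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


def interpolation_frames(start: Sequence[float], end: Sequence[float], num_frames: int) -> List[List[float]]:
    """Return num_frames vectors going from start to end, endpoints included."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [
        lerp_embeddings(start, end, i / (num_frames - 1))
        for i in range(num_frames)
    ]


# Toy vectors standing in for the text embeddings of two key-frame faces.
face_a = [0.0, 1.0, 0.0, 2.0]
face_b = [1.0, 0.0, 2.0, 0.0]

frames = interpolation_frames(face_a, face_b, num_frames=5)
for index, frame in enumerate(frames):
    print(index, frame)
```

In an actual animation, each interpolated vector would be passed to the image generator in place of a prompt, giving one image per frame.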
Text embedding transformation
Stable Diffusion produces images that are 512 × 512 pixels by default. Make sure that the height and width arguments are both multiples of eight. Going below 512 may result in a poor-quality image, while going over 512 in both dimensions tends to produce repeating image regions. For non-square images, the best option in most cases is to use 512 in one dimension and a larger value in the other.
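The sizing rules above can be sketched as a small checker. The helper names (snap_down, check_size) are hypothetical and not part of any Stable Diffusion tool; the sketch only encodes the rules of thumb stated in the text.

```python
from typing import List

BASE_SIZE = 512  # size the text describes as the default
MULTIPLE = 8     # height and width should be multiples of this


def snap_down(value: int) -> int:
    """Round a dimension down to a multiple of 8 (never below 8)."""
    return max(MULTIPLE, (value // MULTIPLE) * MULTIPLE)


def check_size(height: int, width: int) -> List[str]:
    """Return a list of warnings for a requested image size."""
    notes = []
    if height % MULTIPLE or width % MULTIPLE:
        notes.append("height and width should both be multiples of 8")
    if min(height, width) < BASE_SIZE:
        notes.append("going below 512 may reduce image quality")
    if height > BASE_SIZE and width > BASE_SIZE:
        notes.append("going over 512 in both dimensions may repeat image regions")
    return notes


print(check_size(512, 768))  # non-square: 512 in one dimension, larger in the other
print(check_size(500, 512))
print(snap_down(500))
```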
Stable Diffusion is available as a free Windows executable, and the software no longer requires authentication tokens. Although the program’s practical value lies in its weights, the developers are already planning additional features such as Img2Img, which will allow a user to provide visual prompts in the form of sketches and photographs.
The Stable Diffusion method can be applied to video as well. The algorithm uses the data provided by the user to determine the position and motion of a face. The data used consists of 5.4 million videos that include human activities and are meant to exclude NSFW content. Even so, the input is likely to contain some NSFW loops or animated GIFs.
The Stable Diffusion model is inspired by Imagen, but uses latent diffusion to reduce memory requirements. It uses an autoencoder with a reduction factor of eight, so the diffusion process operates in a low-dimensional latent space rather than on pixels: a 512 × 512 image becomes a 64 × 64 latent. Because each side shrinks by a factor of eight, Stable Diffusion requires 8 × 8 = 64 times less memory than a pixel-space diffusion model.
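The arithmetic behind the 64× figure can be checked with a short sketch. It counts spatial positions only and ignores the number of channels, which is an assumption made here to match the figure quoted in the text.

```python
REDUCTION_FACTOR = 8  # autoencoder reduction factor cited in the text

height, width = 512, 512

# Each side of the image shrinks by the reduction factor.
latent_h = height // REDUCTION_FACTOR
latent_w = width // REDUCTION_FACTOR

pixel_positions = height * width
latent_positions = latent_h * latent_w

# Ratio of spatial positions in pixel space to those in latent space.
ratio = pixel_positions // latent_positions

print(f"latent size: {latent_h} x {latent_w}")
print(f"reduction in spatial positions: {ratio}x")
```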