By | November 28, 2022

Stable Diffusion 2.0 brings a whole load of new features to the table – including a brand new text encoder from OpenCLIP. Say goodbye to the old, closed OpenAI and hello to new and open ViT-H/14! Yes – it’s time to learn some new prompting 🙂

Are you having problems with prompting and Stable Diffusion 2.0, not getting very good results? Well, don’t worry: you’ve got a Nerdy Rodent video to help you. Now, I’m going to use the AUTOMATIC1111 web UI, and you can too, because it is free to download and install. You’re also going to need the version 2.0 768-v Stable Diffusion model and its associated configuration file. All you need to do is download them into your Stable Diffusion models directory, ensuring that the YAML file and the checkpoint both have the same name. As long as you’re using the very latest AUTOMATIC1111 web interface, that’s it: you’re up and running and ready to go. And that’s what I’m going to be using right here.

Okay, the very first thing to realize with Stable Diffusion 2 is that it uses a different CLIP model. This means that prompting is very, very different: your old prompts, which worked in Stable Diffusion 1.5, will not be the same in Stable Diffusion 2. It’s a completely different text model. Another thing that won’t work are your old embeddings, as they are 512 by 512. You can make new embeddings, though, 768×768 with a new model, so if you want to use embeddings you can still do that, but you will need to make them fresh.

One tool which may help you on your version 2.0 prompting journey is this CLIP
Interrogator tool. You can just drag a picture onto here, click Submit, and then in a few seconds it will give you a prompt that is based on the new CLIP encoder, the ViT-H model. It’s important to realize that this new OpenCLIP model is very different to the old OpenAI model. Here we have some text; let’s just copy and paste this in to our new AUTOMATIC1111 web UI. We’ve got the 768-v EMA checkpoint enabled up there, we’re going to set the size to 768 by 768, and we’re going to generate that example prompt. Just quickly fix these brackets up here, and then we’ll see what it generates. As you can see, it is indeed very different from the image that I dragged and dropped in there to get the text prompt, but it does give you an example of the sorts of words that you can use with the new version 2.0.

With that in mind, let’s start with a very, very simple prompt. This is exceptionally basic: here we have a steampunk cat in a cyberpunk city, one of my favorite initial prompts to test with. Here we’re using the Euler sampler and 50 steps. As you can see, it comes up with a fairly decent cat.

Guidance scale in version 2 does have quite a large impact. As you can see, this was done with a guidance scale of 5.75. If we double that to 10, for example, then we get a very similar image but with a
whole load of extra detail. And as with the previous Stable Diffusion models, the higher you set the guidance scale, the more sort of baked or cooked the image will look. As with Stable Diffusion 1.5, the steps will also have a fairly large impact on your image. We run that same prompt again but with half the number of steps: with 25 steps we still get a very good looking cat, but slightly different details in there.

Each of the samplers will also give you a very different result, so depending on what you’re going for you may need to pick a different sampler. What I’ve found, basically, is that the DPM samplers give you a very good image with lots of texture, whereas the Euler samplers will give you an image that is a lot smoother, perhaps slightly more similar to the version 1.5 images that you are used to. Let’s have a quick look at what all these different samplers look like. Here we have the DPM++ 2M sampler. As you can see, that is a very realistic image: we’ve got lots of nice textures on the skin, some fantastic eyebrows, some lovely eyes, and great texture details on the clothing as well. I think that looks very good. Now what happens if we run exactly the same thing but with the Euler sampler? Here you can see the image is very, very similar, but we’ve lost a lot of that detail on the skin; everything seems to look a lot
smoother. So that may be a look that you are going for; if so, then choose the Euler sampler. And if you want to use the DDIM sampler, that looks very, very similar indeed; as you can see, it only changed a small amount there.

One thing where Stable Diffusion 2 does seem to excel compared to 1.5 is with that new text encoder: it seems to understand concepts a lot better than the previous version. For example, here I’m looking at the concept “next to”, so I’ve got a cute rabbit next to a large water jug. Everything else is just irrelevant; that’s basically just to create a nice looking image. Now, Stable Diffusion 2 is quite good at this. It’s like, okay, yes, I’m going to put that rabbit right next to that jug. Excellent. Now what happens if we change this model to the old 1.5 pruned EMA checkpoint? I’m going to drop it back down to 512 by 512, just to be fair to this model, and we’ll see how this does on exactly the same prompt. As you can see, there’s not even a jug in there; we just have the rabbit. So in Stable Diffusion 1.5 some of these concepts, like “next to”, are generally speaking more of a suggestion, whereas 2.0 will take you a lot more literally.

And you’ll find the same also goes for other concepts such as “on top of”. So here I’m generating a steampunk rodent on top
of a birthday cake. Again, the rest of the text is mostly irrelevant; I’m just seeing if it will put the rodent on top of the cake. Yes, there it is: a rodent on top of a birthday cake. It seems to have understood that concept rather well. We can of course also do even more ridiculous things, such as a fresh trout on top of a bicycle, and once again, as you can see, here we have a fresh trout on top of that bicycle. It seems to understand these concepts very well.

Just testing another concept here: this is the concept of “inside”. Here I’m going to generate a photo of a steampunk rodent inside a glass jar, and there it is, a steampunk rodent inside a glass jar. That looks fantastic to me. If you try this in 1.5, as you can see, it’s similar: we do have the rodent and we do have the glass jar, but the rodent is not inside the glass jar. It is instead on top of it, perhaps even slightly merged with it.

Now what happens if we try and make up our own concepts? Here I have a happy woman with three eyes. Now obviously people don’t normally have three eyes, so what will Stable Diffusion 1.5 make of that? Okay, we’ve got a very happy woman. It’s got the “happy”, but normal people have two eyes, and so this image has just two eyes as well. If we try the same thing with Stable Diffusion version 2,
then, as you can see, we get a very similar image, but it is sort of trying to do the three eyes. It’s got two eyes, but then they have these round circles around them as well, so perhaps it is thinking maybe three eyes is glasses.

But how good is Stable Diffusion 2 at styles? Let’s have a look at an impressionist art style painting of the dream I had last night. And there it is, an impressionist art style painting. To me that looks pretty good. Obviously here I haven’t used any artists; I’ve simply explained that I want the image to be in a particular style. And here is an example of a graffiti art style: I’ve got a saloon car covered in parsley and with coriander on it, spray-painted onto a wall, and that looks very good to me. It’s certainly a graffiti art style.

Now, I’ve only been testing for a couple of days, but photorealistic images are certainly something that Stable Diffusion 2 seems to shine at. Here in this example I’ve got a stunning portrait of a dream creature. Now, it is a painting, but I’ve also added lots of text on there such as realism and photography, 200 millimeter, things like that. And in my negative prompts, which I’ll dive into in a minute, I also have things which go against drawing, so things like sketches, illustration, stuff like that. And I get a barely photo
realistic image of what looks like a bit of a stuffed toy there. If we have a quick look at photorealism in faces, here I’ve got a close-up portrait of a lovely young lady from Wales. As you can see, I think that is a very, very photorealistic portrait.

Stable Diffusion 2 still has plenty of styles in there; all they have removed from the data set is those not-suitable-for-work things. That does mean that we can still mix various types together, so here I’m mixing a painting of a woman and also photorealism as well. Let’s have a look at that. As you can see, we get a lot of nice detail here. It does look real, but it also looks like a painting. That hair looks rather fantastic. I think all that looks very good, mixing the two styles together of a painting and realism.

If science fiction is your thing, like it is mine, then don’t worry: you can still create plenty of science fiction images. Here I have a cinematic film still of a cyborg. As you can see, your prompts do not have to be complex at all, but it does help to have something in the negative prompt. There I have blocks, JPEG, and some dusty particles. There is my cyborg: very smooth, I think, a very nice looking cyborg. And there is still some celebrity information in there as well, so for example if I add Will Smith to the negative prompt there, so I don’t
want humans, I’m hoping to get something a little bit more robotic. Then maybe if I take humans out, will I get an android? Yes, I will.

Okay, so we’ve had a lot of tests there. We’ve seen paintings, we’ve seen impressionism, we’ve seen realistic things. How about another style, just to finish this section off? We’re gonna have a cartoon style graphic novel illustration of a cyborg, because I still like science fiction, and there we have a very cartoony sort of illustration effect image.

Now, the astute amongst you might have noticed that I’ve been using some rather interesting negative prompts. Let’s start here with a close-up portrait of an elf bard, RPG avatar face, fantasy art, photo style. Okay, so that’s a very nice image, but I want it to change a little bit, so I’m going to put some more information into the prompts. Here I have a close-up portrait, it’s still the same thing, but I’m putting a forest in the background, I’m using a dark color palette, and I’m also making it a little bit more realistic. As you can see in the negative prompts there, I’ve put things in like 3D render, illustration, cartoon, sketch, things like that. All of those are not photos; I want a photo realistic one. So there we go, we’ve got our elf bard in the forest. Looks pretty good, but how can we change this even more? Well,
say I wanted to change this to more of a 3D render style; then we’ll pop this in. We’ll keep the close-up portrait of a bard, but I’ve taken out render from the negative prompt and I’ve put render back into the positive prompt. So now when we render this we’ll have a very different image to the photo realistic one. There it is: it’s still photo realistic, but the image has changed quite considerably. Perhaps I want this as an oil painting instead. So again we’ll use much the same prompt, but I’m adding oil painting in there with just that one “oil on canvas” phrase. Again we’ve got a very different feel but a very similar image. And just switching that oil on canvas up to hyperrealism, again I get a very similar image but with a more realistic looking face.

And what happens if I put all of this together into one massive set of prompts? Here I’ve got a photo realistic close-up, I want a fantasy art digital render, with a dark color palette, with a forest in the background, I’ve got some pretty ears, and I’ve got all this hyperrealism. So I’m mixing all sorts of things up, and I’ve also got this absolutely massive negative prompt in there as well. Let’s just show you what it looks like without the negative prompt to start with. I’ve got a very, very different image there. It’s similar, it is similar: we’ve still got these
horns, and it’s still a druid, and it’s still a forest, but it’s very, very different. So what you can do is throw in a load of negative prompts. Now, as you can see from these, it almost doesn’t matter what you put in your negative prompt, right? So for example I’ve got some stir-fry noodles in there, wet pasta in a bag, some clay sandwiches, an opera on wheels, things like that. Really, really weird. I will get a lovely photo realistic image, and there, I think that one looks rather good.

Now if I try that exact same prompt but with a 512 model, the old SD 1.5, then the image is not quite as good. Which leads me on to some other things about Stable Diffusion 2 as well. As you can see from this 1.5 image, one thing you probably haven’t noticed so much in Stable Diffusion 2 is all those extra ears and eyes. Now, as you can see, this elf has some very, very pointy ears, and often they will have four or more ears, such as if you try to generate a rabbit or a mouse or anything like that: typically it won’t have the correct number of ears, whereas in Stable Diffusion 2 it does seem to have the correct number of ears.

Are hands better as well? A little bit. Hands are a little bit better in Stable Diffusion 2. If, for example, we run this prompt, a photo of a perfectly normal human hand, in Stable Diffusion 1.5, there we have what you are used to: a perfect picture of some excellent and perfectly normal hands. Whereas here in Stable Diffusion 2, okay, it’s not quite as mutated, and it does have quite a lot of texture detail. See here, we’ve got the very hairy arm, we’ve got an actual proper thumb there, fingers. I think if you were looking at that hand at a particular angle it could maybe look like that. It’s close, but it is a little bit better at hands.
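
The setup step from the start of the video, where the YAML file and the checkpoint must share a name, can be sanity-checked with a short script. This is only a sketch: the `.ckpt` and `.yaml` extensions and the example directory path are assumptions rather than something the video spells out, and `find_unpaired_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path


def find_unpaired_checkpoints(models_dir):
    """Return names of .ckpt files with no same-named .yaml beside them.

    Assumption: checkpoints end in .ckpt and config files end in .yaml.
    """
    models_dir = Path(models_dir)
    return sorted(
        ckpt.name
        for ckpt in models_dir.glob("*.ckpt")
        if not ckpt.with_suffix(".yaml").exists()
    )


if __name__ == "__main__":
    # Hypothetical path; point this at your own models directory.
    for name in find_unpaired_checkpoints("models/Stable-diffusion"):
        print(f"{name}: no matching .yaml found")
```

Checkpoints that do not need their own configuration file will be listed too, so treat the output as a reminder to check rather than as an error.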
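
For the guidance scale and negative prompt behaviour described above, it may help to see the arithmetic usually given for classifier-free guidance. The sketch below is a toy: plain lists stand in for the model’s noise predictions, the numbers are made up, and `guided_prediction` is a hypothetical name. It is also an assumption here, not something the video states, that the web UI feeds the negative prompt in as the “unconditional” side.

```python
def guided_prediction(uncond, cond, guidance_scale):
    """Combine two noise predictions as in classifier-free guidance.

    uncond: prediction for the empty (or negative) prompt
    cond:   prediction for the positive prompt
    Result: uncond + guidance_scale * (cond - uncond), element by element.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up toy values, not real model output.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

low = guided_prediction(uncond, cond, 5.75)   # scale used in the video
high = guided_prediction(uncond, cond, 10.0)  # the higher setting
print(low, high)
```

Read this way, a higher scale pushes the result further away from the unconditional side, which is one way to think about the more “cooked” look at high values, and swapping the empty prompt for a negative prompt changes what the image is pushed away from.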