Quote:
Originally Posted by BBB
Your description of the middle image seems to achieve good consistency, in terms of being an interpolation of the right and left images. It strikes me that there should be some advantage in describing the left part of an image first, because the AI training data will be dominated by English, a language read left to right (including in art).
As far as specific terms versus more general ones, I think the key is to avoid terms with multiple meanings. For instance, I wanted a character with a "braided leather belt" recently, and it inevitably wanted to give her braided hair. It tends to grab the first definition or most common use of a word and try to depict that. So "Caucasian woman" will give you a light European skin tone and features, while "White woman" may just put something white somewhere in the scene.
The generic 'how to make a good image' terms are wild to me. You literally just tell it "it should look really good" and "don't make it look bad" and it yields better results. Just imagine if telling a human artist to "do better!" was all it took for improvement! They're important, but they're also something you only need to dial in once: after you're getting good images in general, you can focus on getting the subject and setting you want.
I think the reason the quality prompting works is that the pool of images it was trained on is massive. Inevitably, some of those images are low quality, subpar work, unfitting styles, etc. With a quality prompt, it steers toward the higher-quality images in its pool, and I think it also steers toward images in a consistent style, which is why you get wildly different styles across runs with quality prompts. Without the prompts it just generates based on everything it's got, and because the styles are all over the place you ultimately end up with a worse quality image.
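The "say it should look good / don't let it look bad" pattern discussed above maps onto the positive/negative prompt pair most image generators expose. A minimal sketch of how people typically assemble those two strings before handing them to a model; the tag lists and the `build_prompts` helper are illustrative, not canonical values from any particular tool:

```python
# Generic quality terms appended to the positive prompt, and common
# "don't look bad" terms for the negative prompt. These lists are
# illustrative examples, not tuned or official values.
QUALITY_TAGS = ["masterpiece", "best quality", "highly detailed"]
NEGATIVE_TAGS = ["low quality", "worst quality", "blurry", "jpeg artifacts"]

def build_prompts(subject: str) -> tuple[str, str]:
    """Return (prompt, negative_prompt) with generic quality terms added."""
    prompt = ", ".join([subject] + QUALITY_TAGS)
    negative = ", ".join(NEGATIVE_TAGS)
    return prompt, negative

prompt, negative = build_prompts("Caucasian woman with a braided leather belt")
# The pair would then be passed to a generator, e.g. with the diffusers
# library: pipe(prompt=prompt, negative_prompt=negative)
```

Per the thread's advice, the subject description stays at the front and the reusable quality boilerplate trails behind it, so once the quality terms are dialed in you only ever edit the subject.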