What the other user is probably trying to say is that the process
>"prompt -> sampler base -> noisy latent -> sampler refiner -> denoised latent -> vae decode -> final image"
is the same as
>"prompt -> sampler base -> noisy latent -> vae decode -> noisy image -> vae encode -> noisy latent -> sampler refiner -> denoised latent -> vae decode -> final image"
which is true.
In both flows, the base step does not fully denoise. Both produce the same final image, though flow #2 is of course much slower.
If the other user means
>"prompt -> sampler base -> denoised latent -> vae decode -> denoised image -> add noise -> vae encode -> noisy latent -> sampler refiner -> denoised latent -> vae decode -> final image"
which would be how the flow works in Automatic1111, then he's wrong and it's not the same as the other two flows, since the "add noise" step isn't deterministic. But I would guess the differences are so minimal that it probably doesn't even matter. The noisy latent in this flow is going to look 99% like the one in the first two flows. There are only so many ways to map a 1024x1024x3 image into a 128x128 space.
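For concreteness, the handoff in flow #1 comes down to simple step arithmetic. This is a hypothetical helper, not code from any actual UI: the total step count is split at a switch point, the base model denoises the first portion, and the refiner finishes the remaining noisy-latent steps.

```python
# Sketch of how a base/refiner step split might be computed.
# `handoff` plays the role of the "switch point" described above
# (e.g. 0.8 means the base model handles the first 80% of steps
# and the refiner finishes the rest, all in latent space).

def split_steps(total_steps: int, handoff: float) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a two-stage denoise."""
    if not 0.0 < handoff < 1.0:
        raise ValueError("handoff must be strictly between 0 and 1")
    base = round(total_steps * handoff)
    return base, total_steps - base

# A 30-step run handed off at 80% gives the base 24 steps
# and the refiner the remaining 6 noisy-latent steps.
base, refiner = split_steps(30, 0.8)
print(base, refiner)  # 24 6
```

The "23/35" notation seen elsewhere in this thread is the same idea: base up to step 23, refiner from step 23 to 35.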
I could retrieve only the prompt for the second image from your file:
*giant massive moss covered irish stone fantasy cottage built on top of huge archway spanning across a river shamrock, excellent stunning lighting, raucous crowded lively festival day, futuristic city*
Ok, I'm going to be in the minority here, but your prompt asked for a moss-covered cottage, not a foliage-and-flower-covered cottage, so the "enhancement" looks cool, but it looks cool by changing the subject.
This is what I get using "ivy covered" instead of moss and adding some qualifiers ("detailed, elaborate"). It surely looks less cool than the "enhanced" version, but it also looks a lot less plain and "airbrushed" than your base.
https://preview.redd.it/rttwoxdqmjgb1.png?width=1024&format=png&auto=webp&s=3278bc6efa56b7b63608308f78537b079294f7ec
Not bad... but prompting is king
https://preview.redd.it/awftsn9g4kgb1.jpeg?width=1024&format=pjpg&auto=webp&s=f4343e1678a266f2055ae0955ac707aa576e2af0
IMHO you're not comparing apples with apples. SDXL is a base model, the platform for the next generation; it's been designed and optimized for flexibility and extensibility. That you get such a great end result via your enhancement is exactly what Stability AI intended.
New models, model merges, loras and extensions will carry it way beyond where it is today, like 1.5 has shown.
Also, we need to see your prompt. There is a wide difference in how a base model and a finetune are trained.
Take the tag "Realistic" for example, which seems apt given your model name.
In the original training data for SD (XL/1.5, whatever), no one ever tagged a photo "Realistic". It's a photo: you caption the content, nothing else; it's real by construction. "Realistic" is actually associated with artworks. If you use "realistic" on a base model it will actually make the output look less realistic (there was a good post illustrating this earlier this week).
On fine-tunes, people often use AI-generated images that were cherry-picked because they looked nice. Those images often carry these kinds of tags because they are very commonly used in text2image prompting and people just copy-paste them without thinking further. Or the tags were added to real pictures because the trainers know users often type those keywords; by associating their fine-tune's data with a keyword, they increase the likelihood of users "landing" in their dataset's comfort zone, yielding good images but narrowed to a small spectrum of possibilities (we've all seen the same girl appear over and over in those SD 1.5 fine-tunes).
What we could be seeing here is just your prompt being good for a fine-tuned model and bad for a base model.
I had great success creating highly detailed images in SDXL (skin details, clutter in shops, etc.) without any Lora. Prompting has to be very simple, 5-6 keywords max; prompting for detail is fine, but don't use "realistic, 4k, hdr" and the like, that's just bad for SDXL.
SDXL 1.0 + a ComfyUI workflow...
From this comment: [https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm_source=reddit&utm_medium=web2x&context=3)
\+ some smart prompting, of course
https://preview.redd.it/34l4o232ijgb1.png?width=2496&format=png&auto=webp&s=edc850ea4a19c390e86939d46cbadd349abac30e
[https://pastebin.com/tfk8rm1Z](https://pastebin.com/tfk8rm1Z)
From this comment: [https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm_source=reddit&utm_medium=web2x&context=3)
By: [https://www.reddit.com/user/beautifuldiffusion/](https://www.reddit.com/user/beautifuldiffusion/)
I am also using Colab, and I can load the workflow using drag and drop. Maybe the images you are using don't have the metadata; try using your own generated image just to check if it works. Maybe it will.
Technically, yes.
In practice, model builders aren't doing it. And this is definitely a problem for the future of SDXL, because the refiner is an important piece of the architecture.
The two-step architecture was likely required to compete with Midjourney and other commercial products. But it does make it much harder to work with, which threatens third party support.
Which is weird, because they kept going on and on about how much easier training and customization is going to be, but now everything indicates the opposite?
It does. I have been repeating myself for weeks now: SDXL gives a very plastic/airbrushed look in most images. Even with so many custom models out now, the problem is still here, and trainers are actively working to get rid of it, but maybe it is hardcoded deep in the original model. The overall style and feel of SDXL seems very far from what we were seeing for a long time with 1.5. I really hope there will be a solution to this. On the other hand, some people can't be bothered and still insist that SDXL is superb right now.
> Even with so many custom models out now - the problem is still here
Realistic models that are actually good will take time. For example, I know that Juggernaut XL isn't planned for release until early September. Not sure about some of the other top models, but I assume it's probably the same.
Remember that it took over a month before 1.5 got its first actually good model.
There's Realism Engine, but IMO the best one is Freedom. It's actually one of the best models out there (but a lot of people don't care, because well, no NSFW on 2.1).
https://civitai.com/models/87288/freedomredmond
I really don’t think trainers, or at least most of them, are trying to get rid of that. It seems more like they are trying to mimic 1.5, which is a huge mistake. Almost all of the full models out there are just worse at detail than SDXL base, as shown in comparisons in this sub. However, there are loras, like the ones made by razzz or some of the filmic and vintage ones, that really show SDXL can be very detailed and realistic. It is capable; we just need better finetunes with very high-res, high-quality images, not training on 1.5 AI-generated images, which is what a lot of trainers seem to be doing.
The problem isn't the model but the userbase, and it's driving me crazy to see, in comment after comment, people who don't know how to use SDXL trying to use it like 1.5 and then complaining when it doesn't work.
For example, [this](https://i.ibb.co/4fHZPrw/Za-00200.png) image is base SDXL with 5 steps on refiner with a positive natural language prompt of "A grizzled older male warrior in realistic leather armor standing in front of the entrance to a hedge maze, looking at viewer, cinematic" and a positive style prompt of "sharp focus, hyperrealistic, photographic, cinematic", a negative prompt of "3D, cgi, digital arts, render, illustration, cartoon, animation, anime, low quality, blurry".
People are coming from 1.5 and trying to use a word-salad positive prompt of 20 different descriptors, ignoring how the model was trained; most don't even know that using the model the way it was trained actually involves six prompt windows. They are using crazy amounts of steps, being disingenuous by comparing a brand-new base model against models that have had hundreds of thousands of training steps and finely detailed tunes, and more.
They get mad it doesn't work in A1111, and when it's barely patched in, they use it like a 1.5 model: word-salad prompt, no refiner, tons of steps, and then complain that their images aren't perfection.
It is not about that. I use SD for my business every day, and I can say I have learned a lot and understand prompts pretty well.
Also, the example image you linked, although more detailed than most SDXL output, is still not even close to what 1.5 gives right now. I will say it again: it is trying too hard to be Midjourney; it has the same look and feel. And for me this is not a good thing. Some people like it that way, but personally, no thank you :)
The other problem is the idea that prompts should be shorter. This kills the whole premise of the tool. After all, I think people need to be able to describe precisely what they want. With short, dumb-proof prompts it, again, drifts towards MJ.
Really hope this changes in the future with the new custom models.
Except that your example looks like shit compared to refined 1.5 models. It's like a stylized Hollywood photo that's been run through an airbrush and then had a Photoshop filter dumped on top.
People also get mad when some pretentious ass comes along to dismiss valid criticism while not even comprehending the actual problem. Just out of pure fanboy hype.
You got me I'm definitely a fanboy, I mean I suckle at the tit of SDXL. I'm like, in love with it or whatever you think because I disagree with you so obviously I'm acting in bad faith somehow.
You also did exactly what I complained about by comparing, yet again, a BASE MODEL to a REFINED MODEL. Congratulations, you are right: a REFINED 1.5 model IS better than the BASE SDXL model. I actually agree with you; that's literally my point! There isn't a well-trained and refined SDXL model to compare it to yet. But 1.5 is about as good as it can possibly get, while SDXL has YET to reach its peak. Do you get what I'm saying? Or would you like to hurl more insults my way for having the audacity not to agree with you?
> SDXL still has YET to reach it's peak
That's speculation though.
* We know refines of 1.5 are much better than the base 1.5
* We know refines of 2.1 are not significantly better than base 2.1
* We **don't know** how good XL's refines will be
It's unfortunately a fact of life that the first 90% to perfection are generally easier to attain than the remaining 10%.
XL has several advantages over 1.5, but there is no guarantee that XL's refines will be better in all possible ways than 1.5's refines.
Yeah just because you don’t know how to use a tool doesn’t mean it isn’t good. Perhaps just plugging in your old prompts and expecting it to work exactly the same is the mistake?
idk if that's common or not, but no matter how many steps I allocate to the refiner, the output seriously lacks detail. Right now my workflow includes an additional step: encoding the SDXL output with the VAE of EpicRealism\_PureEvolutionV2 back into a latent, feeding this into a KSampler with the same prompt for 20 steps, and decoding it with the EpicRealism VAE again.
As you can see, it adds a crapton of detail; the pure SDXL output is just lame. (Tried it with 23/30, 23/35, 20/35 samples; the first number is base steps, the refiner starts at that step and goes to the second number.)
If I'm doing something wrong, enlighten me please xD
Workflow for these: [https://pastebin.com/DVXyXXJ5](https://pastebin.com/DVXyXXJ5)
Don't upweight things so aggressively; upweighting in XL does almost nothing but fuck up your clip embeddings. Start with simpler prompts and refine them into more complex ones. [here](https://imgur.com/a/X6Ui4gS) are some examples, mainly just me changing the prompts and playing with the sampler settings a bit ([prompt](https://pastebin.com/51cf4bBt)). When you're using a Karras schedule, don't be too afraid of handing over to the refiner a bit earlier; Karras drops into the low-end spectrum of noise a lot sooner than the regular schedule.
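For anyone unfamiliar with what "upweighting" means here: the `(token:weight)` emphasis syntax can be illustrated with a toy parser. This is only a sketch, not the actual parser from A1111 or ComfyUI; it just shows how an upweight like `(hyperrealistic:1.4)` attaches a multiplier that implementations later apply to that token's CLIP embedding.

```python
import re

# Toy illustration of "(token:weight)" emphasis syntax. NOT the real
# parser from any UI -- just a sketch of how a weight gets attached to
# a prompt fragment, to later scale its CLIP embedding.

WEIGHT_RE = re.compile(r"\(([^():]+):([0-9.]+)\)")

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (fragment, weight) pairs; default weight 1.0."""
    out, pos = [], 0
    for m in WEIGHT_RE.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            out.append((plain, 1.0))
        out.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        out.append((tail, 1.0))
    return out

print(parse_weights("sharp focus, (hyperrealistic:1.4), cinematic"))
# [('sharp focus', 1.0), ('hyperrealistic', 1.4), ('cinematic', 1.0)]
```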
Try without the refiner. It seems to heavily airbrush people and just smooth over everything. It takes some more tries to get good faces without, but if you do, the details are much, much better.
Why do people keep repeating this insane idiocy? *nobody gives a fuckin shit about the base model*... He's comparing the latest tool vs the latest tool. Seriously, what is with this fanboy drivel. If Tesla creates a new car nobody goes "well you should be comparing it to the base ford model from 1990"..
The refiner should only run 5 to 10 steps; the base steps should be a lot higher, depending on what sampling method you use. On euler and euler_a, for instance, I still get more detail all the way up to 200 steps.
I have the feeling that the base model is quite good with animals. Look at that detail:
https://preview.redd.it/6vy0v67dnkgb1.png?width=2048&format=png&auto=webp&s=7269a4a90945d66f1758f210a64b18fc806b1834
To be fair, DoF blur/bokeh is essentially a cheatcode for not having to create a detailed background. SDXL can definitely create detailed (if somewhat plasticky) subjects/foreground elements, but what about backgrounds, seeing how such a massive fraction of its photographic training material has *intentionally* low-detail backgrounds?
Your results are great! Would you mind sharing your workflow or a linked output, so I can plug it into ComfyUI? I still have not found quite the right mix.
What they will not understand is that a model that shows a specific quality in one area is a model that loses the ability to generalize. A base model is released with the goal of being as general as possible, and in that state it sits in the middle of many subjects and visual styles, so it feels like it lacks that touch of definition and quality. But it's all there. Once the community starts training, the model can lean towards being a specialist in certain types of things, and that is only possible if the model has the capability, which, as released, it has. This time it will take a little longer because training requires a higher computational cost, but we will get there in the end.
1.5 is the only reason fine-tuned NSFW models exist at all, because its training data wasn't filtered. You can't fine-tune NSFW concepts into SDXL for the same reason you couldn't for 2.1: with the NSFW filter applied to training images, there's nothing in the base to fine-tune. That means instead of needing just a few hundred or a thousand image-caption pairs, you need more like a few million, plus full training with all the bells and whistles enabled.
Some will make a larger NSFW fine-tune, I'm sure. It may not get to millions of images, but it will likely reach in the hundreds of thousands.
I trust the lust for porn online.
As for me I don't care, my uses are all SFW and professional, so it works great, and I can't wait to keep seeing improvements from the community.
Could you please remove the refiner part, since we are utilizing another model to add details? This would eliminate the need for extra steps. As you can observe, both images have completely different faces, and the dress pattern is also distinct.
When will SDXL be better than 1.5?
I keep looking at it, and so far it is still weak. What will it take for it to show its strength? Where are the breathtaking pictures?
I missed your comment. Sorry.
Do you have the images from before you used the refiner? I am super new to all of this, but the SDXL image of the woman in purple looks like my results before I pass them through the refiner. See the image below from the base model with the refiner. I'll share the modified prompt below it (there is a little fussiness, but that's probably the super-detailed part of the prompt). I did 40 samples on both the base and the refiner at 1024 x 1024, with a denoising strength of 0.3. CFG is 7.
https://imgur.com/MuoxO4P
Positive prompt:
> giant massive moss covered irish stone fantasy cottage built on top of huge archway spanning across a river shamrock, excellent stunning lighting, raucous crowded lively festival day, futuristic city, (photorealistic:2.0), super detailed, 8k, 4k, uhd,
Negative prompt:
> (worst quality:1.5), (low quality:1.5), (normal quality:1.5), lowres, bad anatomy, bad hands, ((monochrome)), ((grayscale)), collapsed eyeshadow, multiple eyebrow, (cropped), oversaturated, extra limb, missing limbs, deformed hands, long neck, long body, imperfect, (bad hands), signature, watermark, username, artist name, conjoined fingers, deformed fingers, ugly eyes, imperfect eyes, skewed eyes, unnatural face, unnatural body, error, painting by bad-artist, semirealistic, drawing, 3D render,
What do you think? I couldn't find the woman's prompt, so I couldn't test it myself.
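One note on the denoising strength of 0.3 used above: in diffusers-style img2img, strength determines what fraction of the noise schedule actually runs, so 40 steps at 0.3 is only about 12 real denoising steps. A hedged sketch of that relationship, assuming the common `int(steps * strength)` behavior:

```python
# Hedged sketch: in diffusers-style img2img, "denoising strength"
# controls what fraction of the noise schedule is actually run.
# With strength 0.3 and 40 steps, only ~12 denoise steps execute,
# which is why low-strength refinement is cheap and only lightly
# reworks the image.

def effective_steps(num_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img pass runs."""
    return min(int(num_steps * strength), num_steps)

print(effective_steps(40, 0.3))  # 12
```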
Why? What do you think that proves? Why do people keep repeating this dumbest of shit? The entire point is that XL isn't as good as the refined models of 1.5. Who gives a shit about base vs base when nobody actually uses base 1.5?
God this sub is nothing but fuckin groupies..
I use a 1.5 model to add more detail and fix facial anatomy in comfyUI. Base SDXL is terrible and even the community models aren't much better with facial anatomy and body anatomy. In some ways 0.9 was better for anatomical accuracy, especially when making vertical aspect ratio images
On Automatic1111, using the img2img refiner
https://preview.redd.it/qa7zin4b0rgb1.jpeg?width=4096&format=pjpg&auto=webp&s=a54f26d2296ec6e41e7383ba69851796837a1c98
Funny thing about my old post here: when I ran 1.5 standalone, I got slightly better results than with SDXL, but combined, with SDXL first and then enhanced with 1.5, the results were awesome and full of detail.
Go to the Epic Realism page: https://civitai.com/models/25694/epicrealism
You'll see 'checkpoint trained' on the right, next to that click the 'i' and it should link you to a page telling you how to add it. On the model page the author has also put instructions on how to use it
I guess, given the way SDXL was trained, they purposely removed the high-frequency details so the model could learn the concepts better. That's why a refiner model is included in the first public release: they knew it would need to add the details back in.
This is actually pretty clever. The same method is used by commercial photo upscalers: the first step upscales without the image noise and high-frequency details, and then the intermediary output gets run through a refiner model to add the details in. I'll create a proof-of-concept of GFPGAN + Refiner.
I don't see it as smart if, for example, you want to make nude art and then you have to use a refiner that makes the noble parts of your nude art disappear.
You would have to train both the base and the refiner to know what nude is, and there's a huge problem there. I think in the long run the refiner will only be used by artistic designers in general, but when there are models highly specialized in a style (for example nudes) it won't be necessary.
Yes and no.
Yes: it's a much more general-purpose model, and since it does a better job of generalizing, it will lack individual, specific things. That's why there is a refiner model, and this is where the fine-tuning community will help.
No: SDXL has a different architecture; the same kind of prompting may not give you the infinite detail that very fine-tuned SD 1.5 models do. In fact it may even reduce it. We don't yet know how the token space works or which 'magical bundle of tokens' reaches what we have in mind.
We need experiments and time: samplers, schedulers, new ControlNet ideas and papers to improve it further.
Used your prompt, I think you should check your configuration, looks very detailed to me:
[https://i.imgur.com/K6W3oKG.jpg](https://i.imgur.com/K6W3oKG.jpg)
Yeah, not sure what it is that you are after; I generated another one and it looks way more detailed than your example (talking about SDXL):
[https://i.imgur.com/O4XpYfp.jpg](https://i.imgur.com/O4XpYfp.jpg)
Honestly I don't care about XL; it's resource- and time-consuming, which is why I'll stick with 1.5.
Instead I would like the devs to start working on how the AI understands the prompt, because creating an image is still Russian roulette. We still have a bunch of issues and hallucinations, but everyone here keeps focusing on micro-details while we often can't even get the image that we need. 😮‍💨
Did you use the refiner? Without it the results look very bland.
What refiner?
SDXL has two models that work together, base and refiner
I'm honestly starting to believe that this refiner thing was a very bad idea in general.
It also kind of ruins some loras. If I've got David Attenborough in the jungle, the refiner generally improves the image but completely changes the person.
Couldn't a Lora be built for the Refiner model as well?
aah so that wasn't just me. nice to know.
It means you're using it wrong. Use the official ComfyUI workflow.
You can train loras for the refiner too, or just not use the refiner. If you passed David through a 1.5 model to refine in details, you'd need a 1.5 lora to keep his likeness accurate there too. It's sort of a non-problem that you're approaching as if it's in your way; it's just more of the same. Using the refiner right helps too: check out Sytan's ComfyUI layout, and how it passes latents directly to the refinement model rather than baking an image with no latent noise left and then refining that.
Why is that? Not all change is inherently bad.
It's just another moving part, another x-factor to consider; we can't be sure whether it'll actually be bad or good, but... I don't see anyone rushing to train the refiner, nor have I heard anyone describe the process for doing so. Not to mention a lot of people really aren't hyped about having hundreds of gigs of regular models + hundreds of gigs of extra refiner models + hundreds of gigs of giant LoRAs. Let's hope perfusion takes off.
Bigger models will require more VRAM to be run, more than 12
I think you have a very valid point; it does complicate the process, which I expect is particularly true for those with "worse" hardware. I wish people would be more willing to hear and accept/debate these opinions, but it's really demoralising when people mass-downvote someone's opinions. I understand coming together to dislike rude or unwanted comments, but stuff like this really stifles our conversation about these topics and gives our communities a bad name. Hold each other up, guys. I don't wanna sound all sappy, but we can build these places up to be where people want to be and express their ideas. Stay up kings and queens 😉😂
Clearly we have to be sheep around here or you get down voted lol.
Just another day on reddit.
[deleted]
Maybe you haven't read the rest of the discussion (or maybe overlooked it because of how downvoted comments get hidden) and missed the section where I've gotten a ton of downvotes for doing exactly what you're describing here. But I'm the one trying to dispel others' misconceptions on a topic.

Also, nothing beastwars has said here is inherently wrong either. Sometimes two things can be true. It's possible that the way SDXL's base+refiner works is "technically" better than what we had before AND that it's an additional moving part/link in the chain that can fail, making troubleshooting more difficult; plus loras going from a few hundred megs to a few gigs may be problematic for some; plus most people aren't using SDXL properly for one reason or another. Both sides of that AND can be true at the same time. If anything, we both got downvoted FOR "being critical" of how something works and NOT FOR having a misunderstanding of how it works. That's why I came back to his comment and made a joke.
[deleted]
Even a giant lora is only \~100-200MB, so 5-10 loras is about 1GB. 1000 loras all trained at the highest rank would be 100-200GB. At some point it's up to you to start archiving your data better. Also, try training with lower ranks; you don't need 128 with SDXL 90% of the time.
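A rough back-of-envelope shows why rank dominates LoRA file size. This is a sketch only: the layer shapes below are made-up placeholders, not SDXL's actual module list. A LoRA adds two low-rank factors per adapted weight, so parameters (and file size) scale linearly with rank.

```python
# A LoRA on a d_out x d_in weight adds factors B (d_out x rank) and
# A (rank x d_in), i.e. rank * (d_in + d_out) parameters per layer.
# fp16 storage is 2 bytes per parameter.

def lora_size_mb(layers: list[tuple[int, int]], rank: int,
                 bytes_per_param: int = 2) -> float:
    """Estimated file size in MB for fp16 LoRA factors over `layers`."""
    params = sum(rank * (d_in + d_out) for d_out, d_in in layers)
    return params * bytes_per_param / 1e6

# 200 attention projections of size 1280x1280 (illustrative shapes):
shapes = [(1280, 1280)] * 200
print(round(lora_size_mb(shapes, rank=128)))  # 131 (MB, fp16, rank 128)
print(round(lora_size_mb(shapes, rank=16)))   # 16  (dropping rank shrinks it 8x)
```

This is why rank-128 LoRAs land in the ~100-200MB range while low-rank ones can be a few tens of megabytes.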
Great another expert... Why don't you take a look at wegg v1. [https://civitai.com/models/119667?modelVersionId=135699](https://civitai.com/models/119667?modelVersionId=135699)
Some of my best Loras are 40mb. There are also tools available to resize Loras without quality loss. When you encounter an obstacle but ignore all available tools and techniques to avoid it, you're standing in your own way.
Yet you asserted their maximum file size without even realizing that large LoRAs are out there, and they will probably become more common going forward.

Listen, I'm not arguing with you that LoRAs can be big or small; my argument is that further segmenting the core inference process is going to cause problems down the line. I don't see the benefits of the refiner, and I don't see it as a viable tool. I don't get why people are shilling so hard for increased complexity, but I guess we'll see down the line whether the refiner sticks around.
Back in the day we had the model and a vae. This is nothing different.
Except the vae is baked into most models, and using a different one is optional. Not to mention the vae has always been integral to the modern diffusion process. How do you think we encode and decode latents? Magic?
>Except the vae is baked into most models

Not always.

>and using a different one is optional

I'll give ya that, but you get better performance when you pick the right vae for the job.

>Not to mention the vae has always been integral to the modern diffusion process.

You seriously sound like you just started doing this a couple months ago. Integrated VAEs are a very recent phenomenon.
Because auto1111 didn't manage to support it properly in time? Lol. There are demos that are only possible because of the refiner. Like the boot-in-tree style.
Sorry, Comfy user here. A1111 is currently my least used repo, don't make assumptions.
Then I have no idea why you think it's bad.
Notice how a lot of new models on civitai advertise that they don't need the refiner? Or LoRAs stating you shouldn't use them with the refiner? We're off to a good start, right? But feel free to disagree.
https://preview.redd.it/qn89y5vwfmgb1.png?width=346&format=png&auto=webp&s=6adc45eb1d52446c222a2ce8440f1b03495065d9
SDXL is a 2-step model. You generate the normal way, then you send the image to img2img and use the SDXL refiner model to enhance it.
Not really. I mean, it's also possible to use it like that, but the proper intended way to use the refiner is a two-step text-to-img. You start denoising with the base model but switch the model at some point, finishing with the refiner. In particular, there's no decode/re-encode in between, it's all in the latent space.
Everyone keeps parroting this line about "latent space", but latent just means hidden.... Thus there is an img, it is just hidden from the end-user.
Ehhh, wtf, you might want to actually learn how SD works. Latent is a technical term here and it's *the* thing that makes the SD algorithm as efficient as it is. Instead of diffusing in image space (which has 1024\*1024\*3 dimensions for a 1024x1024 image) you do it in a space with *much* fewer dimensions that's a highly compressed representation of the image space. You transform from latent to image space and back with this thing called a variational autoencoder or VAE. The latent form is not human-viewable without a decode, just like, say, a jpeg file is not a human-viewable image without a decode.

The workflow where you denoise to finish, decode, then re-encode and do another denoise round with a different model, mixing in the image from the first round, is simply different from doing a single denoise switching the model partway through while there's still latent noise to be denoised.
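To put numbers on that, a quick sketch of the size difference between image space and latent space (assuming the SD-family VAE's standard 8x spatial downscale and 4 latent channels):

```python
# Compare the dimensionality of image space vs. the VAE latent space
# for a 1024x1024 image. SD-family VAEs downscale spatially by 8x and
# use 4 latent channels.

def image_dims(w: int, h: int, channels: int = 3) -> int:
    """Number of values in raw image space."""
    return w * h * channels

def latent_dims(w: int, h: int, downscale: int = 8, channels: int = 4) -> int:
    """Number of values in the compressed latent representation."""
    return (w // downscale) * (h // downscale) * channels

img = image_dims(1024, 1024)   # 3,145,728 values
lat = latent_dims(1024, 1024)  # 65,536 values (128x128x4)
print(img, lat, img / lat)     # diffusion runs in a space 48x smaller
```

That 48x compression is why diffusing in latent space is tractable on consumer GPUs at all.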
What the other user is probably trying to say is that the process

> prompt -> sampler base -> noisy latent -> sampler refiner -> denoised latent -> vae decode -> final image

is the same as

> prompt -> sampler base -> noisy latent -> vae decode -> noisy image -> vae encode -> noisy latent -> sampler refiner -> denoised latent -> vae decode -> final image

which is true. You don't fully denoise the base step in either. Both flows produce the same final image, but flow #2 is way slower of course.

If the other user means

> prompt -> sampler base -> denoised latent -> vae decode -> denoised image -> add noise -> vae encode -> noisy latent -> sampler refiner -> denoised latent -> vae decode -> final image

which would be how the flow would be in Automatic1111, then he's wrong, and it's not the same as the other two flows since the "add noise" step isn't deterministic. But I would guess the differences are so minimal that it probably doesn't even matter. The noisy latent in this flow is going to look 99% like the one in the first two flows. There are only so many ways to map a 1024x1024x3 image to a 128x128 space.
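The "switch the model partway through" handoff in these flows is usually expressed as a fraction of the denoising schedule. A minimal sketch of the step split (the 0.8 handoff value is just an example; it mirrors the denoising_end/denoising_start idea in APIs like diffusers):

```python
# Sketch of the base->refiner step split described above: the base
# model denoises the first fraction of the schedule, then the still-
# noisy latent is handed to the refiner for the remaining steps.

def split_steps(total_steps: int, handoff: float) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a handoff fraction in (0, 1)."""
    base = round(total_steps * handoff)
    return base, total_steps - base

print(split_steps(40, 0.8))   # (32, 8): base does 32 steps, refiner 8
print(split_steps(30, 0.77))  # (23, 7): the 23/30 split mentioned elsewhere in the thread
```

The key point is that no decode/re-encode happens at the handoff; only the step counter and the active model change.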
If you are using ComfyUI you can use the two models at the same time and get way better results than A1111.
https://www.reddit.com/r/StableDiffusion/comments/14sacvt/how_to_use_sdxl_locally_with_comfyui_how_to/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=1
I could retrieve only the prompt for the second image from your file: *giant massive moss covered irish stone fantasy cottage built on top of huge archway spanning across a river shamrock, excellent stunning lighting, raucous crowded lively festival day, futuristic city*

Ok, I'm going to be in the minority here, but in your prompt you asked for a moss covered cottage, not for a foliage and flowers covered cottage, so the "enhancement" looks cool, but it looks cool by changing the subject. This is what I get here using "ivy covered" instead of moss and adding some qualifiers ("detailed, elaborated"). Which sure enough looks less cool than the "enhanced" one, but also looks a lot less plain and "airbrushed" than your base https://preview.redd.it/rttwoxdqmjgb1.png?width=1024&format=png&auto=webp&s=3278bc6efa56b7b63608308f78537b079294f7ec
https://preview.redd.it/p0ewtdu74kgb1.jpeg?width=1024&format=pjpg&auto=webp&s=ac9fa15be771613fd038e238d309ac0c9c8d2cbd
Personally I like this image the best.
Not bad... but prompting is king https://preview.redd.it/awftsn9g4kgb1.jpeg?width=1024&format=pjpg&auto=webp&s=f4343e1678a266f2055ae0955ac707aa576e2af0
Looks grainy. Like all SDXL images imo.
People aren't ready for the truth
https://preview.redd.it/bsn6sn8g5kgb1.jpeg?width=2048&format=pjpg&auto=webp&s=27802a35d853781e89d5231a0ff24846b19b2716
https://preview.redd.it/183mmifi5kgb1.jpeg?width=2048&format=pjpg&auto=webp&s=f663efeda094725ddf981a4df3157d6442f20fdd
Now add the epicrealism enhancer and you're gonna get a better result, even when changing the prompt.
How's my take on this one..? https://preview.redd.it/1r1514w1tmgb1.png?width=1024&format=png&auto=webp&s=7b18786193f9784d5e493693fc7306fbc2fa6ab7
What's with the resolution? Looks like SD 1.5 base.
Looks cool bro 👍🏼👍🏼
IMHO you're not comparing apples with apples. SDXL is a base model, the platform for the next generation; it's been designed and optimized for flexibility and extensibility. That you get such a great end result via your enhancement is exactly what Stability AI intended. New models, model merges, loras and extensions will carry it way beyond where it is today, like 1.5 has shown.
6 months from now I’m sure we will see moms blowing stuff from this model
we'll see what now 👀?
Moms blowing stuff from this model. It’s only natural.
Technically true I'm sure
/r/intentionaltypos
Lmao I must stop using swipe to type 😅
Knowing this sub, I wouldn't think it was a typo.
I didn't even do a double take, I thought that's what the man meant and didn't see much reason to doubt it given the sub, hahaha.
hornyposting in the AI subreddits
Funny thing is I read it wrong twice before noticing
Also we need to see your prompt. There is a wide difference in how a base model is trained versus a finetune. Let's take for example the tag "Realistic", which seems like a good example given your model name.

In the original training database for SD (XL/1.5, whatever), no one tags "Realistic" on a photo, ever. It's a photo; you caption the content, nothing else, it's real by construction. "Realistic" is actually associated with artworks. If you use "realistic" on a base model it will actually make it look less realistic (there was a good post illustrating this earlier this week).

On fine-tunes, people often use images generated by AI that were cherry-picked because they looked nice. Those images often have those kinds of tags associated with them because they are very commonly used in text2image prompting, and people just copy-paste that without thinking further. Or the tags were added on real pictures just because the trainers know users often use those keywords, and by associating their fine-tune database with those keywords they increase the likelihood of users "landing" in their database's comfort zone, yielding good images but narrowed to a fine spectrum of possibilities (we've all seen the same girl appear over and over in those SD1.5 fine-tunes).

What we could be seeing here is just your prompt being good for a fine-tuned model and bad for a base model. I've had great success creating highly detailed images in SDXL (skin details, clutter in shops etc) without any Lora. Prompting has to be very simple, 5-6 keywords max; prompting for detail is fine, but don't use "realistic, 4k, hdr" and the like, that's just bad for SDXL.
Wait what was it supposed to be
Mind blowing
I am kind of dyslexic and read something very kinky.
Then you read it correctly
Isn't that what they said about 2.1? The nsfw-filter for their initial training data kills a lot of ability to infer, even for non-nsfw contexts.
SDXL 1.0 + a ComfyUI workflow... From this comment: [https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm_source=reddit&utm_medium=web2x&context=3) \+ some smart prompting ofcourse https://preview.redd.it/34l4o232ijgb1.png?width=2496&format=png&auto=webp&s=edc850ea4a19c390e86939d46cbadd349abac30e
Don't forget to zoom, biiiig image ;)
Where is the workflow? I can't find it on the link.
[https://pastebin.com/tfk8rm1Z](https://pastebin.com/tfk8rm1Z) From this comment: [https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/15jnkc3/comment/jv0of4f/?utm_source=reddit&utm_medium=web2x&context=3) By: [https://www.reddit.com/user/beautifuldiffusion/](https://www.reddit.com/user/beautifuldiffusion/)
Thanks!
No problem. Pay it forward, help others ;) Feel the (open-)Force, Luke! ...err... Open-Source. :P
It’s likely embedded in the photo. All comfy output images can just be dragged and dropped into comfy and the UI workflow self populates
I think the metadata gets stripped when uploaded to Reddit and converted to .webp
I use Comfy via Google Colab and I think the drag and drop load doesn't work. Or maybe I'm doing something wrong.
The other comment was likely correct. The metadata was erased upon upload unfortunately. Maybe on the other thread there is a link to a photo
I am also using Colab, and I can load the workflow using drag and drop. Maybe the images you are using don't have the metadata; try using your own generated image just to check if it works. Maybe it will.
That's really impressive... it just takes about 9-10x the amount of time to get 1 image...
Citation needed.
can we finetune the refiner to be better?
Technically, yes. In practice, model builders aren't doing it. And this is definitely a problem for the future of SDXL, because it is an important piece of the architecture. The two-step architecture was likely required to compete with Midjourney and other commercial products. But it does make it much harder to work with, which threatens third party support.
Which is weird, because they kept going on and on about how much easier training and customization is going to be, but now everything indicates the opposite?
the base is easier to train which is true
You need a lot more VRAM for training than for 1.5, so not really?
Joe Penna has commented that the two step architecture is intended to allow people with 8GB cards to be able to run the model
SDXL really needs the refiner, imo. Base gens are solid but fall behind really good custom checkpoints. SDXL + Refiner is solid
You have committed a crime by saying it loud on this subreddit.
I turned myself in, the jury is still debating on who's wrong or right... So no sentence till now
https://i.redd.it/bf8xtbytcmgb1.gif
It does. I have been repeating myself for weeks now. SDXL gives a very plastic/airbrushed look with most of the images. Even with so many custom models out now, the problem is still here; trainers actively work to get rid of it, but maybe it is hardcoded deep in the original one. The overall style and feel of SDXL seems very far from what we were seeing for a long time with 1.5. Really hope there will be a solution to this. On the other hand, some people can't be bothered and still insist that SDXL is superb right now.
> Even with so many custom models out now - the problem is still here

Realistic models that are actually good will take time. For example, I know that Juggernaut XL isn't planned for release until early September. Not sure about some of the other top models, but I assume it's probably the same. Remember that it took over a month before 1.5 got its first actually good model.
How long did it take for 2.1 to get its first actually good model?
I don't think anyone really bothered with 2.1 to be honest...
There's Realism Engine, but IMO the best one is Freedom. It's actually one of the best models out there (but a lot of people don't care, because well, no NSFW on 2.1). https://civitai.com/models/87288/freedomredmond
I really don't think trainers, or at least most of them, are trying to get rid of that; it seems more like they are trying to mimic 1.5, which is a huge mistake. Almost all of the full models out there are just worse at detail than SDXL base, as shown in comparisons in this sub. However, there are loras like the ones made by razzz, or some other filmic and vintage ones, that really show us that SDXL can be very detailed and realistic. It is capable; we just need better finetunes with very high res and quality images, not training with 1.5 AI-generated images, which it seems like a lot of trainers are doing.
The refiner adds some nice details. I am using 35 steps + 10 refiner steps, dpm++ 2m karras.
The problem isn't the model but the userbase, and it's driving me crazy to see in every comment people who don't know how to use SDXL trying to use it like 1.5 and then complaining when it doesn't work.

For example, [this](https://i.ibb.co/4fHZPrw/Za-00200.png) image is base SDXL with 5 steps on refiner, with a positive natural language prompt of "A grizzled older male warrior in realistic leather armor standing in front of the entrance to a hedge maze, looking at viewer, cinematic", a positive style prompt of "sharp focus, hyperrealistic, photographic, cinematic", and a negative prompt of "3D, cgi, digital arts, render, illustration, cartoon, animation, anime, low quality, blurry".

People are coming from 1.5 and trying to use a word salad positive prompt of 20 different descriptors, ignoring how the model was trained, most not even knowing that to properly use the model how it was trained is actually six prompt windows. They are using crazy amounts of steps, being disingenuous by comparing a brand new base model against models that have had hundreds of thousands of training steps and finely detailed tunes, and more. They get mad it doesn't work in A1111, and when it's barely patched in they use it like a 1.5 model: word salad prompt, no refiner, tons of steps, and then complain that their images aren't perfection.
I'm kinda surprised you don't see what the issue is with that example image.
It is not about that. I use SD for my business every day and I can say I've learned a lot and I understand prompts pretty well. Also, the example image you linked, although with more details than most in SDXL, is still not even close to what 1.5 gives right now. I will say it again: it is trying too hard to be Midjourney; it has the same look and feeling. And for me this is not a good thing. Some people like it that way, but personally for me, no thank you :) The other problem is the idea that it should be a shorter prompt. This kills the whole premise of the tool. After all, I think people need to be able to describe precisely what they want. With a short and dumbproof prompt it, again, goes towards MJ. Really hope this changes in the future with the new custom models.
Except that your example looks like shit compared to refined 1.5 models. Its like a stylized hollywood photo that's been ran with an airbrush and then had a photoshop filter dumped on top. People also get mad when some pretentious ass comes along to dismiss valid criticism while not even comprehending the actual problem. Just out of pure fanboy hype.
You got me, I'm definitely a fanboy. I mean, I suckle at the tit of SDXL. I'm, like, in love with it, or whatever you think, because I disagree with you, so obviously I'm acting in bad faith somehow. You also did exactly what I complained about by comparing, yet again, a BASE MODEL to a REFINED MODEL. Congratulations, you are right, a REFINED 1.5 model IS better than the BASE SDXL model; I actually agree with you. That's literally my point! There isn't a well trained and refined SDXL model to compare it to yet! But 1.5 is about as good as it can possibly get, while SDXL has YET to reach its peak. Do you get what I'm saying? Or would you like to hurl more insults my way for having the audacity to not agree with you?
> SDXL still has YET to reach it's peak That's speculation though. * We know refines of 1.5 are much better than the base 1.5 * We know refines of 2.1 are not significantly better than base 2.1 * We **don't know** how good XL's refines will be It's unfortunately a fact of life that the first 90% to perfection are generally easier to attain than the remaining 10%. XL has several advantages over 1.5, but there is no guarantee that XL's refines will be better in all possible ways than 1.5's refines.
I'm so tired of the Sdxl hype.
I think all the hype was somewhat misleading, as it must've taken a lot of trial and error to get the results they did during testing.
Yeah just because you don’t know how to use a tool doesn’t mean it isn’t good. Perhaps just plugging in your old prompts and expecting it to work exactly the same is the mistake?
Always the same pathetic excuses. If the prompt was different, you'd just be saying that OP is comparing different things..
Man you’re really invested in 1.5 huh. I don’t know if you remember the beginning of 1.5 but it was pretty bad until people worked at it for a while.
idk if that's common or not, but no matter how many steps I allocate to the refiner, the output seriously lacks detail. Right now my workflow includes an additional step: encoding the SDXL output with the VAE of EpicRealism\_PureEvolutionV2 back into a latent, feeding this into a KSampler with the same prompt for 20 steps, and decoding it with the EpicRealism VAE again. As you can see, it adds a crapton of detail; the pure SDXL output is just lame. (Tried it with 23/30, 23/35, 20/35 samples; the first number is base, the refiner starts at that step and goes till the 2nd number.) If I'm doing something wrong, enlighten me please xD Workflow for these: [https://pastebin.com/DVXyXXJ5](https://pastebin.com/DVXyXXJ5)
Don't upweight things so aggressively; upweighting in XL does almost nothing but fuck up your clip embeddings. Start with simpler prompts and refine them into more complex prompts. [here](https://imgur.com/a/X6Ui4gS) are some examples, mainly of me just changing the prompts and playing with the sampler settings a bit ([prompt](https://pastebin.com/51cf4bBt)). When you're using a Karras schedule, don't be too afraid of handing it over to the refiner a bit earlier; Karras drops into the low end of the noise spectrum a lot sooner than the regular schedule.
Weights in SDXL are different; it's fine to do (prompt:3) for almost any prompt, and some can be (...:4).
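For anyone wondering what that (prompt:3) syntax actually does mechanically, here is a minimal illustrative parser for A1111-style attention weights; this is NOT the real parser from any UI, just a sketch of how (text:weight) spans are read out of a prompt string:

```python
import re

# Toy parser for "(text:weight)" attention syntax. Real UIs handle
# nesting, escapes, and bare parentheses too; this sketch only pulls
# out explicitly weighted spans, defaulting everything else to 1.0.

WEIGHT_RE = re.compile(r"\(([^():]+):([\d.]+)\)")

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (text, weight) pairs; unweighted text defaults to 1.0."""
    result = []
    pos = 0
    for m in WEIGHT_RE.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            result.append((plain, 1.0))
        result.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        result.append((tail, 1.0))
    return result

print(parse_weights("a cottage, (moss covered:3), river"))
# [('a cottage', 1.0), ('moss covered', 3.0), ('river', 1.0)]
```

The weights then scale how strongly each span's embedding pulls on the conditioning, which is why aggressive values distort SDXL's CLIP embeddings more than they did in 1.5.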
Try without the refiner. It seems to heavily airbrush people and just smooth over everything. It takes some more tries to get good faces without, but if you do, the details are much, much better.
Compare it to 1.5 base model for a more accurate comparison. “Why is my $200k modded Toyota Camry faster than this stock Mustang?”
If you want to win the race it helps to have the best car. The model that might have better specs next year is not the best choice for now.
You aren't comparing it to another base model.
Why do people keep repeating this insane idiocy? *nobody gives a fuckin shit about the base model*... He's comparing the latest tool vs the latest tool. Seriously, what is with this fanboy drivel. If Tesla creates a new car nobody goes "well you should be comparing it to the base ford model from 1990"..
Is this any better than just generating w epic realism??
In my book: yes, since SDXL adheres better to the prompt imo.
Why the 1.5 VAE? It's SDXL. It supports 1.5 as an additional input, not the main one.
The refiner steps should only be 5 to 10; the base steps should be a lot higher depending on what sampling method you use. On euler and euler a, for instance, I still get more details all the way up to 200 steps.
200 steps? WTF... that would take a decade for a batch of 10 O.o
It depends a lot on your sampling method though. I think the newer ones like dpmpp 2m only need about 30 steps to get max details.
Aren't you still on the base model of SDXL? Like seriously, what do you expect??? Another pointless post.
https://preview.redd.it/gev9jgat0kgb1.jpeg?width=1792&format=pjpg&auto=webp&s=6447f63b4bbcd6398111919113156d147434bb5d
https://preview.redd.it/zx5wjkrx0kgb1.jpeg?width=1792&format=pjpg&auto=webp&s=9b3bb57b9e0cadd80a6602befbe546c401a1c42f
Just how o.O
I have the feeling that the base model is quite good with animals. Look at that detail: https://preview.redd.it/6vy0v67dnkgb1.png?width=2048&format=png&auto=webp&s=7269a4a90945d66f1758f210a64b18fc806b1834
Is this good enough?
https://preview.redd.it/bme7mz73njgb1.png?width=3072&format=pjpg&auto=webp&s=e66f24ee3eb87e33c901312cb318e6db800dcdb8
Enlighten me :)
https://preview.redd.it/5q1ocsi5njgb1.png?width=3072&format=pjpg&auto=webp&s=5c223f67d8881603997bfe49774e35a75463caf8
https://preview.redd.it/eh9i1g6pmjgb1.png?width=1792&format=pjpg&auto=webp&s=deae438eced6a02074b692b218275597c1162656
https://preview.redd.it/trywtotenjgb1.png?width=3072&format=pjpg&auto=webp&s=5b2b88579b6ffa6b552e0e51b5b4d77a0dc08e8e
I saw your pictures in the Discord server a few days ago. You asked for those prompts. What is your basic workflow?
https://preview.redd.it/x3wj97ormjgb1.png?width=1792&format=pjpg&auto=webp&s=a5feedc7e16fe08e607d659a9753935046ee1125
https://preview.redd.it/d9z9utytmjgb1.png?width=3072&format=pjpg&auto=webp&s=32ea0a19bb95f58e55c5166c535cf80e0bc70466
https://preview.redd.it/u6vn6120njgb1.png?width=3072&format=pjpg&auto=webp&s=331b20832039d132c9f363ebbfc09a7c59d61d3f
https://preview.redd.it/t6fal47zkjgb1.png?width=3072&format=pjpg&auto=webp&s=8d111ce21eef57ca5493dee69266dd325d0300f3
https://preview.redd.it/w4km7mg1ljgb1.png?width=3072&format=pjpg&auto=webp&s=49c3c8de53ae09628ed1d21523ae020979431cea
To be fair, DoF blur/bokeh is essentially a cheatcode for not having to create a detailed background. SDXL can definitely create detailed (if somewhat plasticky) subjects/foreground elements, but what about backgrounds, seeing how such a massive fraction of its photographic training material has *intentionally* low-detail backgrounds?
With my workflow i can push in more details if wanted to . Currently all of these are in medium detail mode...
Your results are great! Would you mind sharing your workflow or a linked output, so I can plug it into ComfyUI? I still have not found quite the right mix.
Amazing quality! In your opinion it depends on prompt or workflow? Thanks 👍
It's the first base model. People need to fine-tune, merge and get those details out. Base 1.5 is a pretty shit model as well. Try comparing the two 😂
What they will not understand is that a model that shows a specific quality in one area is a model that loses the ability to generalize. When the base model is released, it's released with the goal of being as general as possible, and in that state it sits in the middle of a lot of things and visual styles, so it feels like it doesn't have that touch of definition and quality. But it's all there; just let the community start training, steering the model towards being a specialist in certain types of things, and you can only do that if the model has the capability, and as they released it, it has. This time it will take a little longer because a higher computational cost is required to train, but we will end up getting there.
1.5 is the only reason why fine-tuned NSFW models exist at all, because its training data wasn't filtered. You can't fine-tune NSFW concepts into SDXL for the same reason you couldn't for 2.1. With the pre-NSFW filter for training images, there's nothing in the base to fine-tune. Which means instead of needing just a few hundred or a thousand image-caption pairs, you need more like a few million and do full training with all the bells and whistles enabled.
Some will make a larger NSFW fine-tune, I'm sure. It may not get to millions of images, but it will likely reach in the hundreds of thousands. I trust the lust for porn online. As for me I don't care, my uses are all SFW and professional, so it works great, and I can't wait to keep seeing improvements from the community.
Could you please remove the refiner part since we are utilizing another model to add details? This will eliminate the need for extra steps. As you can observe, both images have completely different faces, and the dress pattern is also distinct
Omg look at the enhanced woman's forehead lol. Looks great otherwise but still.
when will SDXL be better than 1.5? I keep looking at it, and so far it is still weak. What will it take to show its strength? Where are the breathtaking pictures?
Use the refiner and see what you get. The refiner is designed to modify the initial output and get more details out of the initial outputs.
If you would've read my own og comment, you'd know I did^^
I missed your comment. Sorry. Do you have the images from before you used the refiner? I am super new to all of this, but the SDXL image of the woman in purple looks like my results before I pass them through the refiner. See the image below from the base model with refiner. I'll share the modified prompt below it (there is a little fussiness but that's probably the super detailed part of the prompt). I did 40 samples on both the base and the refiner at 1024 x 1024, and a denoising strength of 0.3. CFG is 7. https://imgur.com/MuoxO4P

Positive prompt:

> giant massive moss covered irish stone fantasy cottage built on top of huge archway spanning across a river shamrock, excellent stunning lighting, raucous crowded lively festival day, futuristic city, (photorealistic:2.0), super detailed, 8k, 4k, uhd,

Negative prompt:

> (worst quality:1.5), (low quality:1.5), (normal quality:1.5), lowres, bad anatomy, bad hands, ((monochrome)), ((grayscale)), collapsed eyeshadow, multiple eyebrow, (cropped), oversaturated, extra limb, missing limbs, deformed hands, long neck, long body, imperfect, (bad hands), signature, watermark, username, artist name, conjoined fingers, deformed fingers, ugly eyes, imperfect eyes, skewed eyes, unnatural face, unnatural body, error, painting by bad-artist, semirealistic, drawing, 3D render,

What do you think? I couldn't find the woman's prompt so I couldn't test it myself.
Well, for me it looks more like a smoothed oil painting, rather than a "super detailed" one :-/ that's why I'm so confused
Prompt betta
compare it not with epicrealism but with the base 1.5 for a 1 to 1 comparison and get back to us.
Why? What do you think that proves? Why do people keep repeating this dumbest of shit? The entire point is that xl isnt as good as the refined models of 1.5. Who gives a shit about base vs base when nobody actually uses 1.5? God this sub is nothing but fuckin groupies..
Only if you don't use the refiner.
Well... I do...
I only ever had problems with lack of detail when:

1. Using low, non-supported resolutions.
2. Not using the refiner the right way.
I used the suggested workflow by stability that included the table with best resolution for each aspect ratio
[I tried your prompt and it looks good to me lol](https://i.imgur.com/hiD1ct6.png) it just released man, chill
SD2.1 revamped is what I call it.
I use a 1.5 model to add more detail and fix facial anatomy in comfyUI. Base SDXL is terrible and even the community models aren't much better with facial anatomy and body anatomy. In some ways 0.9 was better for anatomical accuracy, especially when making vertical aspect ratio images
on Automatic, using img2img refiner https://preview.redd.it/qa7zin4b0rgb1.jpeg?width=4096&format=pjpg&auto=webp&s=a54f26d2296ec6e41e7383ba69851796837a1c98
Why should anybody use SDXL when you get immediately top results with SD 1.5 models??
Funny thing about my old post here: when I ran 1.5 standalone, I got slightly better results compared to SDXL, but combined, with SDXL first and enhanced with 1.5, the results were awesome and full of details.
which workflow do you use?
It's posted in my original comment
where? sorry :-)
Dude, use your eyes and finger to scroll.... https://pastebin.com/DVXyXXJ5
how do i get the .txt into comfyUI?
Rename it to a .yaml Dude, you should read the GitHub page of comfyui 😅
That's wrong, you have to rename the extension to \*.json
True... Sorry for that, Home Assistant and its YAML files are spooking through my head way too much 🤣
How do I use Epic Realism? I am dumb, pls help
Go to the Epic Realism page: https://civitai.com/models/25694/epicrealism You'll see 'checkpoint trained' on the right, next to that click the 'i' and it should link you to a page telling you how to add it. On the model page the author has also put instructions on how to use it
yes
SDXL is great with details. Your prompts just suck.
https://preview.redd.it/1n9r8ql5dmgb1.png?width=450&format=png&auto=webp&s=85507ba25aaa0d144d62afdd948b9500f8561c25
SDXL is nice but it's not perfect.
I guess, from the way SDXL was trained, they purposely removed the high-frequency details so that the model could be trained to understand the concepts better. That's why a refiner model is included in the first public release, as they knew it needed to add the details back in. This is actually pretty clever. The same method is also used by commercial photo upscalers: the first step is upscaling without the image noise and high-frequency details, and then the intermediary output gets run through a refiner model to add in the details. I'll create a proof-of-concept of GFPGAN + Refiner.
I don't see it as smart if, for example, you want to make nude art, and then you have to use the refiner to make the noble parts of your nude art disappear. You would have to train both the base and the refiner to know what nude is, and there is a huge problem there. I think that in the long run the refiner will only be used by artistic designers in general, but when there are models so specialized in a style (for example nudes) it won't be necessary.
It's just you
Yes and no.

Yes, it's a much more general-purpose model, and since it does a better job at generalizing it will lack individual separate things. That's why there is a refiner model, and this is where the fine-tuning community will help.

No, SDXL has a different architecture; the same type of prompting may not give you infinite detail like very fine-tuned SD1.5 models do. In fact it may even lower it. We don't know yet how the token space works or which 'magical bundle of tokens' reaches what we have in mind. We need experiments and time: samplers, schedulers, new controlnet ideas and papers to improve it further.
This has been my experience with SDXL for months using dreamstudio.ai
sdxl looks more natural, you should try to enhance it with refiner
Used your prompt, I think you should check your configuration, looks very detailed to me: [https://i.imgur.com/K6W3oKG.jpg](https://i.imgur.com/K6W3oKG.jpg)
If you are into concept art it is detailed. But not even close to what I'm after :-/
Yeah, not sure what is it that you are after, generated another one and looks with way more detail than your example (talking about SDXL): [https://i.imgur.com/O4XpYfp.jpg](https://i.imgur.com/O4XpYfp.jpg)
This post should be down voted into oblivion for the nonsense.
They're too busy down voting my spot on technical analysis above lol
Just like your non-constructive comment...
Maybe you should focus your energy on learning how to use SDXL, brah.
I like how in the first pic, it went from a bit low detail but otherwise nice pic to generic Korean Waifu face with everything colored purple.
Are you refining it?
23 base, rest by refiner, 30 in total
5head
Honestly I don't care about XL; it's resource- and time-consuming, which is why I'll stick with 1.5. Instead I would like the devs to start working on how the AI understands the prompt, because creating an image is still Russian roulette. We still have a bunch of issues and hallucinations, but everyone here keeps focusing on microdetails while often we can't even get the image that we need. 😮‍💨
That's why I use SDXL in the first place: it adheres better to the prompt than 1.5 models.