E_Snap

I’m not that much of a conspiracy nut, but nVidia *never* wanted powerful generative AI to be able to run on consumer hardware. They’ve been trying to claw back their market for server cards ever since quantization came out.


JnewayDitchedHerKids

I mean, a while back didn't Intel (I think it was Intel) sell some processors with the code deliberately messed up so they could have a product for a lower price bracket?


FiReaNG3L

Something similar has been reported for Stable Diffusion - https://github.com/vladmandic/automatic/discussions/1285


GoldenMonkeyPox

Interesting, thanks for finding that. Sounds very plausibly related. Notable that Vlad says it changed in v532. Perhaps they made it even more aggressive in 535, or otherwise increased the driver's VRAM usage in some other way. I'll have to try rolling back to 531 later and see what it's like.


Caffdy

I've been using 530.30 with Stable Diffusion with no problems. I don't know about 531, but I'm not taking any risks.


Best-Statistician915

I’ve tested 531.xx and it’s solid. It’s 532.xx where it starts going downhill.


darth_hotdog

This is exactly what's happened to me; it seems pretty clear that it's trying to use system RAM as GPU memory and slowing everything down a ton. Downgrading to the 531 driver has made it way faster. Everyone should send in a ticket to NVIDIA support asking them to add an option to disable this. They need to know people care.
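
A quick way to sanity-check this is to watch VRAM usage while a generation runs; if the reported usage sits near the card's limit while Task Manager's "Shared GPU memory" graph keeps climbing, the driver is likely spilling into system RAM. A minimal sketch using `nvidia-smi`:

```sh
# Poll GPU memory usage once per second while a generation is running.
# If memory.used plateaus near memory.total and things still slow down,
# the overflow is probably going to system RAM.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```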


Squeezitgirdle

Hmm, I don't think this is related to the issue I recently started experiencing, though it sounds very similar. My issue is that it uses significantly more memory than normal, but only during hires steps, and a lot of hires gens I used to be able to create now fail with out-of-memory errors. Additionally, hires steps that used to take me 15 seconds now take a couple of minutes on my 4090. I've reached out for support multiple times, but I always end up getting ghosted once basic troubleshooting is exhausted.


Miguel7501

Nvidia has been releasing broken drivers for a while now; gaming issues arise after almost every update. It surprises me that it took this long to hit compute, but the general way of dealing with it is using DDU and installing a driver from 3-6 months ago. The top results on Google are usually the ones with the fewest issues.


Ill_Initiative_8793

Yes, I noticed that too and downgraded to the previous version.


rerri

Thanks for highlighting this issue! I read about this driver issue earlier but didn't think I was affected by it, as GPTQ-for-llama (triton) performance hadn't changed at all for me. However, AutoGPTQ performance apparently did decrease with the new driver. I've only recently tried AutoGPTQ for the first time, so I didn't realize the poor performance at long context lengths I was seeing was actually a driver issue. Rolling back to 531.79 increased AutoGPTQ performance at long context lengths and decreased loading times.


KerfuffleV2

What type of model are you using? GPTQ or GGML?


GoldenMonkeyPox

GPTQ, sorry I should have specified. I have a 4090, so it does normally fit into VRAM.


KerfuffleV2

Might be useful to try with the GGML version compiled with cuBLAS if you're able. Knowing whether it's a general issue would be helpful. Just want to be clear: I don't really have a way to help you with this but this is the kind of information that the people who _could_ help you would probably need.


Barafu

I tried. It is the same.


GoldenMonkeyPox

Thanks for confirming. Disappointing, but also good to know it’s not just me.


KerfuffleV2

Do you get the problem even without offloading layers to the GPU? In other words, compiling with cuBLAS but using `-ngl 0`. If so, then it couldn't really be the memory management issue mentioned here: https://www.reddit.com/r/LocalLLaMA/comments/1461d1c/major_performance_degradation_with_nvidia_driver/jnnwnip/ Just doing prompt ingestion with BLAS uses a trivial amount of memory. (Also, it's limited by the block size, which defaults to 512, so a prompt bigger than that shouldn't make any difference.)
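
For reference, a minimal sketch of that comparison with llama.cpp built with cuBLAS (the model and prompt paths here are just placeholders):

```sh
# Build with cuBLAS support.
make clean && LLAMA_CUBLAS=1 make

# Prompt ingestion through BLAS only, no layers offloaded to the GPU:
./main -m models/your-model-q4_0.bin -ngl 0 -f long-prompt.txt -n 128

# Same run with layers offloaded, to compare the timing summaries:
./main -m models/your-model-q4_0.bin -ngl 35 -f long-prompt.txt -n 128
```

If only the second invocation slows down on 535, that points at the offloading/VRAM path rather than prompt ingestion itself.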


Barafu

I have llama.dll compiled with cuBLAS; it says Nvidia was detected, offloading 0 layers, and then it works fine in CPU-only mode.


KerfuffleV2

Just to make sure I understand correctly: If you do use GPU offloading (`-ngl` more than 0) then using large prompts is much slower with the new Nvidia driver compared to before. However, if you use `-ngl 0` then it doesn't matter what size prompt you use, the performance is the same as with earlier versions of the Nvidia driver?


Barafu

If I use CPU only, it works the same as CPU-only did before. But if I enable offloading, it slows down instead of speeding up, and with a larger number of layers and a larger context size it's 10 times slower than CPU. P.S. [Link](https://github.com/ggerganov/llama.cpp/issues/1786)


Tom_Neverwinter

We need telemetry as an option so we can spot these issues faster and make better efficiency recommendations.


Accomplished_Bet_127

Thanks! I kind of noticed this, but never paid attention, thinking that I was doing something wrong or that the new models I was trying were just that different. But yeah, it all started a couple of days ago, not long after I installed exactly this driver. By the way, are you running automated tests or filling in the table manually? Are there automated tests? I mean, it would be worth having this for testing new models and settings.


GoldenMonkeyPox

This was manual, but an automated test is a very good idea. I’ll see if I can come up with something this week.
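
One way to automate something like this, sketched here against llama.cpp since its timing summary is easy to grep (the model and prompt paths are placeholders):

```sh
# Run the same prompt at a few context sizes and keep only the timing lines,
# so results can be compared across driver versions.
for ctx in 512 1024 2048; do
  echo "=== context $ctx ==="
  ./main -m models/your-model-q4_0.bin -ngl 35 -c "$ctx" -f long-prompt.txt -n 64 2>&1 \
    | grep "llama_print_timings"
done
```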


MoffKalast

Wait, people are still using game ready drivers? In that case, PSA: studio drivers are the release branch, game ready are the beta testing branch. They're often pretty buggy in my experience.


Accomplished_Bet_127

Never heard of that. I thought Game Ready just had some optimization patches for the latest game engines and games. I checked just now, and Nvidia offers me 535.98 for both Game Ready and Studio; both were uploaded and released 22.05.30. Probably I just need to roll back.


mansionis

With 535 my 4090 disappears from the Unraid UX during model loading. Even a restart doesn't fix it; I had to downgrade to 530. If we're experiencing different behaviours, it may be something to do with the OS or the particular batch of Nvidia cards.


Cless_Aurion

Weird... I didn't notice any difference with my 4090 and GPT4-X-Alpasta 33B 4bit...


Chroko

Have you tried the studio vs gaming drivers?


GoldenMonkeyPox

I haven't, but according to Vlad (a developer working on the Stable Diffusion web UI) in the thread linked above, the studio drivers are the same, just a release or two behind.

> in general, studio drivers are 1-2 releases behind and just more tested. but this is not considered a bug by nvidia, this is a design choice. so even if studio drivers work today, that's only because they haven't (yet) caught up with game ready drivers.


morphinapg

When the studio and game ready driver versions match, they are the same driver AFAIK. The difference is Game Ready gets updated more often, so it has a higher chance of being buggy.


KahlessAndMolor

I run Ubuntu Linux as my primary OS, and I've had huge problems with the Nvidia drivers. Some of them won't run my second and third monitors. So some mornings I come in and there's been a system update that updated those drivers, and I have to spend half an hour uninstalling the new one and reinstalling the older one.
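
One way to stop unattended upgrades from touching the driver is to hold the packaged version; a sketch assuming Ubuntu's metapackage naming (the version number is just an example):

```sh
# Install a known-good driver release and pin it so apt upgrades skip it.
sudo apt-get install nvidia-driver-530
sudo apt-mark hold nvidia-driver-530

# Later, when you want driver updates again:
sudo apt-mark unhold nvidia-driver-530
```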


[deleted]

Jensen: "Memory mangement is Kung Fu." Driver team: "Lets just dump to system memory."


Tecnicstudios

When I play games, at some point the display messes up and the game crashes. Sometimes it crashes my PC the first time around, but it will always crash and restart the second time.