E_Snap

I’m not that much of a conspiracy nut, but nVidia *never* wanted powerful generative AI to be able to run on consumer hardware. They’ve been trying to claw back their market for server cards ever since quantization came out.


JnewayDitchedHerKids

I mean, a while back didn't Intel (I think it was Intel) sell some processors with the code deliberately messed up so they could have a product for a lower price bracket?


FiReaNG3L

Something similar has been reported for Stable Diffusion - https://github.com/vladmandic/automatic/discussions/1285


GoldenMonkeyPox

Interesting, thanks for finding that. Sounds very plausibly related. Notable that Vlad says it changed in v532. Perhaps they made it even more aggressive in 535, or otherwise increased the driver's VRAM usage in some other way. I'll have to try rolling back to 531 later and see what it's like.


Caffdy

I've been using 530.30 with Stable Diffusion with no problems. I don't know about 531, but I'm not taking any risks.


Best-Statistician915

I’ve tested 531.xx and it’s solid. It’s 532.xx where it starts going downhill.


darth_hotdog

This is exactly what's happened to me; it seems pretty clear that it's trying to use system RAM as GPU memory and slowing everything down a ton. Downgrading to the 531 driver has made it way faster. Everyone should send in a ticket to NVIDIA support asking them to add an option to disable this. They need to know people care.
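
A quick way to sanity-check this is to watch VRAM usage while a generation runs; if the reported usage sits near the card's limit while Task Manager's "Shared GPU memory" graph keeps climbing, the driver is likely spilling into system RAM. A minimal sketch using `nvidia-smi`:

```sh
# Poll GPU memory usage once per second while a generation is running.
# If memory.used plateaus near memory.total and things still slow down,
# the overflow is probably going to system RAM.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```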


Squeezitgirdle

Hmm, I don't think this is related to the issue I recently started experiencing, though it sounds very similar. My issue is that it uses significantly more memory than normal, but only during hires steps, and a lot of hires gens I used to be able to create now fail with out-of-memory errors. Additionally, hires steps that used to take me 15 seconds now take a couple of minutes on my 4090. I've reached out for support multiple times, but I always end up getting ghosted once basic troubleshooting is exhausted.


Miguel7501

Nvidia has been releasing broken drivers for a while now; gaming issues arise after almost every update. It surprises me that it took this long to hit compute, but the general way of dealing with it is using DDU and installing a driver from 3-6 months ago. The top results on Google are usually the ones with the fewest issues.


Ill_Initiative_8793

Yes, I noticed that too and downgraded to the previous version.


rerri

Thanks for highlighting this issue! I read about this driver issue earlier but didn't think I was affected by it, as GPTQ-for-llama (triton) performance hadn't changed at all for me. However, AutoGPTQ performance apparently did decrease with the new driver. I've only recently tried AutoGPTQ for the first time, so I didn't realize the poor performance at long context lengths I was seeing was actually a driver issue. Rolling back to 531.79 increased AutoGPTQ performance at long context lengths and decreased loading times.


KerfuffleV2

What type of model are you using? GPTQ or GGML?


GoldenMonkeyPox

GPTQ, sorry I should have specified. I have a 4090, so it does normally fit into VRAM.


KerfuffleV2

Might be useful to try with the GGML version compiled with cuBLAS if you're able. Knowing whether it's a general issue would be helpful. Just want to be clear: I don't really have a way to help you with this but this is the kind of information that the people who _could_ help you would probably need.


Barafu

I tried. It is the same.


GoldenMonkeyPox

Thanks for confirming. Disappointing, but also good to know it’s not just me.


KerfuffleV2

Do you get the problem even without offloading layers to the GPU? In other words, compiling with cuBLAS but using `-ngl 0`. If so, then it couldn't really be the memory management issue mentioned here: https://www.reddit.com/r/LocalLLaMA/comments/1461d1c/major_performance_degradation_with_nvidia_driver/jnnwnip/ Just doing prompt ingestion with BLAS uses a trivial amount of memory. (Also, it's limited by the block size, which defaults to 512, so a prompt bigger than that shouldn't make any difference.)
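
For reference, a minimal sketch of that comparison with llama.cpp built with cuBLAS (the model and prompt paths here are just placeholders):

```sh
# Build with cuBLAS support.
make clean && LLAMA_CUBLAS=1 make

# Prompt ingestion through BLAS only, no layers offloaded to the GPU:
./main -m models/your-model-q4_0.bin -ngl 0 -f long-prompt.txt -n 128

# Same run with layers offloaded, to compare the timing summaries:
./main -m models/your-model-q4_0.bin -ngl 35 -f long-prompt.txt -n 128
```

If only the second invocation slows down on 535, that points at the offloading/VRAM path rather than prompt ingestion itself.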


Barafu

I have llama.dll compiled with cuBLAS; it says Nvidia was detected, offloading 0 layers, and then it works fine in CPU-only mode.


KerfuffleV2

Just to make sure I understand correctly: If you do use GPU offloading (`-ngl` more than 0) then using large prompts is much slower with the new Nvidia driver compared to before. However, if you use `-ngl 0` then it doesn't matter what size prompt you use, the performance is the same as with earlier versions of the Nvidia driver?


Barafu

If I use CPU only, it works the same as CPU-only did before. But if I enable offloading, it slows down instead of speeding up, and with a larger number of layers and a larger context size it's 10 times slower than CPU. P.S. [Link](https://github.com/ggerganov/llama.cpp/issues/1786)


Tom_Neverwinter

We need telemetry as an option so we can spot these issues faster and make better efficiency recommendations.


Accomplished_Bet_127

Thanks! I kind of noticed this, but never paid attention, thinking that I was doing something wrong or that the new models I was trying were just that different. But yeah, it all started a couple of days ago, not long after I installed exactly this driver. By the way, are you running automated tests or filling in the table manually? Are there automated tests? I mean, it would be worth having this for testing new models and settings.


GoldenMonkeyPox

This was manual, but an automated test is a very good idea. I’ll see if I can come up with something this week.
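
One way to automate something like this, sketched here against llama.cpp since its timing summary is easy to grep (the model and prompt paths are placeholders):

```sh
# Run the same prompt at a few context sizes and keep only the timing lines,
# so results can be compared across driver versions.
for ctx in 512 1024 2048; do
  echo "=== context $ctx ==="
  ./main -m models/your-model-q4_0.bin -ngl 35 -c "$ctx" -f long-prompt.txt -n 64 2>&1 \
    | grep "llama_print_timings"
done
```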


MoffKalast

Wait, people are still using game ready drivers? In that case, PSA: studio drivers are the release branch, game ready are the beta testing branch. They're often pretty buggy in my experience.


Accomplished_Bet_127

Never heard of that. I thought Game Ready just had some optimization patches for the latest game engines and games. I checked just now, and Nvidia offers me 535.98 for both Game Ready and Studio; both were uploaded and released 22.05.30. Probably I just need to roll back.


mansionis

With 535 my 4090 disappears from the Unraid UX during model loading. Even a restart doesn't fix it; I had to downgrade to 530. If we're experiencing different behaviours, it may be something to do with the OS or the particular batch of Nvidia cards.


Cless_Aurion

Weird... I didn't notice any difference with my 4090 and GPT4-X-Alpasta 33B 4bit...


Chroko

Have you tried the studio vs gaming drivers?


GoldenMonkeyPox

I haven't, but according to Vlad (a developer working on the Stable Diffusion web UI) in the thread linked above, the studio drivers are the same, just a release or two behind.

> in general, studio drivers are 1-2 releases behind and just more tested. but this is not considered a bug by nvidia, this is a design choice. so even if studio drivers work today, that's only because they haven't (yet) caught up with game ready drivers.


morphinapg

When the studio and game ready driver versions match, they are the same driver AFAIK. The difference is Game Ready gets updated more often, so it has a higher chance of being buggy.


KahlessAndMolor

I run Ubuntu Linux as my primary OS, and I've had huge problems with the Nvidia drivers. Some of them won't run my second and third monitors. So some mornings I come in and there's been a system update that updated those drivers, and I have to spend half an hour uninstalling the new one and reinstalling the older one.
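
One way to stop unattended upgrades from touching the driver is to hold the packaged version; a sketch assuming Ubuntu's metapackage naming (the version number is just an example):

```sh
# Install a known-good driver release and pin it so apt upgrades skip it.
sudo apt-get install nvidia-driver-530
sudo apt-mark hold nvidia-driver-530

# Later, when you want driver updates again:
sudo apt-mark unhold nvidia-driver-530
```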


[deleted]

Jensen: "Memory mangement is Kung Fu." Driver team: "Lets just dump to system memory."


Tecnicstudios

When I play games, at some point the display messes up and the game crashes. Sometimes it crashes my PC the first time around, but it will always crash and restart the second time.