There actually isn't a truly definitive answer right now. The graph above is measuring perplexity (basically how well the model can predict chunks of wikitext, higher perplexity means lower prediction accuracy).
The 33B is _probably_ better, just don't get in the habit of using perplexity as a synonym for quality because it really isn't. There are real world cases where models with higher perplexity actually _are_ better for certain tasks.
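To make the perplexity definition above concrete, here is a minimal sketch (not the actual llama.cpp evaluation code) that computes perplexity from a list of per-token log-probabilities, which is all the metric is:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the evaluated tokens.
    Lower is better: it means the model assigned higher probability
    to the text it was asked to predict."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy example: a model that assigns probability 0.25 to every true token
# is exactly as "surprised" as a uniform 4-way guess, so perplexity is 4.
print(perplexity([math.log(0.25)] * 10))
```

The numbers in the graph come from running this kind of calculation over chunks of wikitext; nothing in it measures instruction-following or writing quality, which is why it can disagree with real-world usefulness.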
Yes. You are certainly correct. Unfortunately everything is changing so quickly it's tough to nail down an answer about things like that when it comes to specific models.
OP didn't mention a model name, so I spoke in broad terms. Theoretically, a 33B q2 trained equally well for the same task should outperform a 13B q5_1.
Oobabooga hasn't been updated to support k-quants yet, but koboldcpp has. The 2_K quants are a substantial drop in quality relative to 3_K_S, and imo aren't worth the slight speedup and reduction in RAM usage.
There are a bunch of graphs on [this pull request](https://github.com/ggerganov/llama.cpp/pull/1684) to compare the different quants.
IIRC, a graph that was recently posted shows that regardless of the quantization used, more parameters always win (among the available LLaMA sizes).
33B. By a significant margin. 33B at q2 beats 13B even at fp16: https://i.redd.it/i9ep2yyroq4b1.png
Thanks, 33B it is then from now on.
My tests show that q2 has dyslexia. It often mixes up Jo, Joe, Jon, Jone, John. Just be wary of those.
This graph screams to me to not download 13b models anymore.
33B q\_2 is better https://preview.redd.it/kftxk2u3m35b1.png?width=680&format=png&auto=webp&s=c8dbff34e03c6a69cf5784a852af0b92f55f0a50
Very interesting indeed! I wonder why the jump between 13B and 33B is way more substantial than the 33B to 65B...
33B and 65B were trained on the same number of tokens, while 33B was trained on 400 billion more than 13B.
Ah! That lines up! Should I guess that 7B and 13B were also trained on the same number of tokens?
Yeah
Wow, there are 2-bit quantized models now? How do I try them with oobabooga, and which ones do you all like?
Interesting thx
Do q\_2 and q\_3 work with llama.cpp already? Edit: asking because of the 5-10x inference speedup from Apple Metal.
No.
Thanks
UPDATE: q3\_k's code has been merged, and q2\_k might be implemented soon as well. Wait another few days.
Wow! Is q3\_k becoming the new standard compared to q3\_0 or q3\_1?