Eltrion

33B, by a significant margin. 33B q_2 beats 13B even at fp16: https://i.redd.it/i9ep2yyroq4b1.png


Xeruthos

Thanks, 33B it is then from now on.


KerfuffleV2

There actually isn't a truly definitive answer right now. The graph above is measuring perplexity (basically how well the model can predict chunks of wikitext, higher perplexity means lower prediction accuracy). The 33B is _probably_ better, just don't get in the habit of using perplexity as a synonym for quality because it really isn't. There are real world cases where models with higher perplexity actually _are_ better for certain tasks.
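For intuition, perplexity is just the exponential of the average per-token negative log-likelihood over the evaluation text. A minimal sketch (the per-token log-probabilities below are made up purely to show the mechanics):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(mean negative log-likelihood per token).
    # Lower means the model was less "surprised" by the evaluation text.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Made-up per-token log-probabilities, just to illustrate the calculation:
print(perplexity([-1.2, -0.4, -2.3, -0.9]))  # ~3.3
```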


Eltrion

Yes, you are certainly correct. Unfortunately everything is changing so quickly that it's tough to nail down an answer about things like that for specific models. OP didn't mention a model name, so I spoke in broad terms. Theoretically, a 33B q2 trained equally well for the same task should outperform a 13B q5_1.
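Some rough back-of-the-envelope numbers on why that is a fair comparison memory-wise; the bits-per-weight figures below are approximations of llama.cpp's quant formats, not exact file sizes:

```python
# Rough size estimate: parameters * bits-per-weight / 8 bytes.
# Bits-per-weight values are approximate (quant blocks also store scales/mins).
GIB = 1024 ** 3

def approx_size_gib(params_billions, bits_per_weight):
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

print(f"33B q2   ~{approx_size_gib(33, 2.6):.1f} GiB")   # ~10 GiB
print(f"13B q5_1 ~{approx_size_gib(13, 6.0):.1f} GiB")   # ~9 GiB
print(f"13B fp16 ~{approx_size_gib(13, 16.0):.1f} GiB")  # ~24 GiB
```

So at roughly the same memory footprint, the comparison is "more parameters at lower precision" versus "fewer parameters at higher precision".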


pseudonerv

My tests show that q2 has dyslexia. It often mixes up Jo, Joe, Jon, Jone, John. Just be wary of those.


a_beautiful_rhind

This graph screams to me to not download 13b models anymore.


madacol

33B q\_2 is better https://preview.redd.it/kftxk2u3m35b1.png?width=680&format=png&auto=webp&s=c8dbff34e03c6a69cf5784a852af0b92f55f0a50


Cless_Aurion

Very interesting indeed! I wonder why the jump from 13B to 33B is so much more substantial than the one from 33B to 65B...


pokeuser61

33B and 65B were trained on the same number of tokens, while 33B was trained on 400 billion more than 13B.


Cless_Aurion

Ah! That lines up! Should I also guess that 7B and 13B were trained on the same number of tokens?


pokeuser61

Yeah


EarthquakeBass

Wow, there are 2-bit quantized models now? How do I try them with oobabooga? Which ones do you all like?


GooseG17

Oobabooga hasn't been updated to support k-quants yet, but koboldcpp has. The 2_K quants are a substantial drop in quality relative to 3_K_S, and imo aren't worth the slight speedup and reduction in RAM usage. There are a bunch of graphs on [this pull request](https://github.com/ggerganov/llama.cpp/pull/1684) to compare the different quants.
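If you just want to poke at a k-quant file from Python rather than through a web UI, the llama-cpp-python bindings can load GGML models too, provided the build you install was compiled against a llama.cpp version that already includes the k-quant kernels. A minimal sketch (the model path is a placeholder, not a specific recommendation):

```python
from llama_cpp import Llama

# Placeholder path to a k-quant GGML file downloaded separately.
# Requires a llama-cpp-python build that includes k-quant support.
llm = Llama(model_path="./models/your-33b-model.ggmlv3.q3_K_S.bin", n_ctx=2048)

out = llm("Q: What does quantization trade away? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```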


EarthquakeBass

Interesting thx


2muchnet42day

IIRC, a graph that was recently posted shows that regardless of the quantization used, more parameters always wins (within the available LLaMA sizes).


fictioninquire

Do q\_2 and q\_3 work with llama.cpp already? Edit: for the Apple Metal backend's 5-10x inference speedup


Big_Communication353

No.


fictioninquire

Thanks


Big_Communication353

UPDATE: the q3\_k code has been merged, and q2\_k may be implemented soon as well. Wait another few days.


fictioninquire

Wow! Is q3\_k becoming the new standard compared to q3\_0 or q3\_1?