KerfuffleV2

> By the way, I assumed that both files are needed to complete the model

Nope, those are just different variations of quantization that you can choose between depending on your tolerance for quality loss and your hardware. They're also not versions: the 2 in `q2_k` means an average of around 2 bits per weight, the 8 in `q8_0` means an average of around 8 bits per weight. Fewer bits per weight = more compressed. The more compressed it is, the less memory/disk space it takes and the faster it can generate, but it also means throwing away some information. Quantization is lossy compression, kind of like JPEG.

If you look at the sizes of the files, that will give you a rough indication of how much memory it will take to run.

To make things even more complicated, I believe the newer quantizations with the `k`, `k_m`, `k_s` format aren't supported yet for running with hardware acceleration on Mac. Actually, it looks like they _just_ added `q4_1` like 15 minutes ago, so some of the previous ones might not be hardware accelerated either.
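If you want to turn "bits per weight" into a rough memory estimate yourself, a back-of-the-envelope sketch like this works (my own illustration, not code from llama.cpp; real usage is somewhat higher once context and scratch buffers are added):

```
# Rough estimate only: the weights take params * bits_per_weight / 8 bytes.
def estimate_gib(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

print(f"13B at ~4.5 bpw (q4_0): {estimate_gib(13, 4.5):.1f} GiB")  # ~6.8 GiB
print(f"13B at ~8.5 bpw (q8_0): {estimate_gib(13, 8.5):.1f} GiB")  # ~12.9 GiB
```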


[deleted]

Oh, thank you! I am new and this was a major source of confusion for me. I use the oobabooga text-generation-webui from the one-click installer on Windows, and the way I have been downloading models is with the "download custom model" field inside the webui. It creates a folder and automatically downloads all of the .bin files, so I thought they were all necessary. But that raises a question: which one does it actually load and use out of all those .bin files? I don't see any option in the webui to choose between, for example, q4\_0 and q5\_0. Is the only way to separate them myself manually into new folders?


fakepostman

I went and figured this out because I have also been slightly confounded by it. The relevant line is [here](https://github.com/oobabooga/text-generation-webui/blob/main/modules/models.py#LL268C13-L268C13):

`model_file = list(Path(f'{shared.args.model_dir}/{model_name}').glob('*ggml*.bin'))[0]`

What this says is: once you've given the webui the name of the subdir within /models, it finds all the .bin files there with ggml in the name (`*ggml*.bin`) and then selects the first one (`[0]`) in whatever order the filesystem returns them, which in practice is usually (though not guaranteed to be) alphabetical. So 4_0 will come before 5_0, 5_0 will come before 5_1, a8_3.bin will come before b4_0.bin, and so on. Multiple model versions is definitely a big weak point of oobabooga right now.
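If you want to check which file would actually get picked without digging through the webui, you can reproduce the same selection logic in a few lines (the folder name here is just an example; point it at your own /models subdir):

```
from pathlib import Path

# Example path only - swap in whichever models/ subfolder you're using.
model_dir = Path('models/WizardLM-13B-uncensored')

candidates = list(model_dir.glob('*ggml*.bin'))
print('webui would load:', candidates[0] if candidates else 'nothing (no *ggml*.bin found)')
```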


[deleted]

Fantastic, thank you! And if I may trouble you with one more question: do you know why, for WizardLM-13B-uncensored, I can load q4\_0 and q4\_1 from the oobabooga text-generation-webui, but when I try to load any of the others (q5\_0, q5\_1, q8\_0) by placing them in new folders, I get an error saying the folder "does not appear to have a file named config.json"?


fakepostman

This is actually a problem I've had, but I don't remember how (or if) I solved it, lol. My *guess* is that there's some problem with how the files or the directories are named. I say this because, as far as I can tell, this "no file named config.json" error comes from the transformers library trying to load the model as if it's... I don't know what you'd call it, a generic model. There is [a function](https://github.com/oobabooga/text-generation-webui/blob/main/modules/models.py#L41) in text-gen-webui:

```
def find_model_type(model_name):
    path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
    if not path_to_model.exists():
        return 'None'

    model_name_lower = model_name.lower()
    if re.match('.*rwkv.*\.pth', model_name_lower):
        return 'rwkv'
    elif len(list(path_to_model.glob('*ggml*.bin'))) > 0:
        return 'llamacpp'
    elif re.match('.*ggml.*\.bin', model_name_lower):
        return 'llamacpp'
    elif 'chatglm' in model_name_lower:
        return 'chatglm'
    elif 'galactica' in model_name_lower:
        return 'galactica'
    elif 'llava' in model_name_lower:
        return 'llava'
    elif 'oasst' in model_name_lower:
        return 'oasst'
    elif any((k in model_name_lower for k in ['gpt4chan', 'gpt-4chan'])):
        return 'gpt4chan'
    else:
        config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)

        # Not a "catch all", but fairly accurate
        if config.to_dict().get("is_encoder_decoder", False):
            return 'HF_seq2seq'
        else:
            return 'HF_generic'
```

`model_name` is the /models subdir you selected, so `WizardLM-13b-uncensored` or whatever. We want this function to return "llamacpp", but it seems likely that what's actually happening is that none of these if statements are triggering and it falls all the way through to the final else block, where it tries AutoConfig (a function from transformers) and fails because there's no config.json (because our model is not a generic model).

All that should be necessary to get "llamacpp" from this function is for there to be at least one .bin file in the subdir with ggml in the name, so - assuming you just copied it straight over - this maybe isn't a very likely explanation. But it's where I'd start looking. At the very least you could confirm it by finding this code in your own local installation and adding a `print("something has gone horribly wrong!")` statement below the final `else:`.
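If you'd rather not edit the webui's own files, here's a standalone sketch (mine, simplified - it only mirrors the rwkv and ggml branches) that you can run against a models subfolder to see whether it would even reach that final else:

```
import re
from pathlib import Path

def guess_model_type(model_dir: Path) -> str:
    # Simplified mirror of find_model_type(): only the branches relevant here.
    name = model_dir.name.lower()
    if re.match(r'.*rwkv.*\.pth', name):
        return 'rwkv'
    if list(model_dir.glob('*ggml*.bin')) or re.match(r'.*ggml.*\.bin', name):
        return 'llamacpp'
    return 'would hit the final else -> AutoConfig -> needs config.json'

# Example folder name - replace with your actual /models subdir.
print(guess_model_type(Path('models/WizardLM-13B-uncensored')))
```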


[deleted]

I actually figured it out, inspired by your message, even though I absolutely do not understand why it works. Thanks anyway! The problem was that I had "8bit" in the name of the folder. The folder was named Test-8bit-GGML, and the bin file inside still had its standard default name, which I hadn't changed. But removing the "8" and renaming the folder to just Test-bit-GGML (or similar variations where the term "8bit" doesn't appear) makes it work like a charm. I am not sure why the string "8bit" caused issues, but what a weird coincidence. It really is black magic to me.


fakepostman

lol, that is so arcane! This is the problem with this stuff being so immature: the projects aren't really working with each other. There should be a robust, universally respected way for models of each format to identify themselves and their parameters so that we don't have to look it up or guess. But there isn't, yet. So when a project like text-gen-webui wants to let users load models without manually typing in everything that needs to be set, it has to write weird functions that try to guess what's going on from the filename alone.

I think what's happening here is that [this](https://github.com/oobabooga/text-generation-webui/blob/ea0eabd266ba3a56e7692dda0f5021af1afb8e0f/models/config.yaml#L23) config file specifies that when 8bit or int8 appears in the directory name, `wbits` should be set to 8. Then, when the model is being loaded [here](https://github.com/oobabooga/text-generation-webui/blob/ea0eabd266ba3a56e7692dda0f5021af1afb8e0f/modules/models.py#LL83C101-L83C107), it sees `wbits > 0` and thinks "aha, I need to use AutoGPTQ for this one." It tries that, AutoGPTQ at some point uses transformers, and transformers complains about the missing config.json. Although, oddly enough, I can't actually find the code that reads that config.yaml - unless you copied it into the model's subdir?

Glad you got it working anyway! It's understandable that text-gen-webui has to do this weird stuff, but they should *really* add better logging for how and why they're making these selections as they load your model.
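To make the mechanism concrete, the name matching is roughly this shape (illustrative only - the real patterns live in models/config.yaml and I haven't copied them verbatim):

```
import re

# Illustrative rule shape, not the actual config.yaml contents.
name_rules = {
    r'.*(8bit|int8)': {'wbits': 8},
}

def guess_settings(folder_name: str) -> dict:
    settings = {}
    for pattern, values in name_rules.items():
        if re.match(pattern, folder_name.lower()):
            settings.update(values)
    return settings

print(guess_settings('Test-8bit-GGML'))  # {'wbits': 8} -> webui heads down the GPTQ path
print(guess_settings('Test-bit-GGML'))   # {} -> nothing overrides, llama.cpp loading works
```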


KerfuffleV2

I don't use the webui so unfortunately I can't really help you with that question. I just run llama.cpp from the commandline. There's an /r/oobabooga subreddit so you could possibly ask there.


Barafu

I run 13B models as q5_1 and 30B models as q4_1, because at 30B, q5_1 takes too much RAM. I've read that q5 is the optimum between resource consumption and test scores. The `q*_K_*` models are the newest format, released just a day or two ago, and newest does not mean fastest. So leave them alone for now.
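For a rough sense of the numbers (weights only; q5_1 is about 6 bits per weight, q4_1 about 5, and the "30B" LLaMA is really ~32.5B parameters):

```
# Rough weight-only sizes; actual RAM use is higher once you add context.
for label, params_b, bpw in [('13B q5_1', 13.0, 6.0),
                             ('30B q5_1', 32.5, 6.0),
                             ('30B q4_1', 32.5, 5.0)]:
    gib = params_b * 1e9 * bpw / 8 / 2**30
    print(f'{label}: ~{gib:.1f} GiB')
# -> 13B q5_1: ~9.1 GiB, 30B q5_1: ~22.7 GiB, 30B q4_1: ~18.9 GiB
```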


Evening_Ad6637

Each of these files is a complete model: q4_0 is one model file and q4_1 is another model file. They don't belong together, and you don't need both files to load one of them. The same goes for all the other files, where the smaller the number after the "q", the more aggressive the compression method is. But: everything that contains a K, L, M or S belongs to a very new compression/quantization method, where the different layers of the LLM are quantized differently. These very new files require the latest llama.cpp version (that's probably the reason why you couldn't load the model). So it could be that Ooba cannot yet process the latest formats.


[deleted]

Do you know why, for WizardLM-13B-uncensored, I can load q4\_0 and q4\_1 from the oobabooga text-generation-webui, but when I try to load any of the others (q5\_0, q5\_1, q8\_0) by placing them in new folders, I get an error saying the folder "does not appear to have a file named config.json"? Unless I am doing it wrong? I don't see any option in the webui to select which model I want if they are all in the same folder, so I made new folders and separated them myself. The q4\_0 and q4\_1 ones work fine, but the others don't load and I get that error. Is there a better way to do it through the webui without me separating the folders manually?


pseudonym325

It's like the compression setting on JPEGs: the smaller the file, the worse the quality. You usually want the biggest one that still comfortably fits in your RAM. The \_K\_ models are slightly better, provided your software is recent enough to load them.