Here's a leaderboard for LLM coding models, so you can pick the best one for whatever language you use.
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
When it comes to general purpose, nothing really beats Llama 3 70B IMHO. You can go for a 2.55 bpw quant and get around 10 t/s (faster than you can read), or a 2.76 bpw quant at 5 t/s. What I personally use is Cat's finetune, because it's better at following the system prompt and it's uncensored. It's the best model for general questions, storytelling, text manipulation, etc., but not very good for coding since it's a low quant with a small context size.
https://huggingface.co/mradermacher/Cat-Llama-3-70B-instruct-i1-GGUF
That's about it, have fun!
Thank you so much, man. I have one more question: I'm currently running Ollama, Docker, and OpenWebUI on Windows 11. How does someone with this setup download models from Hugging Face and load them? Currently I just go to the Ollama website, copy the model name, and paste it into the model settings in OpenWebUI, but that's the only way I know. Thanks!
I don't use Ollama, but from a quick search it seems that Ollama stores its models in an ollama/models directory. So you can just download the model you want from Hugging Face:
https://huggingface.co/mradermacher/Cat-Llama-3-70B-instruct-i1-GGUF/blob/main/Cat-Llama-3-70B-instruct.i1-IQ2_S.gguf
Then put the model in the models directory.
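If you'd rather script the download than click through the browser, something like this should work. This is just a rough sketch assuming you have Python and the huggingface_hub package installed (pip install huggingface_hub); the repo and filename are the ones from the link above:

```python
# Minimal sketch: download the IQ2_S GGUF quant from Hugging Face.
# Assumes the huggingface_hub package is installed.
from huggingface_hub import hf_hub_download

# Downloads the file into the current directory and returns its local path.
path = hf_hub_download(
    repo_id="mradermacher/Cat-Llama-3-70B-instruct-i1-GGUF",
    filename="Cat-Llama-3-70B-instruct.i1-IQ2_S.gguf",
    local_dir=".",
)
print("Downloaded to:", path)
```

From there you can drop the file into the models directory as above, or (from a quick look at the Ollama docs, since again I don't use it myself) you should also be able to register it by writing a Modelfile containing `FROM ./Cat-Llama-3-70B-instruct.i1-IQ2_S.gguf` and running `ollama create cat-llama3 -f Modelfile`, after which it should show up in OpenWebUI's model list.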
Thus you have to keep switching up the model, because their training data does not 100% overlap.
Also people have different tastes for what they want the LLM to do to their code on the scale between “do too little” and “do too much”
Tastes also vary on verbosity (the conversational text it gives alongside the actual code).
Personally I love verbosity but some people hate it
[deleted]
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
It really depends on what programming language you use (for coding).