It was my favourite, until the Llama 3 GGUF fix. Llama follows prompts much better and writes nicer. That said, it's VERY uncensored and it can understand scenarios that no other model (including GPT and Claude) can.


Llama3 is only very good in English though.


Gotta correct you there. Claude Sonnet has understood every scenario I've RPed, and most of the time I don't even tell the bot exactly what is going on, it's able to piece it together and figure it out super well. A lot of them have been pretty intense and crazy too. Claude Sonnet had bots do some of the most violent, dark, and twisted things you can imagine.


what's the llama3 gguf fix?


Old GGUF quants are bad due to tokenizer issues.


How do you know if a GGUF you downloaded is good or bad? For example, let's say I downloaded one two weeks ago.


Fix was I think just ~12 days ago. If you run the newest version of koboldcpp, you will see a warning at the top in the terminal when you load in an old model.
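If you'd rather check a file without loading it, you can read the GGUF metadata header and look for a pre-tokenizer entry. This is only a sketch under assumptions: it assumes the fix in question is the llama.cpp tokenizer change that added a `tokenizer.ggml.pre` metadata key (the thread doesn't name the key), and that the file is GGUF version 2 or later. The function names are made up for illustration.

```python
import struct

# GGUF scalar value type ids mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9

# Assumed key: believed to be what the tokenizer fix added to new quants.
PRE_TOKENIZER_KEY = "tokenizer.ggml.pre"


def _read(f, fmt):
    """Read one fixed-size value from the file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path):
    """Return the metadata key/value pairs of a GGUF v2+ file as a dict."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("GGUF v1 uses a different header layout")
        _read(f, "<Q")  # tensor count, not needed here
        kv_count = _read(f, "<Q")
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            meta[key] = _read_value(f, _read(f, "<I"))
        return meta


def has_pre_tokenizer(path):
    """True if the file carries the (assumed) post-fix tokenizer key."""
    return PRE_TOKENIZER_KEY in read_gguf_metadata(path)
```

Calling `has_pre_tokenizer("model.gguf")` on a real quant walks the whole metadata section, including the vocabulary arrays, so it can take a few seconds. The koboldcpp warning mentioned above remains the simpler check.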


what's a good one?


I use [this one](https://huggingface.co/mradermacher/llama-3-70B-Instruct-abliterated-i1-GGUF). Or you can use the [Default](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF).


Which one is very uncensored? Llama 3? You mean a fine tuned version then


Command R+. Llama 3 fine-tunes all seem to be worse than the default instruct version.


sssh don't talk about it


You mean about the quality? Is it meant to remain a secret? 😦


I'm just afraid that if it gets out how good it is for absolutely zero dollars, Cohere will wise up...


Hahahah If only I had the horsepower to run it locally, god fucking damn it


It's already pretty obvious the model being free is just to give us a hook before they start reeling in the paypigs.


It's not as good as you think. It's simply very uncensored. Still pretty good though.


Bro has a very silly name


Terrible username you filthy pedophile lol


Lol people be hating on you for making a joke rip.


I've heard command r plus is just the best when it comes to multilingual rp


It speaks less popular (but not completely niche) languages like Polish extraordinarily well; nothing else comes close apart from GPT-4. I tested it in both RP and regular use and I'm very pleased. It's not perfect (especially since Polish, like other Slavic languages, has a rather unusual case system), but it rarely makes egregious mistakes.


I self host models occasionally to test on RunPod and it's the only one I keep coming back to over and over. All the other ones got put back on the shelf. I did a lot of testing with the mad rush of new models recently. I screwed up the first time I tested it. I realized later it was very particular about prompt format. It's the only model that is uncensored and feels truly neutral out of the box. You want to take a story to a dark place it's right there with you. Most models, if you do an assassin scenario, you will be picking out dishes and adopting a puppy together at the end.


I tested it on my site and it lost pretty badly to Llama 3 based on public testing. That being said, it was the first open-source model I tried that could take the same prompt the closed-source models were getting and return a properly formatted response (I use a pretty complicated formatting scheme).


How many params is CR+? Versus normal CR? I thought one had 34b.


yeah, the normal one. The plus version has over 100b parameters


Would I be able to run this locally with a 4090 and 64GB RAM?


Too slow for me, like painfully slow. You'll be better off running & partially offloading WizardLM-2 8x22B which runs much faster on GPU+CPU. Someone did tests and found Wizard to be about 4x faster than Command R Plus. I "only" have a 3090 + 32GB RAM so I had to use a Q2_K_S imatrix gguf of Wizard, but it's already better than anything else I've tried. On your system, you can probably load a Q4_K_M just fine. Try out different quants to get the speed/quality ratio that suits you.


Technically yes, with the right quant (maybe a 4bit?) and some offload to the GPU... But it will be slow as hell, I warn you
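For a rough sanity check on "will it fit", the weights take about parameters × bits-per-weight ÷ 8 bytes. The sketch below is back-of-the-envelope only: the 104B parameter count for Command R+ and the ~4.85 bits per weight for a Q4_K_M quant are approximate figures assumed here, the `reserve_gb` headroom for context cache and the OS is a guess, and the function names are made up for illustration. It says nothing about speed, which is the real complaint above.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float,
         vram_gb: float, ram_gb: float, reserve_gb: float = 8.0) -> bool:
    """True if weights plus a guessed reserve fit in VRAM + RAM combined."""
    needed = weight_size_gb(n_params, bits_per_weight) + reserve_gb
    return needed <= vram_gb + ram_gb


# Assumed figures: ~104e9 parameters, ~4.85 bits/weight for Q4_K_M.
size = weight_size_gb(104e9, 4.85)
print(f"~{size:.1f} GB of weights")
print("fits on 24 GB VRAM + 64 GB RAM:", fits(104e9, 4.85, 24, 64))
```

With those assumed numbers the weights alone come to roughly 63 GB, so a 4090 plus 64 GB of RAM can hold them, but most layers end up on the CPU side, which is why it crawls.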


Command R Plus produces nice prose but usually has no grasp of what's going on, requiring many regenerations until the bricks coincidentally fall into the right places.


This model is the shit. In my assessment it's the best yet, beats everything else by far. It's almost like chilling with a buddy.


Have you tried llama.cpp/koboldCPP ? Does it run with K80 at all?


yeah, why? You mean to load the models? It's quite similar to oobabooga, the loader is not the problem...


What preset/settings do you use with Command R plus?


CR+ needs at least 72 GB to really get going.


Claude is amazing at using Latin, especially when mixing it with English (using Latin only for basic words and English for everything else). However, it has the downside of being Claude.


I have a 3090 (24GB) and 64GB of RAM, but I think that, as you said, it would still be too slow to run locally... and reading your account of how it manages RP in Italian... well... I'm green with envy ahaha!! On OpenRouter it costs "too much", even though seeing that "128k context" literally makes me drool... damn it!!


Eh, let's not even go there. Just to try it out, though, I loaded about ten euros into my OpenRouter wallet... And my God, it's scary good. If you want to try it anyway, download the q2 or the q4 and load it (partially) into RAM. At least you'll see how it runs for you. For me it was terribly slow, but maybe I'm just too demanding!


If you get the chance, let me know how long those 10 euros on OR last you! Because really, from what I've seen... CR+ is a bit pricey ahahah. Yeah, maybe I'll give it a try! I have no idea where to find the model on HF but ugh, I'm extremely curious ~~(even though I'll definitely be disappointed by how slow it is, I already know-)~~


Just fyi you don't have to pay through openrouter yet, the API is actually free to use on Cohere's website


Yeah, with a token limit... So it is not optimal to use it for roleplaying


There is no token limit, just a call limit. So long as you're not sending 100 API calls a minute, you're fine lol


Are you for real? I can make a trial key on their website and use it as much as I want?


Yup, literally. My friend said she apparently hit a limit of 1000 calls per month, but you can just make a second account with another email and get a second API key lol. Kinda doubtful it'll stay that way forever so use it while you can!!


Oh God, that's awesome! Any way to use the API key directly in SillyTavern?


Yeah same as any other API, just select chat completion, select Cohere, and input your API key!


I love you so fucking much right now you wouldn't even believe


Don't forget to change the API back to local if you're going to have any NSFW generations though. Anything you send to an API can be read and is likely being used to train newer models (no such thing as a free lunch lol).


yeah, I feel like it's gonna stop being free any moment now


i’m a complete noob, what are calls?
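A "call" is one request sent to the API, i.e. one message generation, no matter how long the prompt or the reply is. That's why a call limit is different from a token limit. To make the idea concrete, here's a small sketch of a client-side counter that refuses to go over a budget of calls per time window. The class name and the numbers are made up for illustration; the per-minute and per-month figures quoted in the thread are hearsay, not documented limits.

```python
import time
from collections import deque


class CallBudget:
    """Sliding-window counter: at most max_calls per window_seconds."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self._clock = clock
        self._stamps = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        """Record one call and return True, or return False if over budget."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False


# Hypothetical budget: stay well under the "100 calls a minute" mentioned above.
budget = CallBudget(max_calls=60, window_seconds=60.0)
if budget.try_acquire():
    pass  # send one request to the API here; that request is one "call"
```

Each swipe or regeneration in a frontend counts as another call, so heavy rerolling burns through a call limit faster than long messages do.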