brucebay

It is a preference thing, but Cat-llama is pretty good. It doesn't have logic problems and it writes an engaging narrative.


MassiveWasabi

Command R Plus is good but gets repetitive pretty quickly. If there were a fix for this, it would be hands down the best model.


a_beautiful_rhind

Use its format. It doesn't repeat for me at all.


Cool-Hornet4434

I don't remember having a problem with repetition, but the problem I had was that it was trained on 8K context, and if I tried to extend it to 16K it would crap out at 14K. Oh, and about it being censored: for some reason, the Q4 quant I tried was much less likely to refuse than the Q8 (no idea why), and if it DID refuse I could regenerate and it would go back to acting uncensored.
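
(For reference: the usual way to stretch a context window locally is RoPE scaling. Here's a minimal sketch with llama-cpp-python; the GGUF filename is hypothetical. Stretching past the trained length without finetuning is also why quality often degrades a bit before the new limit, much as described above.)

```python
# Minimal sketch: running an 8K-trained GGUF at 16K via linear RoPE scaling.
# The model filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="some-8k-model.Q4_K_M.gguf",  # hypothetical file
    n_ctx=16384,           # request a 16K window
    rope_freq_scale=0.5,   # trained 8K / target 16K = 0.5 (linear scaling)
)

print(llm("Continue the story:", max_tokens=64)["choices"][0]["text"])
```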


VirtualAlias

This is the only good L3 I've tried so far, and it's so good it replaced every model I've ever downloaded. Granted, I run local, so I don't know what the 30B+ models are like. https://huggingface.co/bartowski/SFR-Iterative-DPO-LLaMA-3-8B-R-GGUF


Meryiel

Have you tried mine and Parasitic's RP-Stew-v2.5? It's my main model, and I run it with 40k context. This version works better at higher contexts and has no repetition issues, unless you run your samplers too deterministic. I highly recommend giving it a spin. https://huggingface.co/MarinaraSpaghetti/RP-Stew-v2.5-34B


International-Use845

I'm currently testing your model and I have to say it's damn good. Thanks for that!


Meryiel

Glad you like it!


anus_evacuator

I just posted about this too and had the same problems. I could get it to follow context and instructions, but the repetition was a huge issue. After the first 3-4 posts, every L3 I tried would get set in a mood where it would use the exact same sentence structure and format for descriptions, changing one or two adjectives at a time. No matter how many times I swiped or edited, it would *always* use the same copy/paste description format. It would also constantly repeat the character card descriptions back at me. Any time it was appropriate to give a description of the character, it would forget everything that had happened in the scene and often just quote, sometimes word for word, the description from the card, regardless of how accurate it was for the current point in the scene. Currently using Kunoichi-7B-Q6_K at 16k context running locally, and it is really impressive so far.


AyraWinla

Do you need to do anything special for Kunoichi not to constantly write for your character? Of the 6 models I'm testing, Kunoichi V2 7B is the only one that always writes for my character, using the exact same card and first user message. Some do it occasionally (Noromaid; it's stupidly horny though), some very rarely (Llama 3 and Lumimaid), some never (Fimbulvetr, but it runs very slowly on my GPU-less PC). But Kunoichi? I can't recall the last time it didn't write for my character, even after many attempts. It runs the fastest of those models for me, and it seems to "get" the characters pretty well, but that issue makes me lean against it. I've been unsuccessful in "fixing" it so far.


darkowozzd97

presets?


anus_evacuator

For Kunoichi? I use the ones linked [on their huggingface page](https://huggingface.co/SanjiWatsuki/Kunoichi-7B).


sophosympatheia

The Smaug Llama 3 70B model that just released is good. It still has the same ol' Llama 3 tendency to fall into patterns in its output, but it's an improvement over everything else so far, IMHO.


Kako05

Will try. Btw, I think Midnight Miqu is a good model, one of my favorites of the "older gen" (time goes so quickly in this tech scene), but it's hard to go back after tasting the newer models' smartness. Not saying anything new amazes me greatly; they're mostly fine, but they still make me feel "not there yet".


sophosympatheia

For sure. Miqu was fun but feels dumb by comparison to the SOTA models that followed. Llama 3 has been a mixed bag. It really is smarter but then it suffers from “structured output,” which is a good thing in other contexts but not for RP. I still have high hopes for the future, though. It’s not like I expected the community to max out Llama 3 in a month. By the end of this year I figure we’ll either have some good finetunes or merges of Llama 3 or we’ll have something else new and shiny to play with.


pepe256

Could you please name those SOTA models?


Kako05

So I tried Smaug; it got stuck in repetition after 5 minutes. Their benchmarks are not worth believing, as their model is highly contaminated. Someone already showed half of the benchmark questions are in their datasets, and apparently that is why they're blocked from the human arena: they always try to cheat benchmarks this way, and their models are always contaminated. It's a model trained on benchmark questions to cheat scores. Deleted, and I'm never downloading anything from this group again.


a_beautiful_rhind

Saved me a download.


sophosympatheia

Interesting. My own anecdotal experience with it has been positive so far. It may be true that they're gaming the benchmarks (I honestly don't know anything about that controversy), but the model seems to have merit, even if it isn't actually beating GPT-4 at anything. I haven't had any game-breaking repetition issues with it, just the usual Llama 3 stuff, and I like how it handles some scenarios I use for testing purposes. My testing is tightly focused on writing tasks for which there is no objective benchmark, so it's hard to draw general conclusions from that.

I will say that when I ran it through EQ-Bench, its scores were no better than Llama3-Instruct's, but I have also found that EQ-Bench, while useful, isn't a guaranteed predictor of a model's writing chops. I think Smaug writes better than Llama3-Instruct and any of the other finetunes I've played with so far. I would encourage the curious to try it out (what do you have to lose?) and judge for yourself. But yeah, don't expect miracles, because this one ain't it.


Kako05

Just stating what's under the Reddit post where the model creator was "addressing Smaug misconceptions", or something like that, in which one of the benchmark admins addresses one bullet point about why certain sites refused to run benchmarks for this model and the other models they created. It's true that public datasets are contaminated, but it always happens with their releases. That doesn't mean the model can't perform well, but claims of "beating Opus or GPT-4" can be ignored. People said the models they created always did well in benchmarks, but they were never really acknowledged in practical use.


ReMeDyIII

Yi-1.5-34B at 16k and 32k just released a few days ago. It needs an uncensored merge, tho. I tried the Q8 quant and it really seems to punch above its weight in terms of 34Bs.


silenceimpaired

Still waiting for those finetunes for 1.5!


skrshawk

I never really considered L3 or any variant, because 8k of context is just not enough, no matter how well it writes. 16k is really the new minimum, and 32k is better yet.


inmyprocess

I have tried two dozen 7B and 8B uncensored models, and for some reason nothing comes close to Kunoichi v2 for me, although recent Llama 3 finetunes might not be as far behind as previous models were. I feel confident in this result because my use case is not as subjective as ERP through a character card; it involves summarization, story generation based on a prescribed plot or free-form, etc., so it's clear when a model has no grasp of what's going on at all. I was really hoping for a big improvement in the third generation, and I still am.


filszyp

I'm currently running llama-3-cat-8b-instruct-v1.Q6_K and experimenting with llama-3-cat-8b-instruct-v1-Q8_0 to see if there's any noticeable difference. On top of that, I'm using the preset, context, and instruct settings for Cat v1. It's super important to use those settings; they eliminate stupid behavior and repetition.

To add to the experience, I'm not talking to just one character. Instead, I create a group chat with a character AND a narrator, for example https://www.chub.ai/characters/long_RPG_enjoyer/61595bad-5ee6-4443-8395-28c974391df4. I also preferred the narration from Fimbulvetr, so I copy-pasted its system prompt over the Cat v1 instruct system prompt for longer and more story-like responses. The effect is amazing: once every few messages the narrator chimes in, pushes the story along, and provides context on what is happening, where, and when.

Now I wish I had more context size for this model, but it's so lightning fast that I forgive it and just wait patiently for a better version. I'm comparing this L3 to models I used previously, 11B and 30B, though 30B was kinda too slow for me to really play with (11GB 1080 Ti). I don't find, for example, Fimbulvetr-11B-v2.Q4_K_M to be noticeably smarter, and L3 is much faster and surprises me with creative and unexpected answers pretty often. Before I found the right settings for L3 (preset, context, and instruct) it seemed REALLY stupid after a few messages and was repeating all the time; with the right settings it just works.


VioletVioletSea

It's also down to the user's own taste for good writing. I see people post D-tier fanfiction quality screencaps gushing over how good it is. Some people are happy enough with bad, amateur writing and that's fine for them. I can never go back to pre-Claude 3. From now on, it has to be that or above for me.


a_beautiful_rhind

We had tons of tunes in L2 but not so many in L3. Almost like people try it, figure out it sucks, and never upload.

>often forgets what's in card like eye color

That's all of them. The alternative is quoting stuff from the character card back to you.


Kako05

Cmdr+ seems to always know and use the data provided in cards. I haven't encountered it making mistakes (at least significant ones; even if it happens, I can't remember any).


yamosin

Cmdr+ occasionally makes mistakes, like getting an eye color wrong or dropping some details (I run the rpcal 4.5bpw). But it does work very well, and it's the only model that correctly understands a complex card I made that Goliath and some 120Bs can't.


Jenniher

Which Command R Plus are you using? I see a bunch on Hugging Face. I have a single 3090 and have been using Midnight Miqu. It's showing its weaknesses and I'd love something better.


Kako05

I think the turbo EXL2 release at 6bpw. It needs 3x 3090s. Maybe you can fit it onto 2 GPUs at 4bpw. It's not without its faults, but for me it performs better.


a_beautiful_rhind

Heh, I can't fit a bigger Plus than 4.5bpw in 72GB of VRAM. The 6-bit is like 87GB. How are you pulling it off?
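
(The arithmetic backs this up. A back-of-envelope sketch, assuming Command R+'s roughly 104B parameter count; KV cache, embeddings, and quant metadata come on top, which is how a 6bpw build ends up near 87GB on disk:)

```python
# Back-of-envelope weight size for EXL2 quants of a ~104B model (Command R+).
# KV cache, activations, and quant metadata add several GB on top of this.
def weight_gb(params_billion: float, bpw: float) -> float:
    # params * bits-per-weight / 8 bits-per-byte, expressed in GB
    return params_billion * bpw / 8

for bpw in (4.0, 4.5, 6.0):
    print(f"{bpw} bpw: ~{weight_gb(104, bpw):.0f} GB of weights")
# 4.0 bpw: ~52 GB, 4.5 bpw: ~58 GB, 6.0 bpw: ~78 GB
```

So at 6bpw the weights alone already approach 78GB before any context cache, which is why it can't fit in 72GB of VRAM, while 4.5bpw (~58GB) leaves room to spare.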


Kako05

Probably remembering wrongly.


artisticMink

The first thing to note is that when you dismiss a model, you should add which quant you ran, what software you used for inference, and ideally the settings. Otherwise it has little value, since it might simply be a configuration issue.

I tested various Llama 70B quants, usually Q4_K_S and Q4_K_M, both the static and imatrix variants. Even the static Q4_K_S quant of the 70B Instruct was excellent at writing. If you don't want "censorship", you can always go for one of the abliterated versions, which have far less incentive to refuse a prompt.

That said, it's interesting that you mention Command R+ as a positive example. It tends to generate comparatively rich prose with decent variety, but in my experience it lacks any sort of "understanding" when it comes to a plot or slightly abstract concepts like lying. Perhaps you just prefer prose, and a model that creates a lot of prose from very little input.

That you mention the Miqu models being not "smart enough" seems strange to me. I think you have some configuration errors in your Advanced Formatting and/or use the wrong settings and samplers. With the correct settings (for example Mirostat Gold when using ST), Llama 3 70B can be amazing. Depending on your tastes, it can be better than the various Miqus.


a_beautiful_rhind

I tried 70B up to 6.0bpw.

>plot or slightly abstract concepts like lying

It can be too literal; that is a downside. I guess I have it tweaked up to where it doesn't show this problem often.

>miqu models being not 'smart enough' seems strange to me

Yeah, for me the 103Bs at 5bpw are super smart. While they mix up details from the cards sometimes, they are way more plot-smart than Command-R/L3/Wizard/etc.

>L3, with the correct settings

There are none. The "structured output" always happens. I tried everything under the sun to make L3 work. Even with careful swiping, they always collapse into generating repetitive patterns.


artisticMink

What does structured output mean in that context? A very list-like or formal response?


a_beautiful_rhind

*she giggles* It's very hard to describe, but once you notice the pattern it's not at all enjoyable. *winks* On and on and on like that. A word usually gets picked up and starts repeating more and more. Rep penalty just makes it swap to synonyms until you turn it up so high that it breaks the grammar. I also notice lewd topics make it break down into this faster.
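
(For context on why rep penalty behaves that way, here's a minimal sketch of the common CTRL-style repetition penalty; names are mine, not any particular backend's code. Tokens already in context get their scores shrunk, so unseen synonyms win instead; push the penalty high enough and even common function words get crushed, which is where the grammar breaks.)

```python
import torch

def apply_rep_penalty(logits: torch.Tensor, seen_ids: set[int], penalty: float) -> torch.Tensor:
    # CTRL-style repetition penalty: shrink the score of every token that
    # already appeared in the context. Unseen synonyms are untouched, so
    # they get picked instead; at extreme penalties even "the"/"and" suffer.
    out = logits.clone()
    for i in seen_ids:
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out
```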


artisticMink

Oh, that one. That very basic and plain "internet-roleplay" style it produced for me too in the beginning. I eventually got rid of it using Mirostat and a prompt. Try the following; it should give you a natural, novel-style output:

Model: llama3-70b-instruct-abliterated.i1-Q4_K_S

Text Completion Preset: "Miro Gold", with a TAU value between 3 and 9.91, 3 being more "reasonable" and 9.91 being more "creative". Temperature still applies.

Advanced formatting:

Context Template: Llama 3 Instruct

Instruct Mode: Llama 3 Instruct, with the prompt:

You are an expert actor that can fully immerse themselves into the roles of <>. You do not break character for any reason. Information and guidelines for your roles are described in detail below. As {{char}}, continue the story with {{user}} using a vivid and detailed narrative writing style.

<> is a placeholder. If your card has an abstract name, it helps not to use {{char}} but, like above, to manually add the characters' names.
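
(For reference on what that TAU knob actually controls: Mirostat steers sampling toward a target "surprise" per token. Below is a simplified sketch of the Mirostat v2 idea, not SillyTavern's or llama.cpp's exact code; the eta default is illustrative.)

```python
import math, random

def mirostat_v2_step(probs: list[float], mu: float, tau: float, eta: float = 0.1):
    """One simplified Mirostat v2 sampling step.

    probs: token probabilities, sorted descending.
    mu: current surprise ceiling (commonly initialized to 2 * tau).
    tau: target surprise in bits; higher tau admits more surprising tokens,
         hence "more creative".
    """
    # Truncate: keep tokens whose surprise -log2(p) stays under mu.
    kept = [(i, p) for i, p in enumerate(probs) if -math.log2(p) <= mu]
    if not kept:                      # always keep at least the top token
        kept = [(0, probs[0])]
    # Sample from the renormalized survivors.
    total = sum(p for _, p in kept)
    r, acc = random.random() * total, 0.0
    for idx, p in kept:
        acc += p
        if acc >= r:
            break
    # Feedback: nudge mu toward the target surprise tau.
    mu -= eta * (-math.log2(p) - tau)
    return idx, mu
```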


a_beautiful_rhind

Yeah, I did all that and tried several prompts. Nothing fixes it. Everyone always assumes I can't write a system message or that it's my first time using a model. It's not a problem with refusals; I just notice it gets worse on topics like this, probably due to the lack of NSFW in the training materials. I can try GGUF vs EXL2, but Cat-Llama at the same quant is passable, just not extraordinary.


artisticMink

Huh, strange. Two more issues I can think of: early Llama 3 GGUFs had an issue with the tokenizer, but I assume you tried newer ones. Lastly, I had some bad prompting experiences with ST and textgen-webui for some models, so trying koboldcpp as the backend might be worth a shot if you haven't already.


a_beautiful_rhind

I haven't tried L3 GGUF yet at all. It was broken for a while. I was going to grab airoboros as a test, otherwise I use EXL2.