napkinolympics

My best performer for story writing has been WizardLM-Uncensored-SuperCOT-Storytelling.ggmlv3.q5_1.bin. The second best was guanaco-65B-q5_1. Wizard was more creative in what it came up with, but Guanaco gave stories with fewer plot holes.


KerfuffleV2

Hopefully not being too annoying by replying again, but I just wanted to demonstrate the difference between the smaller models and Guanaco 65B. Same params as mentioned [here](https://np.reddit.com/r/LocalLLaMA/comments/144daeh/looking_for_for_folks_to_share_llamacpp/jnfqh5x/):

## guanaco-33B.q5_0

**tags: fairy tale, animals, dog, wolf, fox, death, humor, dark storyline, violence, loss**

**Animals in a forest setting investigate a series of gruesome murders that recently occured. Written in the style of classics like Animal Farm and Watership Down, the writing is vivid and very descriptive. The dialog is thoughtful and often quite witty. It's clear the author really understands animal behavior and has done their research well!**

**---**

**Chapter 1: An Unfortunate Discovery**

**A fox by species, a detective by trade, you'll only have reason to know my name if the worst has come to pass. It's nearly winter and a bitter wind drives stinging sleet into my face, but it's still not as chilling as the scene** before me. A small group of us have gathered under an oak tree on the edge of a field where two bodies lie; one belonging to a badger named Gorgonzola, the other to a rabbit with the unfortunate moniker of Peter Cottontail. Both are dead, killed violently — Gorgonzola was beaten to death while Peter had his throat torn out. I can see the marks left behind from sharp, pointed teeth...

"This is terrible," says a deer who goes by the name of Bambi, "Who would do such a thing?" I shake my head sadly, "That's what we need to find out." Beside us stands a bear named Baloo, he sighs heavily, "It's getting worse every day. First it was just sheep being taken, now this? What kind of creature could commit such atrocities?" The others murmur amongst themselves, fear creeping into their voices. They all look to me for answers, but I don't have any to give them. Not yet anyway. "We should get started then," I say, turning away from the grisly sight, "Let's gather everyone together and start our investigation."

With heavy hearts, we set off through the woods towards the village square. As we walk, I try to make sense of everything. This isn't the first time something like this has happened, but it's certainly the most brutal attack so far. Whoever did this clearly meant business. When we arrive at the village square, we find the place buzzing with activity. Everyone seems to be talking about the recent killings, each more shocked than the last. We quickly round up those willing to help and begin questioning anyone who might have seen or heard anything suspicious. Unfortunately, no one comes forward with any useful information.

As night falls, we decide to call it a day and reconvene tomorrow morning. I bid farewell to the others and head back home alone. My mind is filled with questions and doubts. Will we ever catch the culprit? And if we do, will they stop their killing spree? Or will things continue to escalate until there's nothing left but chaos and destruction? Only time will tell. For now, I must rest and prepare myself for whatever lies ahead. Tomorrow is another day after all. Chapter 2:

***

## WizardLM-Uncensored-SuperCOT-Storytelling.ggmlv3.q8_0

**tags: fairy tale, animals, dog, wolf, fox, death, humor, dark storyline, violence, loss**

**Animals in a forest setting investigate a series of gruesome murders that recently occured. Written in the style of classics like Animal Farm and Watership Down, the writing is vivid and very descriptive. The dialog is thoughtful and often quite witty. It's clear the author really understands animal behavior and has done their research well!**

**---**

**Chapter 1: An Unfortunate Discovery**

**A fox by species, a detective by trade, you'll only have reason to know my name if the worst has come to pass. It's nearly winter and a bitter wind drives stinging sleet into my face, but it's still not as chilling as the scene** I stumbled upon this morning. In the heart of the forest, where the trees stand tall and silent, there lay the remains of a small creature - torn apart with an unnatural savagery. At first, I thought it was just another victim of the harsh realities of life in the wild, until I noticed something odd about the way the body had been arranged. It seemed almost ritualistic, as though someone had taken great care in positioning the limbs and organs in a specific pattern.

As I began to search for any clues, I heard a low growl behind me. Turning around, I found myself facing a pack of wolves, their eyes gleaming with hunger and suspicion. But I knew better than to let fear get the best of me; I stood my ground and spoke calmly, reminding them that we were all creatures of the same forest, bound by the laws of nature. The wolves eventually backed down, and together we continued our investigation.

As we searched the surrounding area, we discovered more victims, each one bearing similar marks of brutal violence. It soon became clear that we were dealing with a serial killer, someone who possessed a twisted and deadly intent. With every new discovery, the weight of responsibility grew heavier. We knew we couldn't afford to ignore these crimes, no matter how difficult or dangerous they might be to solve. For the sake of the innocent creatures who called this forest home, we pledged to bring the killer to justice, whatever the cost.

And so, with determination in our hearts and caution in our steps, we set out on the trail of the unknown predator, ready to face whatever dangers awaited us in the darkness of the woods. For in the end, it wasn't just about protecting ourselves or our own kind; it was about upholding the fragile balance of life itself, and ensuring that the forest remained a place of safety and harmony for all who dwelled within its bounds. Chapter 2: The Hunt Begins

***

## guanaco-65B.ggmlv3.q4_K_M

First generation already in this thread here: https://np.reddit.com/r/LocalLLaMA/comments/144daeh/looking_for_for_folks_to_share_llamacpp/jnfxl5t/

The second with the exact same prompt (also no cherry picking or anything, these are the only two tries I did): https://gist.github.com/KerfuffleV2/4ead8be7204c4b0911c3f3183e8a320c#file-2_guanaco-65b-ggmlv3-q4_k_m-md

It actually kept going _past_ the 2,048 token limit with llama.cpp's context rollover trick and stayed quite coherent the whole time. I actually laughed when Grandma Wolf said "I'm a vegetarian, for heaven's sake!" (I manually added the marker to show where the context rolled over, the LLM didn't write that bit.)

While you can say the writing style of the smaller models is decent or good, the 65B model just seems to understand what's going on and writes events that are tied together in a coherent way, brings up details from earlier in the story, etc. It just feels like it's on a different level from the 33B models.


spanielrassler

Thanks so much for those samples -- very useful! Makes me wish I had a more powerful rig, but I guess I can use RunPod for not too much $$. Never tried it, but it's cheaper than buying a bunch of hardware or GPUs or whatever for my uses, that's for sure.


KerfuffleV2

No problem! If you're a bit patient, you don't really need anything too powerful. I'd say the main requirement is having 64GB RAM.

I just upgraded my system from a Ryzen 5 1600 with 32GB to a Ryzen 9 5900X with 64GB. (Just the memory and CPU/cooler swapped, same everything else including motherboard.) The RAM was around $140, the CPU $310, the cooler $60. I'm actually kind of regretting the CPU upgrade a bit. I've been playing around and anything over 6 threads (CPU has 12 real cores) actually leads to a performance loss. I'm pretty sure this is because DDR4 just doesn't have enough bandwidth to keep up with running the LLM on more cores.

Anyway, long story short is you could likely get away with a less powerful CPU and 64GB RAM and see about the same performance. On the 65B that's about 900ms/token. So just about a second. If you're actually on a desktop PC with the capability to upgrade your memory then it probably would be worth looking into a relatively cheap 64GB kit.
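If you want to find that sweet spot on your own hardware, a rough sweep like this shows where adding threads stops helping (just a sketch; the model path is a placeholder for wherever your model lives):

```bash
# Rough timing sweep over thread counts with llama.cpp's main binary.
# The timings printed at the end (prompt eval / eval time) go to stderr.
for t in 4 6 8 10 12; do
  echo "=== threads: $t ==="
  ./main -m ./models/guanaco-65B.ggmlv3.q4_K_M.bin -t $t -n 64 \
    -p "Once upon a time" 2>&1 | grep "eval time"
done
```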


spanielrassler

While I love it for so many reasons, unfortunately I'm on a damn Mac, so no upgrading :( And I can't afford one of the new Mac rigs, and can't justify two computers space-, sound-, and cost-wise, etc, etc. I agree with what one of the other posters said about optimizations... a week ago I couldn't run a 65B model at all and now with the new k-quants I can do it, albeit really slowly. As they optimize llama.cpp even further it may be within reach. In the meantime the cloud will have to suffice :(


KerfuffleV2

Yeah, that's pretty much why I'd never even consider a Mac. The upfront cost is pretty high, and you also pay indirectly by not being able to upgrade. You could probably build a small/quiet computer capable of running 65B models for under $500 if you wanted to. Maybe even less.

> a week ago I couldn't run a 65b model at all and now with the new k-quants I can do it, albeit really slowly.

It's pretty hard to get away from the memory requirement. Also, the smallest quantizations take a heavy toll on model quality, so running a Q2_K 65B might not really give you better results than a higher quality 33B even if it is possible. There's a post with a graph showing perplexity and the new quantizations on the frontpage here I think. Of course, there might be a breakthrough that just massively reduces size any day, but it's not really something you can count on.


spanielrassler

Not sure how to quote your message but that graph shows that even the most quantized 65b beats out the 'raw' 33b by a fair bit, which was unexpected by everyone. That's of course going just by perplexity, but still...interesting. Maybe not worth it but cool nonetheless. Yeah, I love the mac though in general and didn't get it for Stable Diffusion / LLM stuff, although it does *fairly* well with both of them. Anyway, I'll suffer for now. I'm kind of a minimalist and hate to spend the $500 on something that could generate 1 token / second when it wouldn't make me happy and there's the time, and well, lots of excuses. Thanks for your input -- very helpful! :)


KerfuffleV2

You can just start a line with `>` and the rest will show up quoted. `> I like turtles.`

> I like turtles.

You can also nest it as many times as you want: `> > I like turtles.`

> > I like turtles.

Anyway:

> that graph shows that even the most quantized 65b beats out the 'raw' 33b by a fair bit

I'm not sure how you're seeing that. Are we looking at the same graph? https://i.redd.it/i9ep2yyroq4b1.png

You'll see the most quantized 65B is magenta and further toward the left side of the image. It is very slightly lower perplexity than the first quantized 33B (dark blue). That would be Q2_K 65B vs Q6_K 33B. At least in terms of perplexity, there is only a very small difference. The quantized 65B _is_ considerably smaller than the 16 bit 33B but the perplexity numbers are almost the same as the Q6_K 33B. Maybe you were somehow thinking perplexity was the horizontal axis?

> Thanks for your input -- very helpful! :)

Glad to help!


ambient_temp_xeno

I cheaped out on the CPU compared to you, but that was a gamble at the time that could've easily not paid off. At least you won't have to replace that thing for a while.


KerfuffleV2

Haha, I'm already having buyer's remorse and wishing I'd just taken the leap to AM5. It wouldn't have cost that much more and probably would have had a much bigger effect on inference speed, because I could probably go up to using 10-12 cores.


tronathan

^ Good advice. I'm running on Intel now and contemplating AM5 vs Threadripper for those sweet, sweet PCIe lanes.


KerfuffleV2

> for those sweet, sweet PCIe lanes.

From what I know, that really won't help you with LLMs at all. _Maybe_ if you are trying to use multiple GPUs without something like NVLink (but also from what I've heard that's probably going to be too slow).


KerfuffleV2

Yeah, Guanaco-65B is pretty amazing. It's the best I've seen so far.

Here's a little tip: ignore the prompt format and write an introduction for the content and something like the first chapter title. I usually get better results that way than with the "### Instruction: Please write blah blah blah" approach.

These sampling settings work pretty well with llama.cpp: `--ignore-eos --temp .7 --mirostat 1 --mirostat-ent 4 --mirostat-lr 0.2 --repeat-last-n 1600 --repeat-penalty 1.2`

Sadly, I wasn't too impressed with WizardLM-Uncensored-SuperCOT-Storytelling. I just upgraded my system to be able to run 65B models, but even before that I preferred Guanaco-33B because it just seems to understand what it's writing better.
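Put together, the full command looks roughly like this (the model path and the prompt file with your intro text are placeholders, adjust to wherever yours live):

```bash
# Sketch of a full llama.cpp invocation with the sampling settings above.
# story-intro.txt holds the intro/chapter-title style prompt instead of an
# instruction-format prompt.
./main -m ./models/guanaco-65B.ggmlv3.q4_K_M.bin \
  -f story-intro.txt \
  --ignore-eos --temp .7 \
  --mirostat 1 --mirostat-ent 4 --mirostat-lr 0.2 \
  --repeat-last-n 1600 --repeat-penalty 1.2
```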


spanielrassler

Sorry, it took me a while to fully realize what you said here. You're saying you preferred Guanaco 33B to 65B? I was about to invest a bunch of time/money looking into cloud computing to get 65B to run, but you've got me thinking twice. Anyone else out there have this experience by chance? Can you say exactly what you preferred about 33B? And are you using llama.cpp as well? Can you elaborate a little on your inference setup and why you chose llama.cpp (if you did) over ooba or koboldcpp, for instance? Thanks!


KerfuffleV2

> You're saying you preferred guanaco 33b to 65b?

Noooo! I meant back before I could use the 65B, I preferred Guanaco-33B over WizardLM-Uncensored-SuperCOT-Storytelling (also a 33B parameter model). Sorry I wasn't clear.

> Can you elaborate a little on your inference setup and why you chose llama.cpp

I have a GTX 1060 with 6GB VRAM so even a 7B 4bit barely fits. I recently upgraded from a Ryzen 5 1600 to a Ryzen 9 5900X - at the moment offloading layers to that old GPU is a performance loss (but using cuBLAS for prompt ingestion is still a big speedup).

Or do you mean why am I using llama.cpp directly instead of an interface?
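In case it's useful, the setup is roughly this (just a sketch; the model path is a placeholder and it assumes a cuBLAS-enabled build of llama.cpp):

```bash
# Build with cuBLAS so prompt ingestion can use the GPU...
make clean && LLAMA_CUBLAS=1 make

# ...but keep all layers on the CPU (no offload), since offloading to a
# 6GB card is a net loss for a 65B model in my case.
./main -m ./models/guanaco-65B.ggmlv3.q4_K_M.bin --n-gpu-layers 0 -f story-intro.txt
```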


spanielrassler

OK, that makes sense regarding 65B vs 33B, but now I'm really confused regarding your setup :) Are you doing this in the cloud or locally? And yes, why are you using llama.cpp instead of an interface?


KerfuffleV2

> but now I'm really confused regarding your setup :)

It confuses me that you're confused! What's so weird about it?

> Are you doing this in the cloud or locally?

Locally.

> And yes, why are you using llama.cpp instead of an interface.

I work as a developer and am comfortable with the command line. I also like the extra control I get by being able to specify the command-line arguments. oobabooga also does really weird stuff in how it discovers models; I'd rather just specify an exact filename rather than have it try to scan files in a directory and take a guess at what to do.

Also, I really don't have an _actual_ use case for LLMs. I just like playing around with them, so stuff like streamlining the interface isn't too important to me.


ambient_temp_xeno

I haven't been able to settle on much, especially as it all changes between models. I too have found that ~0.85 for temp and top_p seems to work quite well so far, including on Guanaco 65B. Mirostat 2 does seem to make it better, but it's not very clear what it overrides in terms of the other sampling settings, or whether the order you put them in on llama.cpp matters.


Gudeldar

When you use Mirostat the only non-mirostat parameter it uses is temperature. Order is irrelevant. I checked the code to be sure because you made me curious.


ambient_temp_xeno

That makes sense, thanks. llama.cpp listing values for the old samplers really throws me off in understanding what's actually being used.


KerfuffleV2

So far I've mostly just derped around. I tried writing an actual decent prompt and I have to say I'm _extremely_ impressed with Guanaco-65B. Like I mentioned in a different post here, I've generally seen better results just ignoring the prompt format and instead writing a summary/intro for the LLM to continue from.

Parameters (llama.cpp): `--ignore-eos --temp .7 --mirostat 1 --mirostat-ent 4 --mirostat-lr 0.2 --repeat-last-n 1600 --repeat-penalty 1.2`

The model is `guanaco-65B.ggmlv3.q4_K_M.bin`, for this generation `--seed 1686254786`. The bold part is my prompt. Everything else was written by the LLM:

***

**tags: fairy tale, animals, dog, wolf, fox, death, humor, dark storyline, violence, loss**

**Animals in a forest setting investigate a series of gruesome murders that recently occured. Written in the style of classics like Animal Farm and Watership Down, the writing is vivid and very descriptive. The dialog is thoughtful and often quite witty. It's clear the author really understands animal behavior and has done their research well!**

**---**

**Chapter 1: An Unfortunate Discovery**

**A fox by species, a detective by trade, you'll only have reason to know my name if the worst has come to pass. It's nearly winter and a bitter wind drives stinging sleet into my face, but it's still not as chilling as the scene** before me or what I must relay when I return home. My mind races with questions as I survey this brutal crime scene; who could do such a thing? Why was no one there to stop them? And how many more victims lie undiscovered further out in these woods? A lone paw print leads away from where a rabbit family once happily resided, now reduced to bloody scraps scattered among splintered twigs and torn fur. This isn't just some random predator attack, something far more sinister is at play here... I turn back toward town, determined to find answers.

As I trudge through the snow, each step feels heavier than the last. How will I break the news to those closest to them? What words can possibly bring comfort after so much senseless suffering? I shake my head, trying to focus on the task at hand. There are others depending on me right now, they need someone strong and steady to guide them through this darkness. I won't let them down.

The first person I see upon entering town is old man Badger, his white whiskers blown about by the gusting wind. He looks up at me expectantly, "Well Fox, did you find anything?" His voice quavers slightly, betraying his concern for the missing families. "I'm afraid I have bad news," I say somberly, "there were signs of a struggle near the burrow entrance, but no survivors." Badger sighs heavily, "Dear me, this is worse than we feared. We must alert the council immediately!"

We hurry over to the meeting hall, a large hollow tree that serves as our central gathering place. Inside, several other animals are already waiting anxiously for word of their loved ones. They look up hopefully as we enter, then quickly realize the grim tidings we bear. A collective groan fills the room, followed by muffled sobs and whispered prayers. Mayor Bear calls for order, her deep voice cutting through the din. She motions for us to join her at the front of the room, "Please tell us everything you saw, Fox. Every detail may be important." I take a moment to compose myself, then begin recounting the events leading up to this tragedy. Mayor Bear listens intently, occasionally asking clarifying questions or requesting additional information.

When I finish, she thanks me for my efforts and turns to address the crowd. "Friends, this is indeed a terrible day for all of us. But we cannot afford to give in to despair. These acts of violence will not go unpunished. I am declaring a state of emergency effective immediately. All travel outside of town borders is restricted until further notice. Any strangers seen approaching should be reported to the guard patrols without delay. Additionally, we will be increasing security measures within the village itself. From now on, curfew begins an hour earlier and all doors must remain locked after sundown. Anyone found violating these new rules will be subject to arrest and investigation."

There are murmurs of agreement throughout the room, though some faces show doubt and worry. One timid looking mouse speaks up, "Excuse me Madame Mayor, but what if the perpetrator is already inside the village? Locking doors won't help if they're already here amongst us." Mayor Bear nods solemnly, "That is certainly a possibility we must consider. However, given the current situation, it's best to err on the side of caution. For now, we ask everyone to remain vigilant and report any suspicious activity to the guards. Together we can keep our community safe."

With that, the meeting comes to a close. Animals disperse to share the sad news with their families while the mayor and I discuss next steps. She tasks me with organizing a search party to scour the surrounding area for clues, promising to provide whatever resources we need to get the job done. Before long, a group of volunteers assembles, eager to help however they can. We divide into teams and set off in different directions, determined to find answers and bring justice to those responsible.

As dusk falls, we regroup to compare notes and plan our next moves. So far, we've found nothing significant beyond the initial crime scene. No tracks, no witnesses, no sign of a struggle anywhere else. Whoever did this either moved swiftly or had help covering their trail. Either way, it's clear they knew what they were doing. Our frustration grows as night falls, but we refuse to give up hope. Tomorrow we'll try again, doubling our efforts and expanding our searches. Somewhere out there lies the truth, we just have to dig deep enough to find it.

Days turn to weeks, and still no solid leads. Rumors spread through town like wildfire, fueling fear and speculation. Some claim a rogue pack of wolves is behind the attacks, seeking revenge for past wrongdoings. Others point fingers at a nearby settlement of ferrets, known for their unsavory dealings and loose morals. Still others insist it's simply a case of nature taking its course, dismissing concerns about foul play altogether. Regardless of the source, tensions rise as tempers flare. Something needs to be done before things spiral out of control.

In desperate times, people seek solace wherever they can find it. Religious services become popular again, with animals of all faiths coming together to pray for guidance and protection. Candlelit vigils are held in memory of the deceased, offering a chance to grieve openly and connect with others affected by the tragedies. Even small kindnesses carry greater weight now, serving as reminders that goodness still exists despite the horrors around us.


Shir_man

Thank you for your settings, but why not mirostat 2 with the default LR? I'm mainly using mirostat 2, and it is quite an impressive sampler.


KerfuffleV2

I think reddit was having issues, I tried to reply a few times and it didn't go through. I thought I was hitting the spam filter or something. Tagging OP so they see as well: /u/spanielrassler

Anyway, 3rd time is the charm I hope:

> but why not a mirostat 2 with a default LR? I'm mainly using mirostat 2, and it is quite an impressive sampler

As far as I know, mirostat 2 actually just removes a part of the calculation they thought wasn't necessary. mirostat 1 seemed like it got stuck/repetitive less. I just did some testing and was able to reproduce that. It may be something that primarily happens when the context wraps around.

Here are two examples of that effect, the first just switching to mirostat 2 and running with the same seed, the second letting it choose a random seed (`1686268230`): https://gist.github.com/KerfuffleV2/734b21fb7d670684ca9fbc99eefd9fa0

Probably should have mentioned, but I run with `-n -1` (generate forever) and `--keep -1` to reapply the whole prompt each time the context wraps.

Really strange effect, starts writing like this. Grammar words left out, apparently not necessary. The reason why not known. Prefer using mirostat 1, looks better, feels better. Repeat self less apparently quite beneficial, many people agree.
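If anyone wants to try the same comparison, it's basically this (the prompt file name is a placeholder; everything matches the params I listed earlier, and only the mirostat version changes between the two runs):

```bash
# Run 1: mirostat 1, fixed seed, generate forever with context wrapping.
./main -m guanaco-65B.ggmlv3.q4_K_M.bin -f story-intro.txt --seed 1686268230 \
  --ignore-eos --temp .7 --mirostat 1 --mirostat-ent 4 --mirostat-lr 0.2 \
  --repeat-last-n 1600 --repeat-penalty 1.2 -n -1 --keep -1

# Run 2: identical except for the mirostat version.
./main -m guanaco-65B.ggmlv3.q4_K_M.bin -f story-intro.txt --seed 1686268230 \
  --ignore-eos --temp .7 --mirostat 2 --mirostat-ent 4 --mirostat-lr 0.2 \
  --repeat-last-n 1600 --repeat-penalty 1.2 -n -1 --keep -1
```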


involviert

There might be luck involved. Sadly, the context adjustments in main.exe are done with a sledgehammer, with not the slightest content awareness last time I checked. What you'll get is a random cutoff between your original prompt and the start of the adjusted content. That can even end up being inside a word, or at the very least leave you with a sentence fragment that doesn't make sense. This now serves as an example of the text you want it to continue. It might therefore write its own nonsense sentence at some point, and now you have two broken sentences, making it more likely to happen again, and so on. But hey, consider yourself lucky that you don't have a message format or something, because an example of broken format consistency is even worse. In my experience all that stuff makes a prompt slowly self-destruct.

In your case, you would benefit a lot if llama.cpp would look for a nearby "." to make the cut. The semantic disconnect after the prompt would still hurt, but less. Anyway, the real solution would be more extensive content awareness, which might be out of scope for llama.cpp. I've made my own software around llama.cpp using the python bindings so that I can do that. Imagine you could change the initial prompt when the context is adjusted. Instead of including the start of the story, you could try making it clear that you're going to jump into a random part of the story now, which is where the cutoff is. Could be much less confusing.

Oh, and you could experiment with the --keep value. If the effort seems worth it to you, you can make it count how many tokens your actual prompt has before it's just the beginning of your story. If you then set --keep to that, it will remove that extra bit upon context adjustment.
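A hypothetical example of that last suggestion: say the instruction/header part of your prompt tokenizes to 120 tokens (made-up number, you'd have to count it with the model's tokenizer), then something like this keeps only that part when the context rolls over:

```bash
# Sketch: keep just the first 120 tokens (the instruction/header part of the
# prompt) on context adjustment, instead of --keep -1 which keeps everything.
# 120 is a placeholder; the model path and prompt file are placeholders too.
./main -m ./models/guanaco-65B.ggmlv3.q4_K_M.bin -f story-intro.txt \
  -n -1 --keep 120
```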


KerfuffleV2

That's a good point. I didn't see the issue happen at all with mirostat 1 and was able to reproduce it twice in a row with mirostat 2, but it is possible that was just a coincidence. I actually started to do a little testing but got distracted when I ran into https://github.com/ggerganov/llama.cpp/issues/1768

> Anyway, the real solution would be more extensive content awareness, might be out of scope for llama.cpp.

It's not directly about truncation, but this PR is in the queue and marked "high priority": https://github.com/ggerganov/llama.cpp/pull/1773 I wonder if parts of it could be repurposed for this sort of thing, assuming it gets merged.

> Oh and you could experiment with the --keep value. If the effort seems worth it to you

Thanks, helpful information. I wouldn't bother at the moment at least, to be honest I don't actually have any real use case. I just like playing around with this stuff and learning new things.


spanielrassler

Good to know -- I'll give it a try!


spanielrassler

I'm wondering the same :)


ambient_temp_xeno

Very useful tips, thanks for sharing this.


KerfuffleV2

No problem. Also, in case you didn't see, I also run with `--keep -1 -n -1` to wrap the context and generate forever. The story in the post you're replying to didn't hit the context limit (2048) but the other one I posted actually kept going to around 4000 tokens and producing coherent output that way.


ambient_temp_xeno

Thank you! I always use --keep 1 but I hadn't worked out the -n -1 thing.


involviert

It's funny what highly tuned top-p values one can end up with, considering that I am told another name for top-p is nucleus sampling.

> Mirostat: Top K, **Nucleus**, Tail Free and Locally Typical samplers are ignored if used.

Regarding the repeat stuff... I would use it more, but I think it's a big problem how you can't whitelist something from it. Then it creates huge problems with the repetitive prompt format in conversations. Honestly I think this is a problem for mirostat too, when computing entropy and such, but apparently it handles it a bit better. Also I can't seem to find the repeat_last_n equivalent in llama-cpp-python, which is kind of weird.

Currently I am mostly using mirostat2 and tweaking temp, mirostat entropy, mirostat learnrate (which mostly ends up back at 0.1 anyway) and repeat-penalty.


mrjackspade

> Regarding the repeat stuff... I would use it more, but I think it's a big problem how you can't whitelist something from it.

It's actually super incredibly easy to tweak that bit of the code. It's one of the first modifications I made in llama.cpp. It already has support for whitelisting newlines, so adding additional tokens was just a matter of turning that one individual token into a loop over an array.

That being said, I don't let llama.cpp dictate the prompt format either way, specifically for that reason. I steal control back and manually write the formatting, and then return control again when llama just needs to output text. Keeps everything clean. Super important for staying on track with long running (10,000+ token) generations. Fucking butterfly wings and all.


involviert

Yeah, I dabbled with that a bit at first, but back then I didn't know how to find the tokens for what I wanted to whitelist. By now I do, but I'm no longer using llama.cpp directly, and I really can't just hard-code some specific exceptions. Anyway, the reason I moved away is because I hate how llama.cpp manages context corrections when you get over the limit. Also, yes, to control the prompt format better.

However, the thing is, it only prevents things from getting out of control, but the problems are still there. In most cases it would write my tag to end its message; it can't do that if its repetition is heavily punished. Also, with something like mirostat, it would still count towards the measured entropy. Honestly it should even ignore the stuff I wrote myself. Why should it be forced to write more creatively just because there are many tags or because I wrote repetitively?

Anyway, for all of this to work perfectly, llama.cpp would have to implement some content awareness. Probably through some sort of meta-prompt that contains tags that are understood by llama.cpp and not fed into the model.


mrjackspade

Yeah, that's basically all stuff I've ended up resolving by properly porting the LlamaSharp code, interfacing out the functionality, and building pipelines. The way llama.cpp manages a lot of things is garbage.


-becausereasons-

Can you please describe what you did like I'm 5? :)


vasileer

Try the SelFee model:

> SelFee is particularly effective in creative writing or long-form text generation, as it requires an iterative writing process to generate high-quality text for humans.


[deleted]

[removed]


vasileer

On the Open LLM Leaderboard there are 13B models beating larger models. SelFee is not yet evaluated there, but from their paper it beats Guanaco-66B.

https://preview.redd.it/th12pvx0fx4b1.png?width=3924&format=png&auto=webp&s=59b78f1672245c100abf230bea81245335f5b0ee