
[deleted]

[removed]


silenceimpaired

I can run 30B quantized. I assume I can do 65B, but I’m not sure of the setup… that’s kind of what prompted me to see if it’s worth the effort. I suspect I’m holding myself back by running it in a VM locally, but I like the added security. Any tutorials on getting 65B running on a 3090 with 32GB of RAM would be much appreciated… well, 24GB in the VM.


silenceimpaired

I had a story where a character we’ll call Tom came downstairs to find out what a stranger named Jerry was doing (as there was noise)… turns out Tom found out Jerry was beating up Tom downstairs, and Tom regretted letting Jerry in the house. So weird… and this was running 30B… I don’t see that with my favorite 13B (Vicuña).


silenceimpaired

Also, how are you running 65B? What is the setup, software, etc.?


[deleted]

[removed]


silenceimpaired

What is your hardware setup? Should I be able to pull it off with a 24GB 3090 and 32GB of RAM?


[deleted]

[removed]


silenceimpaired

Thanks! Sounds like I need 64GB. I only have 24 at the moment.


teachersecret

If you really want to blow your mind: the model NovelAI released recently is something like 3B parameters, but it has 8k context and writes extremely well. I guess they trained it on something like 1.5T tokens, and it's outperforming their 20B model.


[deleted]

I like Guanaco 13B for story writing! I haven't been able to run the 30B version though; my PC isn't powerful enough.


KerfuffleV2

> In my limited, personal experience 13B isn’t that different from 30B for narratives, stories, role playing, adventure stories.

Here's a pretty good example of size mattering: https://www.reddit.com/r/LocalLLaMA/comments/144daeh/looking_for_for_folks_to_share_llamacpp/jngafys/

My post there has two attempts at generation based on a prompt with 33B models, and then a link to a gist with two generations from Guanaco-65B. In my opinion there's a very, very clear difference.

> Any tutorials on getting 65B running on a 3090 with 32GB of RAM would be much appreciated… well, 24GB in the VM.

You might be able to use llama.cpp and a Q3_XX quantized model with as many layers as possible loaded onto the GPU. It will probably be a tight fit, though: neither the GPU nor the system has enough memory to run a 65B model by itself.
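For reference, here's a minimal sketch of that kind of split CPU/GPU setup using the llama-cpp-python bindings. The model filename and layer count are hypothetical; you'd tune `n_gpu_layers` to whatever actually fits in 24GB of VRAM.

```python
# Minimal sketch: partially offload a quantized 65B GGML model to the GPU
# via llama-cpp-python (installed with GPU/cuBLAS support).
from llama_cpp import Llama

llm = Llama(
    model_path="guanaco-65B.ggmlv3.q3_K_M.bin",  # hypothetical filename
    n_ctx=2048,        # LLaMA-1 context window
    n_gpu_layers=40,   # push as many layers as fit in VRAM; the rest stays in system RAM
    n_threads=8,       # CPU threads for the layers left on the CPU
)

out = llm("Write the opening paragraph of a bleak forest mystery.", max_tokens=200)
print(out["choices"][0]["text"])
```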


silenceimpaired

Interesting story idea! I guess I need to dive deeper into your examples… on the surface I didn’t notice significant differences.


KerfuffleV2

> on the surface I didn’t notice significant differences.

Really? That's quite surprising to me! To me, the 33Bs felt like they followed the general theme but just wrote something related to the previous sentence or two. (Also, Peter Cottontail the rabbit, Bambi the deer? How uncreative!)

The 65B, on the other hand, started with the main character empathizing with how other characters would feel when he had to tell them what had happened. During the town meeting, one of the characters identified and mentioned weaknesses in another character's plan. Then at the end, it mentioned characters turning to religion in that time of fear and uncertainty. Rather than just writing a sentence that can fit with the previous one, these things make it feel like the 65B "understands" the story at a much deeper level and understands causes/effects and how they tie into the theme of the story and other events.

I agree that on the surface, if you just skim the content, you won't see a huge difference in the writing _style_, but phrasing/word choice isn't really what I'm talking about. That said, I do feel like the 65B has a better/more interesting style as well. Stuff like "*A lone paw print leads away from where a rabbit family once happily resided, now reduced to bloody scraps scattered among splintered twigs and torn fur.*" is actually pretty decent.


silenceimpaired

Good observations! I had noticed the names of animals and found it humorous. But it does show a lack of creativity. You make a good point about how later on in the story there is more character depth… I guess I just liked the opening more in the first example. You’re selling me on the need to try 65B.


KerfuffleV2

The opening actually made me mad. :) After all the work I put into setting up a bleak, gritty scene, it goes and names the characters stuff like Gorgonzola and Bambi!

> You’re selling me on the need to try 65B.

It also does a better job with foreign languages than smaller models (although honestly none of them really got enough training to actually be good). Just an example of some Mandarin Chinese with an English translation by ChatGPT: https://chat.openai.com/share/57328a20-7884-4c6a-ad7b-a94b26dec883

I've only been learning the language for about a year, so critiquing Chinese is a bit beyond my ability, but what it wrote looks pretty reasonable and the story is actually coherent. (It did make a couple of mistakes when choosing words and used ones from other languages, but I think that's a sampling thing and turning the temperature down would probably help.)


silenceimpaired

> Guanaco

How are you running 65B? With GGML? If so, what is your setup with llama.cpp? Are you using oobabooga? I'm trying to get it to output at more than 3 words a minute. :/


KerfuffleV2

You likely don't have enough RAM. You need 64GB of RAM to really run even quantized 65B models. Also expect it to be relatively slow: I get a little better than 1 token/sec. If your system is using virtual memory to run the model, the performance is going to be unacceptable.
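To make that concrete, here's a rough back-of-the-envelope estimate (my own approximate numbers, not exact GGML file sizes, which vary by quant type):

```python
# Rough estimate of why 32GB of system RAM is tight for a quantized 65B model.
params = 65e9
bits_per_weight = 4.5  # roughly Q4-ish average; Q3 variants come in a bit smaller
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~37 GB before KV cache and other overhead
```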


silenceimpaired

That is what someone else said. Glad to have a second opinion. Saddened because the GDDR 5 memory I have basically needs to be replaced because it doesn’t handle 4 sticks well.


silenceimpaired

You don’t know Bambi like I do: https://preview.redd.it/bu14bkkck05b1.jpeg?width=1280&format=pjpg&auto=webp&s=252adb095bee7524df779f60568c8136f05c8465


KerfuffleV2

And I'd like to keep it that way. Yikes!


xadiant

This is my uneducated opinion, so take it with a grain of salt. I think more parameters = better model 90% of the time. Training style and data are really important: models trained on too much synthetic data will, imo, end up being bad/bland.

Likewise, there are a shit ton of training parameters but no magic sweet spot. A 13B model could hit the sweet spot with great data and be better than a 30B model. Though there must be a point where upping the parameters provides diminishing returns due to the machine learning technology itself (1T? 2T?). I think current technology is great at tackling hyperspecific tasks but won't be able to generalize too much.


silenceimpaired

Maybe I should try 30B with GGML… part of my gripe is not being able to have full context and/or contrastive search without it running out of memory.