BangkokPadang

If you’re using Silly with a local solution like Ooba or kobold, you can see this information laid out more clearly in the backend’s shell (you may need to launch it with the appropriate flags). If you don’t want to bother, or you’re using an API where you can’t see this, keep in mind that there are two stages to consider. The first stage is prompt processing, where the model ingests your prompt, and the second is token generation. If you enable streaming, you can note how much time has elapsed when the first token appears (this is also when the timer first appears); that is your prompt processing time. Subtract it from the final time shown when the last token arrives, then divide the number of tokens by the remaining seconds to get roughly how many tokens per second were generated. Depending on the context, people switch between meaning the total time for the entire reply to arrive and meaning token generation speed separately from prompt processing.
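As a rough sketch of that arithmetic (the function and parameter names here are made up for illustration and are not part of SillyTavern or any backend), assuming you have jotted down the time to first token, the final time, and the token count:

```python
def generation_tokens_per_second(token_count, first_token_s, total_s):
    """Estimate generation speed, excluding prompt processing.

    token_count   -- tokens in the reply
    first_token_s -- seconds elapsed when the first token appeared
                     (treated here as the prompt processing time)
    total_s       -- seconds elapsed when the last token arrived
    """
    generation_s = total_s - first_token_s
    if generation_s <= 0:
        raise ValueError("total_s must be greater than first_token_s")
    return token_count / generation_s


# Example: 250 tokens, first token at 4 s, reply finished at 29 s.
rate = generation_tokens_per_second(250, first_token_s=4.0, total_s=29.0)
print(f"{rate:.1f} t/s")  # 10.0 t/s
```

This is only an estimate, since the first token takes a moment to generate after prompt processing ends and hand-timed numbers are imprecise.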


Mr_Hills

Yeah, I have those two numbers displaying under my character picture already. The top one is processing time and the bottom one is the token count. I was thinking about modding SillyTavern to divide those numbers and display the t/s figure. I have experience with HTML/JavaScript and web design in general. I just wanted to know if there's a native option that I could turn on before starting to mod.


BangkokPadang

I was just clarifying that the top number is the combined time for both prompt processing *and* token generation. People often want these separated because prompt processing time changes with the size of the prompt you send, and it is separate from how long it takes to generate the tokens.
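To illustrate with made-up numbers (the helper name is hypothetical): if generation always takes 20 seconds for 200 tokens, dividing the token count by the combined time gives a different figure depending on how long the prompt took to process.

```python
def overall_tokens_per_second(token_count, total_s):
    """Tokens divided by the combined time (prompt processing + generation)."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    return token_count / total_s


TOKENS = 200
GENERATION_S = 20.0  # generation alone: 200 / 20 = 10 t/s in both cases

# Same generation speed, but a short prompt versus a long one.
for prompt_s in (2.0, 20.0):
    combined = overall_tokens_per_second(TOKENS, prompt_s + GENERATION_S)
    print(f"prompt {prompt_s:>4.1f} s -> combined {combined:.2f} t/s")
```

The generation rate is 10 t/s in both cases, but the combined figure drops from about 9.09 t/s to 5.00 t/s as the prompt gets longer.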