llama-server : Prompt eval very slow / Kimi-K2 and gpt-oss 120B #15147
Unanswered
YannFollet
asked this question in Q&A
Replies: 1 comment
When I launch in verbose mode I get this message, which I don't understand:
Prompt eval runs at the same speed as eval. During prompt eval the GPU is at about 30% usage and only one CPU core is used.
Please tell me if there are better parameters, or if there are more tests I can run.
Thanks again to everyone for the great work on llama.cpp.
./build/bin/llama-server -m ./models/Kimi-K2-Instruct-Q2_K-00001-of-00008.gguf -c 131072 -ngl 99 --host 0.0.0.0 --port 8080 --prio 2 --no-webui --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 -ot "blk\.([4-9]|[1-9][0-9])\.ffn_.*_exps\.weight=CPU" -fa --jinja
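The `-ot` (`--override-tensor`) flag above keeps only the first few blocks' expert tensors on the GPU and sends the rest to the CPU, which would explain low GPU usage during prompt eval. As a rough sanity check, here is a small Python sketch of which block indices that regex matches against a representative expert tensor name (the tensor name and the exact matching semantics of llama.cpp's override are assumptions for illustration):

```python
import re

# The pattern from the -ot argument in the command above.
pattern = re.compile(r"blk\.([4-9]|[1-9][0-9])\.ffn_.*_exps\.weight")

def offloaded_to_cpu(block: int) -> bool:
    # "ffn_gate_exps" is a representative MoE expert tensor name;
    # llama.cpp's real matching may differ slightly from fullmatch.
    name = f"blk.{block}.ffn_gate_exps.weight"
    return pattern.fullmatch(name) is not None

# Blocks 0-3 fail the match and stay on GPU; blocks 4-99 go to CPU.
kept_on_gpu = [b for b in range(100) if not offloaded_to_cpu(b)]
print(kept_on_gpu)  # [0, 1, 2, 3]
```

So almost all expert weights end up on the CPU, and prompt processing becomes bound by CPU matrix multiplies rather than the GPU.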