Releases · ggml-org/llama.cpp
b5747
CUDA/HIP: optimize mmv paths taken for HIP devices (#14324)
Co-authored-by: Johannes Gäßler <[email protected]>
b5745
vulkan: update windows SDK in release.yml (#14344)
b5744
llama : better rwkv chat template and add missing `inputs.use_jinja` …
b5743
CUDA: mul_mat_v support for batch sizes > 1 (#14262)
* CUDA: mul_mat_v support for batch sizes > 1
* use 64 bit math for initial offset calculation
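The b5743 change generalizes the matrix-vector kernel to small batches and computes the initial offsets in 64-bit arithmetic. As a rough CPU-side illustration of why the 64-bit step matters (this is not the llama.cpp CUDA kernel; all names are illustrative): for a large weight matrix, `row * ncols` can exceed the range of a 32-bit int before it is ever used as an index.

```cpp
// CPU-side sketch of a batched matrix-vector product; illustrative only.
// The point: with nrows * ncols above ~2^31, the row offset must be
// computed in 64-bit before indexing into the weight buffer.
#include <cstdint>

void mul_mat_vec_batched(const float * A,   // nrows x ncols, row-major
                         const float * x,   // nbatch x ncols
                         float       * y,   // nbatch x nrows
                         int64_t nrows, int64_t ncols, int64_t nbatch) {
    for (int64_t b = 0; b < nbatch; ++b) {
        for (int64_t r = 0; r < nrows; ++r) {
            const int64_t a_off = r * ncols;   // 64-bit: avoids int overflow
            const int64_t x_off = b * ncols;
            float sum = 0.0f;
            for (int64_t c = 0; c < ncols; ++c) {
                sum += A[a_off + c] * x[x_off + c];
            }
            y[b * nrows + r] = sum;
        }
    }
}
```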
b5742
kv-cells : fix tracking of seq_pos (#14339)
* kv-cells : fix tracking of seq_pos during cache reuse ggml-ci
* cont : improve error message ggml-ci
* cont : add more comments
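The b5742 fix concerns bookkeeping of per-sequence positions in the KV cache when cells are reused. A loose sketch of that kind of bookkeeping follows; the structure and names here are hypothetical, not llama.cpp's actual kv-cells implementation:

```cpp
// Hypothetical sketch: track the set of positions stored per sequence, so
// min/max position queries stay correct as cells are added and reused.
#include <cstdint>
#include <map>
#include <set>

struct seq_pos_tracker {
    // for each sequence id, the multiset of positions currently in the cache
    std::map<int32_t, std::multiset<int32_t>> pos;

    void on_cell_added(int32_t seq_id, int32_t p) { pos[seq_id].insert(p); }

    void on_cell_removed(int32_t seq_id, int32_t p) {
        auto it = pos.find(seq_id);
        if (it == pos.end()) return;
        auto pit = it->second.find(p);
        if (pit != it->second.end()) it->second.erase(pit);
        if (it->second.empty()) pos.erase(it);  // keep bounds queries honest
    }

    // -1 signals "no cells for this sequence"
    int32_t seq_pos_min(int32_t seq_id) const {
        auto it = pos.find(seq_id);
        return it == pos.end() ? -1 : *it->second.begin();
    }
    int32_t seq_pos_max(int32_t seq_id) const {
        auto it = pos.find(seq_id);
        return it == pos.end() ? -1 : *it->second.rbegin();
    }
};
```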
b5740
quantize : handle user-defined pruning of whole layers (blocks) (#13037)
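b5740 lets users prune whole layers (blocks) at quantization time. A rough sketch of the core decision is below, assuming the GGUF convention that block tensors are named `blk.<i>.…`; the helper name and surrounding logic are illustrative, not the PR's code:

```cpp
// Illustrative only: decide whether a tensor belongs to a user-pruned block.
// Tensors outside any block (embeddings, output head) are always kept.
#include <regex>
#include <set>
#include <string>

bool should_prune(const std::string & tensor_name, const std::set<int> & prune) {
    // llama.cpp tensor names look like "blk.<i>.attn_q.weight"
    static const std::regex re(R"(blk\.(\d+)\.)");
    std::smatch m;
    if (std::regex_search(tensor_name, m, re)) {
        return prune.count(std::stoi(m[1].str())) > 0;
    }
    return false;
}
```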
b5738
run : avoid double tokenization (#14327)
* run : avoid double tokenization by adopting common_tokenize heuristic
* build : fix windows gcc and clang warnings
* lint : fixed trailing whitespace
* run : fix is_first flag
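The b5738 change tokenizes each prompt chunk once instead of twice, gating special-token insertion on whether the chunk is the first one. A minimal sketch of that idea, assuming the `common_tokenize` helper from llama.cpp's common library; the wrapper itself is illustrative, not the PR's exact code:

```cpp
// Sketch: tokenize exactly once, adding BOS/special tokens only for the
// first chunk of the prompt (the is_first flag the commit refers to).
#include <string>
#include <vector>
#include "common.h"   // common_tokenize, llama_token (llama.cpp common library)

std::vector<llama_token> tokenize_piece(llama_context * ctx,
                                        const std::string & text,
                                        bool is_first) {
    return common_tokenize(ctx, text, /*add_special=*/is_first,
                                      /*parse_special=*/true);
}
```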
b5737
examples : fix is_first logic for tokenization (#14329) ggml-ci
b5736
HIP: enable vec fattn on RDNA4 (#14323)
b5735
mtmd : fix Pixtral OOM with large images by capping image_size to 102…