
Releases: ggml-org/llama.cpp

b5747

24 Jun 00:09
0142961
CUDA/HIP: optimize mmv paths taken for HIP devices (#14324)

Co-authored-by: Johannes Gäßler <[email protected]>

b5745

23 Jun 14:17
bf2a99e
vulkan: update windows SDK in release.yml (#14344)

b5744

23 Jun 13:28
72c6bc3
llama : better rwkv chat template and add missing `inputs.use_jinja` …

b5743

23 Jun 12:47
defe215
CUDA: mul_mat_v support for batch sizes > 1 (#14262)

* CUDA: mul_mat_v support for batch sizes > 1

* use 64 bit math for initial offset calculation
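
Why the offset calculation matters: with batch sizes > 1, the per-batch element offset is a product of three counts that can exceed the 32-bit integer range for large weight matrices. A minimal sketch of the idea only, not the actual CUDA kernel code (the names are illustrative):

```cpp
#include <cstdint>

// Promote to 64 bits *before* multiplying; otherwise the product is computed
// in 32-bit arithmetic, overflows, and is only then widened.
static inline int64_t batch_offset(int batch_idx, int rows, int cols) {
    return (int64_t) batch_idx * rows * cols;
}
```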

b5742

23 Jun 10:21
7b50d58
kv-cells : fix tracking of seq_pos (#14339)

* kv-cells : fix tracking of seq_pos during cache reuse

ggml-ci

* cont : improve error message

ggml-ci

* cont : add more comments
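
A rough illustration of what "tracking of seq_pos" means here, assuming a per-cell map from sequence id to position; this is a sketch of the idea, not the actual llama.cpp kv-cells structures:

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

// Hypothetical cell: records which position each sequence claims in this cell.
struct kv_cell {
    std::map<int32_t, int32_t> seq_pos;  // seq_id -> position

    // When a cached cell is reused for (seq_id, pos), the recorded position
    // must agree with the caller's expectation, otherwise the reuse is invalid.
    bool reuse(int32_t seq_id, int32_t pos) {
        auto it = seq_pos.find(seq_id);
        if (it != seq_pos.end() && it->second != pos) {
            fprintf(stderr, "kv cell reuse: seq %d expected pos %d, got %d\n",
                    seq_id, it->second, pos);
            return false;
        }
        seq_pos[seq_id] = pos;
        return true;
    }
};
```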

b5740

22 Jun 21:40
fa4a9f2
quantize : handle user-defined pruning of whole layers (blocks) (#13037)
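
For context, pruning whole layers during quantization boils down to dropping every tensor that belongs to a user-listed block index and renumbering the surviving blocks. A conceptual sketch, not the llama-quantize implementation (tensor names follow the usual `blk.N.` GGUF convention; everything else is made up for illustration):

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Extract the block index from names like "blk.12.attn_q.weight"; -1 otherwise.
static int block_index(const std::string & name) {
    if (name.rfind("blk.", 0) != 0) return -1;
    return std::stoi(name.substr(4));
}

int main() {
    const std::set<int> prune = {10, 11};  // user-defined blocks to drop
    const std::vector<std::string> tensors = {
        "token_embd.weight",
        "blk.9.attn_q.weight", "blk.10.attn_q.weight",
        "blk.11.attn_q.weight", "blk.12.attn_q.weight",
    };

    for (const auto & name : tensors) {
        const int blk = block_index(name);
        if (blk >= 0 && prune.count(blk)) {
            printf("prune %s\n", name.c_str());  // skipped in the output model
        } else {
            printf("keep  %s\n", name.c_str());  // written out (and renumbered)
        }
    }
    return 0;
}
```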

b5738

22 Jun 18:12
66aba7a
run : avoid double tokenization (#14327)

* run : avoid double tokenization by adopting common_tokenize heuristic

* build : fix windows gcc and clang warnings

* lint : fixed trailing whitespace

* run : fix is_first flag
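
The gist of the heuristic: tokenize each new message exactly once, and apply BOS/special-token handling only when the conversation is still empty (the `is_first` case), instead of re-tokenizing the accumulated prompt. An illustrative sketch under those assumptions; `tokenize` below is a stand-in, not the llama.cpp API:

```cpp
#include <string>
#include <vector>

using llm_token = int;

// Stand-in tokenizer (assumption): one dummy token per character, with a
// fake BOS token prepended when add_special is set.
static std::vector<llm_token> tokenize(const std::string & text, bool add_special) {
    std::vector<llm_token> out;
    if (add_special) {
        out.push_back(1);  // fake BOS
    }
    for (char c : text) {
        out.push_back((llm_token) c);
    }
    return out;
}

struct chat_state {
    std::vector<llm_token> prompt_tokens;

    // Each message is tokenized once and appended; BOS is added only for the
    // first message, so nothing is tokenized (or BOS-prefixed) twice.
    void append_message(const std::string & text) {
        const bool is_first = prompt_tokens.empty();
        const std::vector<llm_token> toks = tokenize(text, /*add_special=*/is_first);
        prompt_tokens.insert(prompt_tokens.end(), toks.begin(), toks.end());
    }
};
```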

b5737

22 Jun 17:43
f1f5e82
examples : fix is_first logic for tokenization (#14329)

ggml-ci

b5736

22 Jun 16:06
af3373f
HIP: enable vec fattn on RDNA4 (#14323)

b5735

22 Jun 12:59
5d5c066
mtmd : fix Pixtral OOM with large images by capping image_size to 102…