Releases: ggml-org/llama.cpp

b5740 (22 Jun 21:40, commit fa4a9f2)

quantize : handle user-defined pruning of whole layers (blocks) (#13037)
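
This change lets users drop whole transformer blocks at quantization time. A minimal sketch of the filtering idea, assuming llama.cpp's `blk.<N>.*` per-layer tensor naming; `tensor_is_pruned` is a hypothetical helper, not the PR's actual code.

```cpp
#include <set>
#include <string>

// Hypothetical helper: decide whether a tensor belongs to a pruned block.
// Per-layer tensors are named "blk.<N>.<suffix>"; tensors outside any block
// (e.g. "token_embd.weight", "output.weight") are never pruned.
static bool tensor_is_pruned(const std::string & name,
                             const std::set<int> & pruned_blocks) {
    const std::string prefix = "blk.";
    if (name.compare(0, prefix.size(), prefix) != 0) {
        return false; // not a per-block tensor
    }
    size_t dot = name.find('.', prefix.size());
    if (dot == std::string::npos) {
        return false; // malformed name, keep it
    }
    int blk = std::stoi(name.substr(prefix.size(), dot - prefix.size()));
    return pruned_blocks.count(blk) > 0;
}
```

With pruned blocks `{2, 5}`, `"blk.2.attn_q.weight"` would be dropped while `"blk.3.ffn_up.weight"` and `"output.weight"` are kept.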

b5738 (22 Jun 18:12, commit 66aba7a)

run : avoid double tokenization (#14327)

* run : avoid double tokenization by adopting common_tokenize heuristic

* build : fix windows gcc and clang warnings

* lint : fixed trailing whitespace

* run : fix is_first flag
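
The heuristic above boils down to adding BOS only for the first chunk of input, so the same text is never re-tokenized with a second BOS. A toy sketch of that pattern, with a one-token-per-character stand-in tokenizer (hypothetical; the real tokenizer is vocabulary-driven):

```cpp
#include <string>
#include <vector>

// Toy stand-in tokenizer: one token per character, token 1 reserved for BOS.
static std::vector<int> toy_tokenize(const std::string & text, bool add_bos) {
    std::vector<int> out;
    if (add_bos) {
        out.push_back(1); // BOS
    }
    for (char c : text) {
        out.push_back((int) c);
    }
    return out;
}

// Sketch of the is_first heuristic: only the first chunk of a conversation
// gets BOS; subsequent chunks are appended without it.
static std::vector<int> tokenize_chunk(const std::string & text, bool & is_first) {
    std::vector<int> toks = toy_tokenize(text, /*add_bos=*/is_first);
    is_first = false;
    return toks;
}
```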

b5737 (22 Jun 17:43, commit f1f5e82)

examples : fix is_first logic for tokenization (#14329)

ggml-ci

b5736 (22 Jun 16:06, commit af3373f)

HIP: enable vec fattn on RDNA4 (#14323)

b5735 (22 Jun 12:59, commit 5d5c066)

mtmd : fix Pixtral OOM with large images by capping image_size to 102…
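
Capping the image size avoids the OOM by rescaling oversized inputs before encoding. A hedged sketch of that idea, preserving aspect ratio; the actual cap value and rounding used by the mtmd fix are not reproduced here.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical sketch: scale an image down so its longest side does not
// exceed max_side, keeping the aspect ratio. Real code may round differently.
static std::pair<int, int> cap_image_size(int w, int h, int max_side) {
    int longest = std::max(w, h);
    if (longest <= max_side) {
        return {w, h}; // already small enough, no change
    }
    return { w * max_side / longest, h * max_side / longest };
}
```

For example, with a cap of 1024, a 2048x1024 image scales to 1024x512, while an 800x600 image passes through unchanged.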

b5734 (22 Jun 06:49, commit 40bfa04)

common : use std::string_view now that we target c++17 (#14319)
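
The point of moving to `std::string_view` under C++17 is that substring-style parsing no longer allocates copies. A small illustrative example (the flag name and helper are hypothetical, not code from the PR):

```cpp
#include <string_view>

// Strip a "--" prefix from a CLI argument. The returned std::string_view is
// a view into the caller's buffer: no allocation, no copy, unlike the
// equivalent std::string::substr.
static std::string_view strip_flag_prefix(std::string_view arg) {
    if (arg.substr(0, 2) == "--") {
        return arg.substr(2);
    }
    return arg;
}
```

The trade-off is that a view must not outlive the buffer it points into, which is safe for argument parsing where the `argv` strings live for the whole program.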

b5733 (22 Jun 05:51, commit aa064b2)

CUDA: add mean operation (#14313)

* CUDA: add mean operation

* add back sum_rows_f32_cuda

* Review: early exit if col!=0
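
A CPU reference for the semantics this kernel computes: each row of an nrows x ncols matrix reduces to the average of its elements, i.e. the existing `sum_rows` result divided by ncols. This is a sketch of the operation's meaning, not the CUDA implementation, which parallelizes the reduction per row.

```cpp
#include <vector>

// Row-wise mean over a row-major (nrows x ncols) matrix: out[r] is the
// average of row r, equivalent to sum_rows followed by division by ncols.
static std::vector<float> mean_rows(const std::vector<float> & x,
                                    int nrows, int ncols) {
    std::vector<float> out(nrows, 0.0f);
    for (int r = 0; r < nrows; ++r) {
        float sum = 0.0f;
        for (int c = 0; c < ncols; ++c) {
            sum += x[r * ncols + c];
        }
        out[r] = sum / ncols;
    }
    return out;
}
```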

b5731 (21 Jun 07:04, commit bb16041)

Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (…

b5729 (21 Jun 05:58, commit 67ae531)

metal : fix thread-safety (#14300)

ggml-ci

b5728 (21 Jun 05:38, commit 692e3cd)

memory : rename interface to llama_memory_context_i (#14296)

* memory : rename interface to llama_memory_context_i

ggml-ci

* cont : fix comments

* cont : use "mctx" for referencing a memory context

ggml-ci
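
For readers unfamiliar with the convention: the `_i` suffix marks an abstract interface, and `mctx` is the agreed variable name for referencing one. An illustrative shape only; the real `llama_memory_context_i` declares more than this.

```cpp
#include <memory>

// Illustrative abstract interface in llama.cpp's "_i" naming style.
// The actual llama_memory_context_i has a richer surface than this sketch.
struct llama_memory_context_i {
    virtual ~llama_memory_context_i() = default;
    virtual bool apply() = 0;
};

// A trivial concrete implementation, purely for demonstration.
struct dummy_memory_context : llama_memory_context_i {
    bool apply() override { return true; }
};
```

Call sites then hold the interface through a pointer named `mctx`, e.g. `std::unique_ptr<llama_memory_context_i> mctx;`.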