Releases · ggml-org/llama.cpp
b5740
quantize : handle user-defined pruning of whole layers (blocks) (#13037)
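Pruning here means dropping entire transformer blocks from the model at quantization time. A minimal sketch of the core idea, assuming llama.cpp's usual `blk.<N>.<suffix>` tensor naming; the helper below is illustrative, not the code from the PR:

```cpp
// Sketch: decide whether a tensor belongs to a user-pruned block,
// assuming llama.cpp's tensor naming scheme "blk.<N>.<suffix>".
#include <cstdio>
#include <cstdlib>
#include <set>
#include <string>

static bool tensor_in_pruned_block(const std::string & name, const std::set<int> & pruned) {
    if (name.rfind("blk.", 0) != 0) {
        return false; // not a per-block tensor (e.g. token_embd, output)
    }
    // parse the block index that follows "blk."
    const int idx = std::atoi(name.c_str() + 4);
    return pruned.count(idx) > 0;
}

int main() {
    const std::set<int> pruned = {3, 7}; // hypothetical user selection
    printf("%d\n", tensor_in_pruned_block("blk.7.attn_q.weight", pruned)); // 1
    printf("%d\n", tensor_in_pruned_block("blk.2.ffn_up.weight",  pruned)); // 0
}
```

Tensors matching a pruned block would then simply be skipped when writing the quantized output file.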
b5738
run : avoid double tokenization (#14327)
* run : avoid double tokenization by adopting common_tokenize heuristic
* build : fix windows gcc and clang warnings
* lint : fixed trailing whitespace
* run : fix is_first flag
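The heuristic referenced above adds BOS/special tokens only for the first chunk fed into an otherwise empty context, so text appended later is not tokenized with a second BOS. A hedged sketch; `common_tokenize` is llama.cpp's common tokenization helper, and the exact name of the KV-usage check below is an assumption, as the API has shifted across versions:

```cpp
// Sketch of the "is_first" heuristic: add special tokens (e.g. BOS) only
// when nothing has been decoded into this context yet, so a chunk appended
// mid-conversation does not pick up a second BOS.
#include <string>
#include <vector>
#include "common.h" // common_tokenize
#include "llama.h"

std::vector<llama_token> tokenize_chunk(llama_context * ctx, const std::string & text) {
    // assumption: helper name for checking KV-cache occupancy
    const bool is_first = llama_kv_self_used_cells(ctx) == 0;
    // add_special only for the very first chunk; parse_special so control
    // tokens embedded by chat templates are honored
    return common_tokenize(ctx, text, /*add_special=*/is_first, /*parse_special=*/true);
}
```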
b5737
examples : fix is_first logic for tokenization (#14329) ggml-ci
b5736
HIP: enable vec fattn on RDNA4 (#14323)
b5735
mtmd : fix Pixtral OOM with large images by capping image_size to 102…
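Capping works by clamping the longest image side and rescaling the other to preserve the aspect ratio, bounding preprocessing memory. A hypothetical sketch (the actual cap value is truncated in the note above, so the `1024` here is purely illustrative):

```cpp
// Hypothetical sketch of capping an image's longest side to limit memory
// use during multimodal (mtmd) preprocessing.
#include <algorithm>
#include <cstdio>

struct image_size { int width, height; };

static image_size cap_longest_side(image_size sz, int max_side) {
    const int longest = std::max(sz.width, sz.height);
    if (longest <= max_side) {
        return sz; // already within budget
    }
    const double scale = (double) max_side / longest;
    return { std::max(1, (int)(sz.width  * scale)),
             std::max(1, (int)(sz.height * scale)) };
}

int main() {
    const image_size out = cap_longest_side({4096, 3072}, 1024); // illustrative cap
    printf("%dx%d\n", out.width, out.height); // 1024x768
}
```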
b5734
common : use std::string_view now that we target c++17 (#14319)
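With C++17 as the minimum standard, helpers can take `std::string_view` instead of `const std::string &`, avoiding temporary `std::string` allocations when callers pass literals or substrings. A small self-contained example of the pattern:

```cpp
// Accepting std::string_view avoids constructing a temporary std::string
// when callers pass string literals, and avoids copies for substrings.
#include <cstdio>
#include <string>
#include <string_view>

static bool starts_with(std::string_view str, std::string_view prefix) {
    // C++17 string_view has no starts_with (that arrives in C++20)
    return str.size() >= prefix.size() && str.substr(0, prefix.size()) == prefix;
}

int main() {
    const std::string name = "blk.12.attn_q.weight";
    printf("%d\n", starts_with(name, "blk."));             // no copy of `name`
    printf("%d\n", starts_with("output.weight", "blk.")); // no std::string temporary
}
```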
b5733
CUDA: add mean operation (#14313)
* CUDA: add mean operation
* add back sum_rows_f32_cuda
* Review: early exit if col!=0
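Per the bullets above, the CUDA path computes a mean as a row-wise sum (reusing `sum_rows_f32_cuda`) followed by scaling. A CPU reference loop for the same semantics, illustrative rather than the actual ggml kernel:

```cpp
// Reference semantics for a row-wise mean: each row of a ncols x nrows
// f32 matrix is reduced to a single value, sum(row) / ncols.
#include <cstdio>
#include <vector>

static void mean_rows_f32(const float * src, float * dst, int ncols, int nrows) {
    for (int r = 0; r < nrows; ++r) {
        float sum = 0.0f;
        for (int c = 0; c < ncols; ++c) {
            sum += src[r * ncols + c];
        }
        dst[r] = sum / ncols; // scale the row sum by 1/ncols
    }
}

int main() {
    const std::vector<float> src = {1, 2, 3, 4,   10, 20, 30, 40};
    float dst[2];
    mean_rows_f32(src.data(), dst, 4, 2);
    printf("%g %g\n", dst[0], dst[1]); // 2.5 25
}
```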
b5731
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (…
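VK_EXT_debug_utils lets an application attach human-readable names to Vulkan handles so they show up labeled in tools such as RenderDoc. A typical usage sketch, not necessarily the exact code from the PR:

```cpp
// Typical VK_EXT_debug_utils object naming: load the extension entry point
// and attach a label to a Vulkan object (here, a VkBuffer).
#include <vulkan/vulkan.h>

static void label_buffer(VkInstance instance, VkDevice device,
                         VkBuffer buffer, const char * name) {
    auto set_name = (PFN_vkSetDebugUtilsObjectNameEXT)
        vkGetInstanceProcAddr(instance, "vkSetDebugUtilsObjectNameEXT");
    if (set_name == nullptr) {
        return; // VK_EXT_debug_utils not enabled/available
    }
    VkDebugUtilsObjectNameInfoEXT info = {};
    info.sType        = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
    info.objectType   = VK_OBJECT_TYPE_BUFFER;
    info.objectHandle = (uint64_t) buffer;
    info.pObjectName  = name;
    set_name(device, &info);
}
```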
b5729
metal : fix thread-safety (#14300) ggml-ci
b5728
memory : rename interface to llama_memory_context_i (#14296)
* memory : rename interface to llama_memory_context_i ggml-ci
* cont : fix comments
* cont : use "mctx" for referencing a memory context ggml-ci