Releases: ggml-org/llama.cpp
Releases · ggml-org/llama.cpp
b5721
vocab : prevent tokenizer overflow (#14301) * vocab : prevent stack overflow in tokenize * vocab : return error instead of aborting on oversized token count * vocab : INT32_MIN from llama_tokenize on overflow
b5720
sycl: add usage of enqueue_functions extension (#14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <[email protected]>
b5719
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) * Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <[email protected]>
b5718
llama : improve sep token handling (#14272)
b5717
cuda : synchronize graph capture and cublas handle destruction (#14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread
b5716
ggml : fix repack work size for mul_mat_id (#14292) ggml-ci
b5715
ggml: Update KleidiAI to v1.9.0 (#14277)
b5714
model : more uniform output id handling (#14275) * model : more uniform output id handling ggml-ci * cont : revert n_outputs < n_tokens optimization ggml-ci * cont : fix out_ids initialization ggml-ci
b5713
ubatch : new splitting logic (#14217) ggml-ci
b5712
CUDA: add conv_2d_dw (#14265) * CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const