Releases: ggml-org/llama.cpp

b5721

20 Jun 16:07
dd6e6d0
vocab : prevent tokenizer overflow (#14301)

* vocab : prevent stack overflow in tokenize

* vocab : return error instead of aborting on oversized token count

* vocab : return INT32_MIN from llama_tokenize on overflow
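
For callers, the practical consequence of the last item is that a negative result from `llama_tokenize` can now mean two different things. The sketch below is a minimal, hypothetical helper (the names `TokenizeStatus`, `TokenizeResult`, and `classify_tokenize_rc` are not part of llama.cpp) that classifies a return code. It assumes the usual convention that other negative values are the negated number of tokens required, and it checks for `INT32_MIN` first because negating that value in 32-bit arithmetic would itself overflow.

```cpp
#include <cstdint>

// Hypothetical classification of a tokenizer return code (not a llama.cpp type).
enum class TokenizeStatus { Ok, BufferTooSmall, Overflow };

struct TokenizeResult {
    TokenizeStatus status;
    int64_t        n_tokens; // tokens written (Ok) or tokens required (BufferTooSmall)
};

// Assumed convention for the return code rc:
//   rc >= 0         : number of tokens written
//   rc == INT32_MIN : the token count does not fit in int32_t
//   any other rc < 0: buffer too small, -rc tokens are required
inline TokenizeResult classify_tokenize_rc(int32_t rc) {
    if (rc >= 0) {
        return {TokenizeStatus::Ok, rc};
    }
    if (rc == INT32_MIN) {
        // Must be tested before negating: -INT32_MIN is not representable in int32_t.
        return {TokenizeStatus::Overflow, 0};
    }
    // Widen before negating so the arithmetic cannot overflow.
    return {TokenizeStatus::BufferTooSmall, -static_cast<int64_t>(rc)};
}
```

A caller that previously resized its buffer to `-rc` and retried would, under these assumptions, want to treat the `Overflow` case as a hard error instead of retrying.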

b5720

20 Jun 14:25
8308f98
sycl: add usage of enqueue_functions extension (#14244)

* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions extension

* Wrapping extension in general function

Create a general function that uses the enqueue_functions extension if
it is enabled in the compiler, and otherwise calls the general SYCL
function to launch kernels.

---------

Signed-off-by: nscipione <[email protected]>

b5719

20 Jun 14:27
6369be0
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)

* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
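
The release notes do not say which initializations were moved, so the following is only a generic sketch of the pattern, with hypothetical names (`Table`, `get_table`, `g_init_count`). A namespace-scope object is constructed as soon as the shared library is loaded, before any CPU feature check can run. A function-local static is constructed on the first call instead, so a backend variant that is loaded but never selected does not execute its initialization code.

```cpp
// Hypothetical table; its constructor stands in for initialization code that
// might be compiled with instructions the current CPU does not support.
struct Table {
    float values[4];
    Table() {
        for (int i = 0; i < 4; ++i) {
            values[i] = 0.5f * static_cast<float>(i);
        }
    }
};

// Counts how often the table is built; only here to make the timing observable.
static int g_init_count = 0;

// Eager form (avoided): `static const Table g_table;` at namespace scope would
// run the constructor at library load time.

// Delayed form: the table is built on the first call and reused afterwards.
// Initialization of function-local statics is thread-safe since C++11.
const Table & get_table() {
    static const Table table = [] {
        ++g_init_count;
        return Table();
    }();
    return table;
}
```

Nothing is constructed until `get_table()` is first called, and later calls return the same object without rebuilding it.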

---------

Co-authored-by: Diego Devesa <[email protected]>

b5718

20 Jun 14:18
88fc854
llama : improve sep token handling (#14272)

b5717

20 Jun 13:34
e28c1b9
cuda : synchronize graph capture and cublas handle destruction (#14288)

Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread.

b5716

20 Jun 10:09
d27b3ca
ggml : fix repack work size for mul_mat_id (#14292)

ggml-ci

b5715

20 Jun 09:43
9230dbe
ggml: Update KleidiAI to v1.9.0 (#14277)

b5714

20 Jun 09:32
812939a
model : more uniform output id handling (#14275)

* model : more uniform output id handling

ggml-ci

* cont : revert n_outputs < n_tokens optimization

ggml-ci

* cont : fix out_ids initialization

ggml-ci

b5713

20 Jun 07:44
4c9fdfb
ubatch : new splitting logic (#14217)

ggml-ci

b5712

20 Jun 02:55
9eaa51e
CUDA: add conv_2d_dw (#14265)

* CUDA: add conv_2d_dw

* better naming

* simplify using template

* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const