Releases · ggml-org/llama.cpp

20 Jun 16:07

dd6e6d0

b5721

vocab : prevent tokenizer overflow (#14301)

* vocab : prevent stack overflow in tokenize

* vocab : return error instead of aborting on oversized token count

* vocab : INT32_MIN from llama_tokenize on overflow
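
For callers, the change above means the `int32_t` returned by `llama_tokenize` can now take three forms: a non-negative token count, a negative value whose magnitude is the number of tokens that would have been needed, or `INT32_MIN` on overflow. A minimal sketch of how a caller might tell them apart follows. The `TokenizeStatus`, `TokenizeResult`, and `interpret_tokenize_result` names are hypothetical helpers for illustration and are not part of llama.cpp.

```cpp
#include <cstdint>

// Hypothetical helper types (not part of llama.cpp) for classifying the
// value returned by llama_tokenize.
enum class TokenizeStatus { Ok, BufferTooSmall, Overflow };

struct TokenizeResult {
    TokenizeStatus status;
    int32_t        count; // tokens written (Ok) or tokens required (BufferTooSmall); 0 on Overflow
};

inline TokenizeResult interpret_tokenize_result(int32_t ret) {
    // Check the overflow sentinel first: negating INT32_MIN would itself overflow.
    if (ret == INT32_MIN) {
        return {TokenizeStatus::Overflow, 0};
    }
    // Any other negative value is the negated number of tokens required.
    if (ret < 0) {
        return {TokenizeStatus::BufferTooSmall, -ret};
    }
    return {TokenizeStatus::Ok, ret};
}
```

A caller that sees `BufferTooSmall` can resize its buffer to `count` and retry, while `Overflow` should be treated as a hard error for that input.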

20 Jun 14:25

github-actions

8308f98

b5720

sycl: add usage of enqueue_functions extension (#14244)

* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that enables the enqueue_functions extension if
it is enabled in the compiler, and otherwise calls the general SYCL function
to launch kernels.

---------

Signed-off-by: nscipione <[email protected]>

20 Jun 14:27

github-actions

6369be0

b5719

Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)

* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
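
The feature detection and scoring idea above can be sketched as follows, assuming each compiled CPU variant declares the features it requires and a score, and the loader picks the highest-scoring variant the host supports. The feature bits, the `Variant` struct, and `pick_variant` are hypothetical names for illustration, not ggml's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits, for illustration only.
constexpr uint32_t FEAT_VSX     = 1u << 0;
constexpr uint32_t FEAT_POWER9  = 1u << 1;
constexpr uint32_t FEAT_POWER10 = 1u << 2;

// One compiled variant of the CPU backend (hypothetical description).
struct Variant {
    std::string name;
    uint32_t    required; // feature bits the variant's code relies on
    int         score;    // higher is preferred; must be > 0 to be selectable
};

// Returns the index of the best variant the host can run, or -1 if none.
inline int pick_variant(const std::vector<Variant> & variants, uint32_t host_features) {
    int best       = -1;
    int best_score = 0;
    for (std::size_t i = 0; i < variants.size(); ++i) {
        const Variant & v = variants[i];
        // A variant is usable only if every required feature is present.
        const bool supported = (v.required & host_features) == v.required;
        if (supported && v.score > best_score) {
            best       = static_cast<int>(i);
            best_score = v.score;
        }
    }
    return best;
}
```

Detection has to run before any variant-specific code does, which is why initializations that might use unsupported instructions are delayed until a function is called.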

---------

Co-authored-by: Diego Devesa <[email protected]>

20 Jun 14:18

github-actions

88fc854

b5718

llama : improve sep token handling (#14272)

20 Jun 13:34

github-actions

e28c1b9

b5717

cuda : synchronize graph capture and cublas handle destruction (#14288)

Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread.

20 Jun 10:09

github-actions

d27b3ca

b5716

ggml : fix repack work size for mul_mat_id (#14292)

ggml-ci

20 Jun 09:43

github-actions

9230dbe

b5715

ggml: Update KleidiAI to v1.9.0 (#14277)

20 Jun 09:32

github-actions

812939a

b5714

model : more uniform output id handling (#14275)

* model : more uniform output id handling

ggml-ci

* cont : revert n_outputs < n_tokens optimization

ggml-ci

* cont : fix out_ids initialization

ggml-ci

20 Jun 07:44

github-actions

4c9fdfb

b5713

ubatch : new splitting logic (#14217)

ggml-ci

20 Jun 02:55

github-actions

9eaa51e

b5712

CUDA: add conv_2d_dw (#14265)

* CUDA: add conv_2d_dw

* better naming

* simplify using template

* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const
