Releases: ggml-org/llama.cpp

b5775

29 Jun 08:36
bd9c981
vulkan: Add fusion support for RMS_NORM+MUL (#14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion: rather than computing one node at a time, allow
computing the whole graph and checking only one node's results. Add rms_norm_mul tests
and enable a llama test.

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups

style fixes

* last node doesn't need single use.
fix type.
handle mul operands being swapped.

* remove redundant parameter

---------

Co-authored-by: slaren <[email protected]>
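The fusion rule described above can be illustrated with a small CPU-side sketch. It assumes a simplified graph in which every node has an op and up to two input indices. The names `Op`, `Node`, `compute_use_counts`, `can_fuse_rms_norm_mul` and `rms_norm_mul_fused` are hypothetical and are not ggml's API. The sketch also indexes use counts by node position, whereas the change itself stores them in the graph, indexed by hash-table slot. The rule it encodes matches the commit notes. The RMS_NORM output must be consumed exactly once, by the MUL that follows it, and either MUL operand may be the norm. The MUL is the last node of the pair, so it may have any number of uses. The fused kernel then normalizes and multiplies a row in a single pass, without writing the intermediate tensor.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified graph representation (not ggml_tensor / ggml_cgraph).
enum class Op { RMS_NORM, MUL, OTHER };

struct Node {
    Op op;
    int src0;  // index of first input node, -1 if none
    int src1;  // index of second input node, -1 if none
};

// Count how many times each node's output is read by other nodes.
std::vector<int> compute_use_counts(const std::vector<Node>& graph) {
    std::vector<int> counts(graph.size(), 0);
    for (const Node& n : graph) {
        if (n.src0 >= 0) counts[n.src0]++;
        if (n.src1 >= 0) counts[n.src1]++;
    }
    return counts;
}

// The RMS_NORM at index i can be fused with the MUL at index i + 1 if the MUL reads
// the norm's output through either operand (operands may be swapped) and nothing else
// reads it. The MUL, as the last node of the pair, is not required to be single-use.
bool can_fuse_rms_norm_mul(const std::vector<Node>& graph,
                           const std::vector<int>& use_counts, std::size_t i) {
    if (i + 1 >= graph.size()) return false;
    const Node& norm = graph[i];
    const Node& mul = graph[i + 1];
    if (norm.op != Op::RMS_NORM || mul.op != Op::MUL) return false;
    const int ni = static_cast<int>(i);
    const bool mul_reads_norm = (mul.src0 == ni || mul.src1 == ni);
    return mul_reads_norm && use_counts[i] == 1;
}

// Fused kernel for one row: y[j] = x[j] / sqrt(mean(x^2) + eps) * w[j].
void rms_norm_mul_fused(const float* x, const float* w, float* y,
                        std::size_t n, float eps) {
    float sum_sq = 0.0f;
    for (std::size_t j = 0; j < n; ++j) sum_sq += x[j] * x[j];
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(n) + eps);
    for (std::size_t j = 0; j < n; ++j) y[j] = x[j] * scale * w[j];
}
```

The use count is what makes the fusion safe. If another node also read the RMS_NORM output, skipping that intermediate tensor would leave the other consumer without its input.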

b5774

28 Jun 18:00
27208bf
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools
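The review note "add type traits and make function more generic" refers to a common C++ pattern. A single templated function handles every element type, and a traits struct supplies the per-type details. The sketch below is a CPU-only illustration of that pattern for a batched matrix multiply over `float` and a minimal bfloat16 type. It does not call cuBLAS. The names `bf16_t`, `mm_traits` and `batched_matmul` are hypothetical. The sketch also assumes row-major, contiguous batches and float accumulation for both types.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical minimal bfloat16: the upper 16 bits of an IEEE-754 float32.
struct bf16_t { std::uint16_t bits; };

inline float bf16_to_float(bf16_t v) {
    const std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Truncating conversion (no round-to-nearest), kept simple for the sketch.
inline bf16_t float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return bf16_t{static_cast<std::uint16_t>(u >> 16)};
}

// Hypothetical traits: how to load/store each element type; math is done in float.
template <typename T> struct mm_traits;

template <> struct mm_traits<float> {
    static float load(float v) { return v; }
    static float store(float v) { return v; }
};

template <> struct mm_traits<bf16_t> {
    static float load(bf16_t v) { return bf16_to_float(v); }
    static bf16_t store(float v) { return float_to_bf16(v); }
};

// C[b] = A[b] (M x K) * B[b] (K x N) for each b in [0, batch); row-major, contiguous.
template <typename T>
void batched_matmul(const T* A, const T* B, T* C,
                    std::size_t batch, std::size_t M, std::size_t N, std::size_t K) {
    using tr = mm_traits<T>;
    for (std::size_t b = 0; b < batch; ++b) {
        const T* a = A + b * M * K;
        const T* bm = B + b * K * N;
        T* c = C + b * M * N;
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;  // accumulate in float regardless of T
                for (std::size_t k = 0; k < K; ++k)
                    acc += tr::load(a[i * K + k]) * tr::load(bm[k * N + j]);
                c[i * N + j] = tr::store(acc);
            }
        }
    }
}
```

To support another type, you add one `mm_traits` specialization. The loop body stays the same.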

b5773

28 Jun 16:05
63a7bb3
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…

b5772

28 Jun 15:40
00d5282
vulkan: lock accesses of pinned_memory vector (#14333)
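The general pattern behind this change fits in a short C++ sketch. It assumes the vector is shared by threads that register, unregister and look up pinned host buffers, and every access, including reads, goes through one mutex. The names `PinnedRegistry`, `add`, `remove` and `contains` are hypothetical and are not the ggml-vulkan code.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical registry of pinned host buffers stored as (pointer, size) pairs.
class PinnedRegistry {
public:
    void add(void* ptr, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.emplace_back(ptr, size);
    }

    void remove(void* ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->first == ptr) { entries_.erase(it); return; }
        }
    }

    // True if [ptr, ptr + size) lies entirely inside one registered buffer.
    // Readers lock too: an unlocked read can race with a concurrent erase/emplace.
    bool contains(const void* ptr, std::size_t size) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
        for (const auto& e : entries_) {
            const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(e.first);
            if (p >= base && p + size <= base + e.second) return true;
        }
        return false;
    }

private:
    mutable std::mutex mutex_;  // mutable so const lookups can still lock
    std::vector<std::pair<void*, std::size_t>> entries_;
};
```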

b5771

28 Jun 14:25
566c16f
model : add support for ERNIE 4.5 0.3B model (#14408)

Add Day-0 support for Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <[email protected]>

b5770

28 Jun 09:55
b25e927
fix async_mode bug (#14432)

b5769

28 Jun 08:57
6609507
ci : fix windows build and release (#14431)

b5760

26 Jun 14:18
e8215db
metal : add special-case mat-vec mul for ne00 == 4 (#14385)

ggml-ci
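In ggml, `ne00` is the number of elements in each row of the first source tensor. When it is 4, a whole row fits in one 4-wide vector such as a Metal `float4`. One thread can then plausibly produce a full output element without a cross-thread reduction. The C++ sketch below shows only the arithmetic of that special case on the CPU, one unrolled 4-element dot product per output row. It is an illustration under that assumption, not the Metal kernel, and `matvec_ne00_4` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper: y[r] = dot(A[r], x), where every row of A has exactly 4 elements.
std::vector<float> matvec_ne00_4(const std::vector<std::array<float, 4>>& rows,
                                 const std::array<float, 4>& x) {
    std::vector<float> y(rows.size());
    for (std::size_t r = 0; r < rows.size(); ++r) {
        const auto& a = rows[r];
        // Fully unrolled 4-element dot product: no inner loop, no reduction step.
        y[r] = a[0] * x[0] + a[1] * x[1] + a[2] * x[2] + a[3] * x[3];
    }
    return y;
}
```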

b5759

26 Jun 13:56
5783ae4
metal : batch rows copy in a single threadgroup (#14384)

* metal : batch rows copy in a single threadgroup

ggml-ci

* metal : handle some edge cases when threadgroup size is not a power of 2

ggml-ci
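The commit only states that several rows are now copied by one threadgroup and that non-power-of-2 threadgroup sizes needed extra care. The C++ sketch below shows one common way to split such work, not necessarily the approach the Metal kernel uses. The threads stride over the flattened (row, column) index, which covers every element exactly once for any thread count. The function name `threadgroup_copy_rows` is hypothetical, and the sketch simulates the threads sequentially.

```cpp
#include <cstddef>

// Hypothetical simulation of one threadgroup of `nth` threads copying `nrows` rows of
// `ncols` floats between strided buffers (strides given in elements). Thread `tid`
// handles flattened indices tid, tid + nth, tid + 2*nth, ..., so each element is copied
// exactly once for any nth >= 1, whether or not nth is a power of 2.
void threadgroup_copy_rows(const float* src, std::size_t src_stride,
                           float* dst, std::size_t dst_stride,
                           std::size_t nrows, std::size_t ncols, std::size_t nth) {
    const std::size_t total = nrows * ncols;
    for (std::size_t tid = 0; tid < nth; ++tid) {  // sequential stand-in for parallel threads
        for (std::size_t idx = tid; idx < total; idx += nth) {
            const std::size_t row = idx / ncols;
            const std::size_t col = idx % ncols;
            dst[row * dst_stride + col] = src[row * src_stride + col];
        }
    }
}
```

A scheme that splits work by halving the thread count assumes a power-of-2 size. The strided loop needs no such assumption, because the final iterations simply leave some threads idle.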

b5757

26 Jun 05:21
716301d
musa: enable fp16 mma (all) and cublas on qy2 (#13842)

* musa: enable fp16 mma (all) and cublas on qy2

Signed-off-by: Xiaodong Ye <[email protected]>

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <[email protected]>

* Address review comments

Signed-off-by: Xiaodong Ye <[email protected]>

* Address review comments

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: disable MUL_MAT_ID (q2_k × f32) due to precision issues

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>