
Releases: ggml-org/llama.cpp

b5778

29 Jun 16:01
f47c1d7
SYCL: disable faulty fp16 exp kernel (#14395)

* SYCL: disable faulty fp16 CPU exponent for now

* Revert "SYCL: disable faulty fp16 CPU exponent for now"

This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202.

* SYCL: disable faulty fp16 CPU exponent for now

* Fix logic of disabling exponent kernel

b5777

29 Jun 13:36
a5d1fb6
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443)

b5775

29 Jun 08:36
bd9c981
vulkan: Add fusion support for RMS_NORM+MUL (#14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow for computing the whole graph and just testing one node's results. Add rms_norm_mul tests and enable a llama test.
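As a rough CPU-side illustration of what fusing RMS_NORM with MUL buys, the sketch below compares a two-pass version (normalize, then multiply) with a single fused pass that never materializes the intermediate tensor. This is not the Vulkan shader from the PR; the function names `rms_norm_ref`, `mul_ref`, and `rms_norm_mul_fused` are hypothetical, and the normalization formula (scale by `1 / sqrt(mean(x^2) + eps)`) is an assumption about the usual RMS norm definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused step 1 (hypothetical helper): RMS-normalize x into a temporary.
std::vector<float> rms_norm_ref(const std::vector<float> & x, float eps) {
    assert(!x.empty());
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}

// Unfused step 2 (hypothetical helper): elementwise multiply.
std::vector<float> mul_ref(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    std::vector<float> y(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        y[i] = a[i] * b[i];
    }
    return y;
}

// Fused version: the normalized value is multiplied by w[i] immediately,
// so no intermediate buffer is written or read back.
std::vector<float> rms_norm_mul_fused(const std::vector<float> & x,
                                      const std::vector<float> & w,
                                      float eps) {
    assert(!x.empty() && x.size() == w.size());
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float scale = 1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}
```

The fused path is only valid when nothing else reads the normalized intermediate, which is why the change also tracks how many times each output is used.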

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups

style fixes

* last node doesn't need single use.
fix type.
handle mul operands being swapped.

* remove redundant parameter
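The commits above move the use count from the tensor to the graph and relax the single-use requirement for the last node. A minimal sketch of that idea follows, assuming a toy graph: `Node`, `Graph`, `graph_add`, and `can_fuse_rms_norm_mul` are hypothetical names, and `std::unordered_map` stands in for ggml's own hash table, so this is not the actual `ggml_can_fuse` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

enum class Op { NONE, RMS_NORM, MUL };

// Toy node: an op with up to two inputs (hypothetical, not ggml_tensor).
struct Node {
    Op           op;
    const Node * src0;
    const Node * src1;
};

// Use counts live in the graph, not in the node, so building several graphs
// over the same nodes (possibly from different threads) cannot double-count.
struct Graph {
    std::vector<const Node *>             nodes;
    std::unordered_map<const Node *, int> use_counts;
};

void graph_add(Graph & g, const Node * n) {
    assert(n != nullptr);
    g.nodes.push_back(n);
    if (n->src0) { g.use_counts[n->src0]++; }
    if (n->src1) { g.use_counts[n->src1]++; }
}

// nodes[i] must be RMS_NORM, nodes[i + 1] must be a MUL that consumes it
// (as either operand), and the RMS_NORM output must have exactly one user.
// The MUL itself, being the last node of the fused pair, may have any number of users.
bool can_fuse_rms_norm_mul(const Graph & g, std::size_t i) {
    if (i + 1 >= g.nodes.size()) {
        return false;
    }
    const Node * norm = g.nodes[i];
    const Node * mul  = g.nodes[i + 1];
    if (norm->op != Op::RMS_NORM || mul->op != Op::MUL) {
        return false;
    }
    if (mul->src0 != norm && mul->src1 != norm) {
        return false; // handles swapped MUL operands
    }
    const auto it = g.use_counts.find(norm);
    return it != g.use_counts.end() && it->second == 1;
}
```

If a second node later reads the RMS_NORM output, its use count becomes 2 and the check fails, because the intermediate result would then have to be materialized anyway.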

---------

Co-authored-by: slaren <[email protected]>

b5774

28 Jun 18:00
27208bf
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools
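To illustrate the "type traits" approach mentioned in the review commits, here is a host-only sketch of how one generic routine can serve several element types. It does not use CUDA or cuBLAS; `bf16_t`, `elem_traits`, `f32_to_bf16_trunc`, and `dot` are hypothetical stand-ins, and the only fact relied on is that bf16 is the upper 16 bits of an IEEE-754 f32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a bf16 value: the upper 16 bits of an f32 bit pattern.
struct bf16_t {
    std::uint16_t bits;
};

inline float bf16_to_f32(bf16_t v) {
    const std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Truncating conversion (no rounding), good enough for this illustration.
inline bf16_t f32_to_bf16_trunc(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return bf16_t{static_cast<std::uint16_t>(u >> 16)};
}

// Traits: one specialization per supported element type.
template <typename T> struct elem_traits;

template <> struct elem_traits<float> {
    static float to_f32(float v) { return v; }
};

template <> struct elem_traits<bf16_t> {
    static float to_f32(bf16_t v) { return bf16_to_f32(v); }
};

// Generic routine: the traits hide the per-type conversion, so the same
// body works for f32 and bf16 inputs and accumulates in f32.
template <typename T>
float dot(const T * a, const T * b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += elem_traits<T>::to_f32(a[i]) * elem_traits<T>::to_f32(b[i]);
    }
    return acc;
}
```

Adding another element type then means adding one more `elem_traits` specialization rather than duplicating the routine.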

b5773

28 Jun 16:05
63a7bb3
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…

b5772

28 Jun 15:40
00d5282
vulkan: lock accesses of pinned_memory vector (#14333)
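The general pattern behind this kind of fix can be sketched as a vector whose every read and write goes through the same mutex. The class below is a generic illustration, not the ggml-vulkan code: `PinnedRegion` and `PinnedRegistry` are hypothetical names, and the region layout (base pointer plus size) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical record of one pinned host allocation.
struct PinnedRegion {
    std::uintptr_t base;
    std::size_t    size;
};

// All accesses to the vector take the same mutex, so concurrent
// add/remove/lookup calls cannot observe it mid-modification.
class PinnedRegistry {
public:
    void add(const void * base, std::size_t size) {
        assert(base != nullptr && size > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        regions_.push_back({reinterpret_cast<std::uintptr_t>(base), size});
    }

    // Returns true if a region starting at `base` was found and removed.
    bool remove(const void * base) {
        const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(base);
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < regions_.size(); ++i) {
            if (regions_[i].base == b) {
                regions_.erase(regions_.begin() + static_cast<std::ptrdiff_t>(i));
                return true;
            }
        }
        return false;
    }

    // Returns true if `ptr` falls inside any registered region.
    bool contains(const void * ptr) const {
        const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
        std::lock_guard<std::mutex> lock(mutex_);
        for (const PinnedRegion & r : regions_) {
            if (p >= r.base && p - r.base < r.size) {
                return true;
            }
        }
        return false;
    }

private:
    mutable std::mutex        mutex_;   // mutable so const lookups can lock
    std::vector<PinnedRegion> regions_;
};
```

Keeping the mutex private next to the vector makes it hard to touch the data without holding the lock, which is the property an unguarded shared vector lacks.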

b5771

28 Jun 14:25
566c16f
model : add support for ERNIE 4.5 0.3B model (#14408)

Add Day-0 support for Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <[email protected]>

b5770

28 Jun 09:55
b25e927
fix async_mode bug (#14432)

b5769

28 Jun 08:57
6609507
ci : fix windows build and release (#14431)

b5760

26 Jun 14:18
e8215db
metal : add special-case mat-vec mul for ne00 == 4 (#14385)

ggml-ci
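
The idea of special-casing a matrix-vector product when the row length is 4 can be sketched on the CPU as below. This is not the Metal kernel from the PR; it only assumes that `ne00` denotes the row length of the matrix operand, and `mat_vec_generic`, `mat_vec_cols4`, and `mat_vec` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic path: inner loop over an arbitrary row length.
void mat_vec_generic(const float * a, const float * x, float * y,
                     std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            acc += a[r * cols + c] * x[c];
        }
        y[r] = acc;
    }
}

// Special case: rows of exactly 4 elements, fully unrolled so there is
// no inner loop and x can be held in four locals.
void mat_vec_cols4(const float * a, const float * x, float * y, std::size_t rows) {
    const float x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
    for (std::size_t r = 0; r < rows; ++r) {
        const float * row = a + r * 4;
        y[r] = row[0] * x0 + row[1] * x1 + row[2] * x2 + row[3] * x3;
    }
}

// Dispatcher: choose the specialized path when the row length is 4.
std::vector<float> mat_vec(const std::vector<float> & a,
                           const std::vector<float> & x,
                           std::size_t rows) {
    const std::size_t cols = x.size();
    assert(a.size() == rows * cols);
    std::vector<float> y(rows);
    if (cols == 4) {
        mat_vec_cols4(a.data(), x.data(), y.data(), rows);
    } else {
        mat_vec_generic(a.data(), x.data(), y.data(), rows, cols);
    }
    return y;
}
```

Both paths compute the same result; the specialized one simply avoids loop overhead that dominates when each row is this short.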