Releases · ggml-org/llama.cpp

28 Jun 16:05

63a7bb3

b5773

vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…

Assets 15

28 Jun 15:40

github-actions

b5772

00d5282

vulkan: lock accesses of pinned_memory vector (#14333)

Assets 15

28 Jun 14:25

github-actions

b5771

566c16f

model : add support for ERNIE 4.5 0.3B model (#14408)

Add Day-0 support for Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <[email protected]>

Assets 15

28 Jun 09:55

github-actions

b5770

b25e927

fix async_mode bug (#14432)

Assets 15

28 Jun 08:57

github-actions

b5769

6609507

ci : fix windows build and release (#14431)

Assets 15

26 Jun 14:18

github-actions

b5760

e8215db

metal : add special-case mat-vec mul for ne00 == 4 (#14385)

ggml-ci

Assets 15

26 Jun 13:56

github-actions

b5759

5783ae4

metal : batch rows copy in a single threadgroup (#14384)

* metal : batch rows copy in a single threadgroup

ggml-ci

* metal : handle some edge cases when threadgroup size is not a power of 2

ggml-ci

Assets 15

26 Jun 05:21

github-actions

b5757

716301d

musa: enable fp16 mma (all) and cublas on qy2 (#13842)

* musa: enable fp16 mma (all) and cublas on qy2

Signed-off-by: Xiaodong Ye <[email protected]>

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <[email protected]>

* Address review comments

Signed-off-by: Xiaodong Ye <[email protected]>

* Address review comments

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: disable MUL_MAT_ID (q2_k × f32) due to precision issues

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>

Assets 15

25 Jun 22:48

github-actions

b5756

60ef23d

ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)

* ggml-cpu: add nnpa compile flag

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 4a9f60c201573128f73a65999b3e5cc497fae5c1)

* ggml-cpu: add fp16->fp32 nnpa first

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 8d4a7987f9c1887f716be96250f2caeee0253929)

* ggml-cpu: add fp32->fp16

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 0ff0d6516247a41d2ade42b42cf0d676a4dd1627)

* ggml-cpu: better variable names

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 2f58bbcbb89c183340e252362b2a40651f573f1f)

* docs: update s390x docs

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 01b929491b50071a5d0572235dcf5a449da70aa7)

* ggml-cpu: add debugging prints to see if dlf16 is correct

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix print vs printf

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix float placeholder

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: ensure fp16 and fp32 load and stores are called

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fp16 load ensured to hit

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: remove sigint from fp16 store

for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: nnpa switch to vec_xst test

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: switch to vec_xst for 4 element loops also

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: rework noop

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: remove noop, general code cleanup

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: clarify variable naming

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add breakpoint for debugging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: test fix for conversion failure

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: disable fp32->fp16 nnpa conversions for now

there are some conversion failures in nnpa that require the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: switch to elif macro

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: reattempt fp32->fp16

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix typo

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: reattempt fp32->fp16

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix compiler types

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: change to typedef vector types

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add 4 element loops for fp32->fp16

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: clarified vector naming

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: bring back fp32->fp16 store nnpa

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add nnpa macro check in ggml-impl

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add missing __func__

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: diagnose why __NNPA__ macro is not being defined

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: import vecintrin.h to fix compiler errors

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: update macro tests

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: move s390x typedef to own header file

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: move s390x typedef to own header file"

This reverts commit 157f856c34589566151630e294563a420702db39.

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: switch to importing ggml-cpu-impl instead

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix macro declaration

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: test more macros

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add debug prints

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: bruteforce macro definitions

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: move macro definitions

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add ggml-impl.h to cmakelists

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: switch to private macros

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: move s390x typedef to own header file

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 157f856c34589566151630e294563a420702db39)

* ggml-cpu: move things around

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: bring back compile macros

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: switch to quotes for import

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add compiler error macro

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add s390x detection in ggml-src

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: bring back compile definitions

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: undo cmakelists work

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: move s390x typedef to own header file"

This reverts commit 18d79e1a30b39d9aaa0bd58400c5cf2c32135c9a.

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: remove typedefs.h

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: remove typedef from cmakelists

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add ggml-impl.h future notes

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add todo comment for future reference

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: clarify naming of dlf16

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: remove unnecessary target compile definitions

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings

Signed-off-by: Aaron Teo <[email protected]>

* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu

Signed-off-by: Aaron Teo <[email protected]>

* docs: update broken huggingface link for s390x

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix duplicate func names during compile

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: fix duplicate func names during compile"

This reverts commit fbb733451f27677063b914d4f6c9a9841d45b38d.

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"

This reverts commit bd288e8fa52b5244f65cee21cb61062f1a9e0ca5.

Signed-off-by: Aaron Teo <[email protected]>

* ggml: refactor fp16<->fp32 simd to ggml-cpu

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix missing simd-mappings.h import in quants.c

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix missing simd-mappings.h within repack

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix amx mmq missing simd-mappings.h

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: attempt at fixing loongarch failing build

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: move nnpa together with other fp16<->fp32 simd

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix wrong refactor of ggml-base

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555

Signed-off-by: Aaron Teo <[email protected]>

* ggml: remove dependency on ggml-cpu from ggml-base

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: remove mistaken fallback macro

fallback logic was already implemented but i was too sleepy to realise

Signed-off-by: Aaron Teo <[email protected]>

* ggml: move ggml_table_f32_f16 to ggml-cpu

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"

This reverts commit 32a3533564bdb7902cefb9c89b1c9e956a81ce29.

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"

This reverts commit 9e40d984ad27d7b60392fb2b7548885201864fe4.

Signed-off-by: Aaron Teo <[email protected]>

* ggml: move ggml_table_f32_f16 to ggml-cpu

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 9e40d984ad27d7b60392fb2b7548885201864fe4)

* ggml: move ggml_table_f32_f16 to ggml-cpu.c

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: extern c ggml_table_f32_f16 + chore docs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h

we rely on the variable declaration in ggml-cpu.c instead

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"

This reverts commit f71b21d2f74f5e03ec0c2b4fefd3cbf395aecf16.

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: bring back ggml_table_f32_f16

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: bring back ggml_table_f32_f16"

This reverts commit 2dce119178bed5ef5c8398c4230ddd14fef80e49.

Signed-off-by: Aaron Teo <[email protected]>

* fix ggml time initialization

* fix f32_f16 table init

* remove extra line

---------

Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: slaren <[email protected]>

Assets 15

25 Jun 21:50

github-actions

b5755

b193d53

ggml : do not output unprintable characters on GGUF load failure (#14…

Assets 15
