Merged
555 commits
37fc3bb
Add Infrastructure for SHGEMV
Mousius Oct 7, 2025
f552040
Fix stride issue.
ChipKerchner Oct 7, 2025
de43ccc
Merge pull request #5485 from Mousius/shgemv-infra
martin-frbg Oct 7, 2025
ba143f3
Merge remote-tracking branch 'refs/remotes/origin/develop' into develop
ChipKerchner Oct 7, 2025
064751e
Merge pull request #5481 from ChipKerchner/vectorSBGEMV
martin-frbg Oct 7, 2025
46fc6c0
fix unspecified array size in clobber list
martin-frbg Oct 8, 2025
7af2225
Merge pull request #5490 from martin-frbg/issue5489
martin-frbg Oct 8, 2025
fa912ce
rework definitions of ?FLOAT16_GEMM_GEMV_FORWARD
martin-frbg Oct 8, 2025
49eca84
Merge pull request #5478 from martin-frbg/issue5477
martin-frbg Oct 8, 2025
c3ce473
Merge pull request #5491 from martin-frbg/fixup5485
martin-frbg Oct 8, 2025
20f5ed1
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Oct 8, 2025
47a66ae
Update limits based on benchmarking the SME code on Apple M4
martin-frbg Oct 8, 2025
ac1604b
Merge remote-tracking branch 'refs/remotes/origin/develop' into develop
ChipKerchner Oct 8, 2025
03a8377
Tie in SHGEMV for RISC-V.
ChipKerchner Oct 8, 2025
4ac29b9
Merge pull request #5492 from ChipKerchner/activateSHGEMV
martin-frbg Oct 8, 2025
6f3691a
Fix cross compilation for x86 targets from non-x86
pcc Oct 8, 2025
e2399be
add macro
pratiklp00 Oct 9, 2025
acff97c
Ensure qemu is installed for running the tests
martin-frbg Oct 9, 2025
de00413
Update riscv64_vector.yml
martin-frbg Oct 9, 2025
f6b0d48
Add BUILD_BFLOAT16/HFLOAT16 for RISCV_ZVL256B target
martin-frbg Oct 9, 2025
644ea07
Support for SME1 based strmm_direct kernel for cblas_strmm level 3 API
changjua Sep 15, 2025
09c1877
Add test for SHGEMM
martin-frbg Oct 10, 2025
4291fa2
fix misnaming of NVHPC as NVC in ARM64 compiler option selection
martin-frbg Oct 10, 2025
5c89e4c
remove the stricted build flags
martin-frbg Oct 10, 2025
d5870f2
Merge pull request #5496 from martin-frbg/riscv-qemu2
martin-frbg Oct 10, 2025
fba2014
remove spurious POSIX define
martin-frbg Oct 10, 2025
e9a4553
Use DYNAMIC_LIST for OSX builds that time out; try speeding up mingw …
martin-frbg Oct 10, 2025
5b18a3f
fix copy/paste error
martin-frbg Oct 10, 2025
5ba2b9e
Merge pull request #5500 from martin-frbg/issue5498
martin-frbg Oct 10, 2025
4766775
Merge pull request #5495 from pcc/fix-cross
martin-frbg Oct 10, 2025
cb48a52
fix accidental indentation
martin-frbg Oct 10, 2025
a5fda2e
fix missed bfloat/hfloat edit
martin-frbg Oct 10, 2025
ffd2e47
drop LAPACK from slow mingw build
martin-frbg Oct 11, 2025
e40714c
Merge pull request #5450 from quic/topic/strmm_direct_sme1
martin-frbg Oct 11, 2025
b94e9b9
Fix compilation on ARM
yuyichao Oct 5, 2025
b6d5057
Merge pull request #5482 from yuyichao/arm-fix
martin-frbg Oct 12, 2025
9bfc361
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Oct 12, 2025
d539685
rewrite lapacke headers with pre/postfixes if necessary
martin-frbg Oct 12, 2025
87470a3
remove unused definitions
martin-frbg Oct 12, 2025
6637352
remmove spacing
pratiklp00 Oct 14, 2025
05adb52
copypaste fix
martin-frbg Oct 14, 2025
96f3462
Add symbol pre- and/or postfixes to lapack.h and lapacke.h
martin-frbg Oct 15, 2025
19be504
Add tests varying alpha and beta
martin-frbg Oct 15, 2025
887f4f3
Merge branch 'OpenMathLib:develop' into issue5497
martin-frbg Oct 15, 2025
3d5010b
Fix test for pre/postfix
martin-frbg Oct 16, 2025
ee6aa89
Add BFLOAT16 and HFLOAT16 tests
martin-frbg Oct 16, 2025
c0b2772
move L2 HFLOAT16 kernels out of the BFLOAT16 block
martin-frbg Oct 16, 2025
c92bac1
Add SHGEMV
martin-frbg Oct 16, 2025
a387217
Add BGEMV
martin-frbg Oct 16, 2025
a9a152e
fix bgemv build
martin-frbg Oct 16, 2025
5b640b1
add bgemm_thread_xx
martin-frbg Oct 16, 2025
f3cecbe
Merge pull request #5508 from martin-frbg/cmake_hfloat
martin-frbg Oct 16, 2025
098a8d5
Merge branch 'OpenMathLib:develop' into issue5497
martin-frbg Oct 16, 2025
c35b11a
Merge pull request #5501 from martin-frbg/azure_timeouts
martin-frbg Oct 17, 2025
4c1741d
Add compiler options for RISCV
martin-frbg Oct 17, 2025
016e2f1
Merge pull request #5499 from martin-frbg/issue5497
martin-frbg Oct 17, 2025
aef36a3
Merge pull request #5509 from martin-frbg/cmake_riscv
martin-frbg Oct 17, 2025
8211db6
Don't enable SME for VortexM4 when the compiler is gcc (which does no…
martin-frbg Oct 19, 2025
2346d0b
Add HAVE_SME for VortexM4 only with non-gcc compilers
martin-frbg Oct 19, 2025
d7b0fcc
Enable SME-based kernels for VortexM4 with clang-based compilers only
martin-frbg Oct 19, 2025
643a0b5
Allow VortexM4 on the direct_SME fast path only for clang-based compi…
martin-frbg Oct 19, 2025
e01b109
Allow VortexM4 on the same fast path only with non-gcc compilers
martin-frbg Oct 19, 2025
f4ee3ae
Allow VortexM4 on the SME fast path only with non-gcc compilers
martin-frbg Oct 19, 2025
1b591ea
export HAVE_SME setting and exclude VortexM4 from DYNAMIC_ARCH if gcc…
martin-frbg Oct 19, 2025
83d3e0e
fix copy/paste
martin-frbg Oct 19, 2025
682f61e
Add prototype for gotoblas_corename
martin-frbg Oct 19, 2025
43d38d3
Support for SME1 based ssyrk_direct kernel for cblas_ssyrk level 3 API
changjua Oct 14, 2025
3d19d3b
Make dummy function have the same linkage as the real one
yuyichao Oct 20, 2025
cb66aca
Improve single-thread performance of [SD]GER on A64FX and Neoverse V1
iha-taisei Oct 22, 2025
b2b9abc
Revert "Enhancing Core Utilization in BLAS Calls: A Scalable Architec…
martin-frbg Oct 22, 2025
43d0803
Merge pull request #5513 from yuyichao/arm-fix
martin-frbg Oct 23, 2025
677424a
Merge pull request #5516 from OpenMathLib/revert-4741-Pthread_Scalabi…
martin-frbg Oct 23, 2025
585e6d0
Merge pull request #5515 from iha-taisei/feature/ger_unroll
martin-frbg Oct 24, 2025
c5b0d1e
Add lower limit for multithreading
martin-frbg Oct 28, 2025
75b3e11
Add lower limit for multithreading
martin-frbg Oct 28, 2025
8e44cde
Add lower limit for multithreading
martin-frbg Oct 28, 2025
c1c1285
Add lower limit for multithreading
martin-frbg Oct 28, 2025
0c59ae0
Merge pull request #5453 from pratiklp00/dgemm_optimization
martin-frbg Oct 28, 2025
ef6f976
[WIP,Testing] remove the lock around the thread shutdown function aga…
martin-frbg Oct 30, 2025
18eb6a7
Merge pull request #5519 from martin-frbg/issue5517
martin-frbg Oct 30, 2025
1da3b47
Fix #5521: add @SUFFIX64@ in OpenBLASConfig.cmake.in
Smilyf Nov 1, 2025
358c582
Fix missing support for HFLOAT16 in Windows symbol renaming/dll gener…
martin-frbg Nov 1, 2025
aa43496
Update Xcode
martin-frbg Nov 1, 2025
716feb6
Merge pull request #5525 from martin-frbg/issue5524
martin-frbg Nov 2, 2025
2b745f8
Update Xcode SDK versions as well
martin-frbg Nov 2, 2025
2e7c667
Merge pull request #5526 from martin-frbg/fixcirrusxcode
martin-frbg Nov 2, 2025
9c8626d
Merge pull request #5522 from Smilyf/bugfix/issue-5521
martin-frbg Nov 2, 2025
92fe96b
fix processing of lapacke.h
martin-frbg Nov 2, 2025
93e89c0
Merge remote-tracking branch 'refs/remotes/origin/develop' into develop
ChipKerchner Nov 3, 2025
3a9da52
RISCV64-CI: don't rely on dependency resolution for qemu-user (#5506)
martin-frbg Nov 4, 2025
8882409
Merge branch 'OpenMathLib:develop' into issue5493
martin-frbg Nov 4, 2025
7ca689b
Merge remote-tracking branch 'refs/remotes/origin/develop' into develop
ChipKerchner Nov 4, 2025
edf2e59
Prevent possible conversion from bfloat16 to __bf16.
ChipKerchner Nov 4, 2025
00a7336
Missing one gemv conversion.
ChipKerchner Nov 4, 2025
123c25c
Merge pull request #5527 from ChipKerchner/fixbfloat16Tobf16conversions
martin-frbg Nov 6, 2025
7bdb3ac
Add back the SBGEMM/SBGEMV tests
martin-frbg Nov 6, 2025
65af1b1
Merge pull request #5530 from martin-frbg/riscv_sbgemmCI
martin-frbg Nov 6, 2025
f2d010d
Merge pull request #5512 from quic/topic/ssyrk_direct_sme1
martin-frbg Nov 6, 2025
8a0b97e
CYGWIN needs to be named OS_CYGWIN_NT in config.h
martin-frbg Nov 7, 2025
f00c0d0
CYGWIN builds currently require blas_server_win32
martin-frbg Nov 7, 2025
bbb87aa
Update OSX gcc12 job to gcc15
martin-frbg Nov 7, 2025
aa7e9ab
Merge pull request #5531 from martin-frbg/issue5510
martin-frbg Nov 7, 2025
f6df9be
Merge pull request #5533 from martin-frbg/azure-osxgcc
martin-frbg Nov 8, 2025
71c6016
Fix f_check detection of LLVM 21 flang
bartoldeman Nov 15, 2025
39d5e44
fix: dot_kernel_sve "n" usage & clobber list
mayeut Nov 15, 2025
98a8230
Optimize ZROT_RVV for the unit-stride case (inc_x = inc_y = 1)
Nov 18, 2025
762ed66
Refactoring: ARM64 dot Kernel: don't call num_cpu_avail twice
FRosner Nov 18, 2025
ccef6cc
Recognize -fdefault-integer-8 for LLVMs flang
Thyre Nov 18, 2025
75ceb6c
Merge pull request #5536 from mayeut/clang-sve
martin-frbg Nov 18, 2025
17f2e94
Merge pull request #5539 from FRosner/arm64-dot-kernel-refactoring
martin-frbg Nov 18, 2025
a51a1b8
Merge pull request #5540 from Thyre/support-flang-new-integer-8
martin-frbg Nov 18, 2025
28eeef5
Merge pull request #5538 from CheryDan/riscv/rot
martin-frbg Nov 19, 2025
a14caf4
add tt for a64fx dot
abhishek-iitmadras Nov 17, 2025
64b9600
Skip C and Fortran compiler combination checks in AIX if NO_FORTRAN o…
ayappanec Nov 21, 2025
a367f5f
fix: rpcc on linux aarch64
mayeut Nov 22, 2025
f7b7296
Fix compilation with LLVM
martin-frbg Nov 22, 2025
29fab2b
Merge pull request #5546 from martin-frbg/issue5545
martin-frbg Nov 22, 2025
fa0403b
Report proper cache sizes for Qualcomm Oryon in WoA
martin-frbg Nov 22, 2025
4867c42
ci: add build with clang on ppc64le
mayeut Nov 22, 2025
58ee3c0
Merge pull request #5544 from mayeut/rpcc-aarch64
martin-frbg Nov 22, 2025
c5e1967
fix(warning): taking the absolute value of 'bfloat16' has no effect
mayeut Nov 23, 2025
93d0d19
Merge pull request #5547 from martin-frbg/oryon_cachesizes
martin-frbg Nov 23, 2025
0d5bf7b
flang does not understand -frecursive
martin-frbg Nov 23, 2025
ea85b66
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Nov 23, 2025
88c1899
Merge pull request #5549 from mayeut/warn-bfloat16
martin-frbg Nov 23, 2025
7cef952
Merge pull request #5550 from martin-frbg/flangppc
martin-frbg Nov 23, 2025
48e33f2
Merge pull request #5543 from ayappanec/AIX-compiler-checks
martin-frbg Nov 23, 2025
9c0965b
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Nov 23, 2025
5b79d01
Merge pull request #5434 from ywwry66/mixed_openmp_warning
martin-frbg Nov 23, 2025
d6b25c4
Merge pull request #5542 from abhishek-iitmadras/abhishek_new_tt_a64fx
martin-frbg Nov 23, 2025
8c0b13c
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Nov 23, 2025
7d35bf6
Add cpuid for Apple M5 (from a PR to the archspec project)
martin-frbg Nov 24, 2025
8da0a1f
Updated SGEMV ramps.
almayne Nov 24, 2025
7e44f62
fix sequence of arm64 sgemm_direct_performance and sgemm_direct_ab
martin-frbg Nov 24, 2025
b0bd49a
Add compiler guard around the M4 HAVE_SME property
martin-frbg Nov 24, 2025
4af1870
Only add dedicated VORTEXM4 if building with LLVM
martin-frbg Nov 24, 2025
b185c9a
small fixes for separating sme and dummy parts
martin-frbg Nov 24, 2025
a683287
rework for dynamic_arch
martin-frbg Nov 24, 2025
705259c
remove redundant HAVE_SME
martin-frbg Nov 24, 2025
7ab8dc1
rework ARM64 SME dependency handling
martin-frbg Nov 24, 2025
c3c857c
fix sequence
martin-frbg Nov 24, 2025
5f07358
fix param.h: turn [sd]gemm_default_[pqr] parameters for a64fx
hideaki-motoki Nov 28, 2025
825d3ad
AppleClang does not define feature local_streaming
martin-frbg Nov 28, 2025
7750d50
chore: add test case for exec_blas_async after fork
mayeut Nov 29, 2025
396137f
revert locks introduced in #5170
mayeut Nov 29, 2025
f6533cc
Fix floating point registers ld/st bug of Loongarch
ErnstPeng Dec 3, 2025
68ff451
Merge pull request #5558 from ErnstPeng/fix-LA
martin-frbg Dec 3, 2025
e85efb8
remove za from clobber lists
martin-frbg Dec 3, 2025
9384776
Fix MSVC versions of the inline c/zdot function
martin-frbg Dec 11, 2025
80a12ae
Update deprecated macos-13 instances to macos-14
martin-frbg Dec 12, 2025
231d7c4
Update xcode version for macos-14
martin-frbg Dec 12, 2025
38882e0
Update xcode sdks
martin-frbg Dec 12, 2025
e7ddd63
list xcode platforms/sdks
martin-frbg Dec 12, 2025
a680c60
Update iPhoneOS SDK
martin-frbg Dec 12, 2025
5d68b8f
Merge pull request #5568 from martin-frbg/azureosx14
martin-frbg Dec 12, 2025
4dbbdca
Merge branch 'OpenMathLib:develop' into issue5562
martin-frbg Dec 12, 2025
486d150
Merge pull request #5567 from martin-frbg/issue5562
martin-frbg Dec 12, 2025
9023e76
Merge pull request #5556 from mayeut/fork-bug
martin-frbg Dec 13, 2025
eb098f6
Revert "[WIP,Testing] remove the lock around the thread shutdown func…
martin-frbg Dec 13, 2025
cbecf98
fix regression due to adding bgemv interfaces
mattip Dec 15, 2025
e1d2411
Fix cut-n-paste error introduced with previous fix for zdotc/zdotu
martin-frbg Dec 15, 2025
e155bc0
Merge pull request #5571 from mattip/issue5570
martin-frbg Dec 15, 2025
5aff62e
Merge pull request #5572 from martin-frbg/issue5562-2
martin-frbg Dec 16, 2025
bc52252
Fix previous misedits in MSVC complex dot and fix MSVC macros for com…
martin-frbg Dec 19, 2025
6bc4276
Merge pull request #5574 from martin-frbg/issue5562-3
martin-frbg Dec 19, 2025
cfa28bc
Support compilation with LLVM for Windows on Arm
martin-frbg Dec 19, 2025
ac2c663
remove special handling of C/ZDOT for LLVM on WoA
martin-frbg Dec 19, 2025
b8163b6
Fix compilation error caused by inadvertent shadowing of variables in…
martin-frbg Dec 20, 2025
9fa64b9
Initialise string length variables
martin-frbg Dec 20, 2025
652bf6b
Initialize local variable to remove a compiler warning
martin-frbg Dec 20, 2025
c7b0304
Merge pull request #5576 from martin-frbg/issue5562-4
martin-frbg Dec 21, 2025
5b0884d
Use getenv for readenv_atoi in CYGWIN or MINGW builds
martin-frbg Dec 22, 2025
097d2d9
Merge pull request #5578 from martin-frbg/mingw-getenv
martin-frbg Dec 23, 2025
371663f
fix define for YIELDING
vtjnash Dec 22, 2025
d39b777
Make .align conditional on not being on WoA and strip CRLF endings
martin-frbg Dec 24, 2025
fed16d6
Update common.h
vtjnash Dec 24, 2025
0b2b583
POSIX.1-2008
vtjnash Dec 25, 2025
0ff51a4
Merge pull request #5579 from vtjnash/jn/YIELDING
martin-frbg Dec 25, 2025
067e43c
Merge pull request #5575 from martin-frbg/woa-neozdot
martin-frbg Dec 25, 2025
1f2bffb
Merge pull request #5551 from almayne/sgemv_ramps
martin-frbg Dec 25, 2025
5766adb
Merge pull request #5569 from OpenMathLib/revert-5479-forklock
martin-frbg Dec 25, 2025
cd02751
Merge pull request #5548 from mayeut/ppc64le-clang
martin-frbg Dec 25, 2025
e548bda
Update Windows/LLVM build to use miniforge and flang_win-64 package
martin-frbg Dec 27, 2025
5e3a992
replace mentions of miniconda with miniforge
martin-frbg Dec 27, 2025
83a788c
Add BLASLONG cast to the DEFAULT_ALIGN parameter of Cooper Lake and S…
martin-frbg Dec 29, 2025
579eda3
Name openmp packages in Windows/conda build recipe
martin-frbg Dec 29, 2025
54f7b76
Merge pull request #5588 from martin-frbg/cooperlake_cast
martin-frbg Dec 30, 2025
772741e
Merge pull request #5586 from martin-frbg/issue5337
martin-frbg Dec 30, 2025
80951a2
Merge pull request #5534 from bartoldeman/fix-flang-fcheck
martin-frbg Dec 30, 2025
e4344de
Merge pull request #5505 from martin-frbg/issue5493
martin-frbg Dec 31, 2025
275eb6f
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
5c8cf37
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
b183182
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
7beba94
Add workaround for current LLVM SME bug on Windows
martin-frbg Dec 31, 2025
f4383d0
syntax fix
martin-frbg Dec 31, 2025
67fd33e
syntax fix
martin-frbg Dec 31, 2025
2283fcb
POWER10: Reduce sgemm loop unrolling
Dec 18, 2025
8794979
Fix bug where openblas_set_threads_callback_function does not support…
lujiaweics Jan 5, 2026
badf4c0
Merge pull request #5592 from RajalakshmiSR/sgemm-p10-unroll
martin-frbg Jan 5, 2026
618bcbd
adjust M4 options to avoid undefined references with non-Apple LLVM
martin-frbg Jan 5, 2026
a18a536
Adjust M4 options to avoid unresolved reference with non-Apple LLVM
martin-frbg Jan 5, 2026
02bc005
reset SVE and SME capabilities between targets
martin-frbg Jan 5, 2026
e384396
Use the armv9 capability set in the compiler test for SME
martin-frbg Jan 5, 2026
7e612b6
Merge pull request #5594 from lujiaweics/fix/symbol-suffix-missing-th…
martin-frbg Jan 6, 2026
b53d18b
Fixing warning messages in dgemm and dgemv kernels
amritahs-ibm Jan 6, 2026
20ae36b
Merge pull request #5595 from amritahs-ibm/fix_dgemm_warnings
martin-frbg Jan 7, 2026
6939a43
Support for SME1 based ssyr2k_direct kernel for cblas_ssyr2k level 3 API
Dec 25, 2025
c040d5e
Merge pull request #5591 from quic/topic/ssyr2k_direct_sme1
martin-frbg Jan 8, 2026
2d46f1e
Merge branch 'develop' into issue5414
martin-frbg Jan 9, 2026
a9a6eda
Adapt for DYNAMIC_ARCH with multiple ...preprocess symbols
martin-frbg Jan 9, 2026
d7d1088
docs: fix iOS build script & use xcrun SDK path
moluopro Jan 9, 2026
a514760
Change 'make libs' back to 'make'
moluopro Jan 10, 2026
e5aebea
Merge pull request #5596 from moluopro/develop
martin-frbg Jan 10, 2026
d1de282
Improve the precision of S/CNRM2 by summing in double precision
martin-frbg Jan 11, 2026
1ffea2b
Merge pull request #5597 from martin-frbg/issue5503
martin-frbg Jan 11, 2026
52ec7fa
Merge pull request #5554 from hideaki-motoki/issue5553_gemm_default_p…
martin-frbg Jan 11, 2026
6de062c
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Jan 11, 2026
4d08156
Use the generic C kernel for DNRM2
martin-frbg Jan 11, 2026
05d7c18
Merge pull request #5599 from martin-frbg/issue5552
martin-frbg Jan 11, 2026
e776297
Fix out-of-bounds accesses to the TAU array (Reference-LAPACK PR 1179)
martin-frbg Jan 11, 2026
01cc6df
Merge pull request #5600 from martin-frbg/lapack1179
martin-frbg Jan 11, 2026
aafd3cb
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Jan 11, 2026
0a53d91
Move early exit up; don't rely on support_sme() for now
martin-frbg Jan 12, 2026
31150eb
Move early exit up; don't rely on support_sme() for now
martin-frbg Jan 12, 2026
e04df19
Use linker response files on all Apple hardware
martin-frbg Jan 12, 2026
e07bea1
Merge pull request #5601 from martin-frbg/issue5336-2
martin-frbg Jan 12, 2026
3149408
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg Jan 12, 2026
10ba0e6
fix missing parentheses on endif
martin-frbg Jan 12, 2026
770ad68
Distinguish AppleClang from LLVM on ARM64
martin-frbg Jan 13, 2026
5e5f9a3
Apple Clang absolutely needs the +sme in the arch string
martin-frbg Jan 13, 2026
31bb6ca
Apple Clang requires +sme in the arch string for M4
martin-frbg Jan 13, 2026
533cab2
add prototype
martin-frbg Jan 13, 2026
bdcb9b7
add prototype
martin-frbg Jan 13, 2026
fa021e1
fix missing endif() and add AppleClang options for M4
martin-frbg Jan 13, 2026
6137236
fix os variable reference
martin-frbg Jan 13, 2026
6735872
drop the cpu=apple-m4 part as nonessential
martin-frbg Jan 14, 2026
d3e4b41
remove cpu=apple-m4 as not required and less portable
martin-frbg Jan 14, 2026
88c583e
Update Makefile
martin-frbg Jan 14, 2026
7ffce1c
fix spurious change of (S)BGEMM parameters for NeoverseV1
martin-frbg Jan 14, 2026
d49df4c
force linking to clang_rt_builtins when using LLVM for AppleM4
martin-frbg Jan 14, 2026
93cd7b9
Force linking to clang_rt_builtins when using LLVM for AppleM4
martin-frbg Jan 14, 2026
7acf919
typo
martin-frbg Jan 14, 2026
faa1875
typo fix
martin-frbg Jan 14, 2026
5133aac
Make VORTEXM4 available in DYNAMIC_ARCH on Apple
martin-frbg Jan 15, 2026
55a10c7
Make VortexM4 available in DYNAMIC_ARCH on MacOS only
martin-frbg Jan 15, 2026
6f225da
make VORTEXM4 MacOS-only for now
martin-frbg Jan 15, 2026
4cd575c
Merge pull request #5423 from martin-frbg/issue5414
martin-frbg Jan 15, 2026
5bb7ef1
Update the Changelog for version 0.3.31
martin-frbg Jan 15, 2026
10bd0ec
Merge pull request #5604 from martin-frbg/changelog0331
martin-frbg Jan 15, 2026
0e7b11f
Merge branch 'release-0.3.0' into develop
martin-frbg Jan 15, 2026
14 changes: 7 additions & 7 deletions .cirrus.yml
@@ -58,8 +58,8 @@ task:
- export VALID_ARCHS="i386 x86_64"
- xcrun --sdk macosx --show-sdk-path
- xcodebuild -version
- export CC=/Applications/Xcode_16.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
- export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_16.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.4.sdk -arch x86_64"
- export CC=/Applications/Xcode_26.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
- export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_26.0.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX26.0.sdk -arch x86_64"
- make TARGET=CORE2 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1 RANLIB="ls -l"
always:
config_artifacts:
@@ -78,8 +78,8 @@ task:
- export #PATH=/opt/homebrew/opt/llvm/bin:$PATH
- export #LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
- export #CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
- export CC=/Applications/Xcode_16.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
- export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_16.3.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS18.4.sdk -arch arm64 -miphoneos-version-min=10.0"
- export CC=/Applications/Xcode_26.0.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
- export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_26.0.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS26.0.sdk -arch arm64 -miphoneos-version-min=10.0"
- xcrun --sdk iphoneos --show-sdk-path
- ls -l /Applications
- make TARGET=ARMV8 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1 CROSS=1
@@ -127,7 +127,7 @@ task:
FreeBSD_task:
name: FreeBSD-gcc
freebsd_instance:
image_family: freebsd-14-2
image_family: freebsd-14-3
install_script:
- pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
compile_script:
@@ -138,7 +138,7 @@ FreeBSD_task:
FreeBSD_task:
name: freebsd-gcc-ilp64
freebsd_instance:
image_family: freebsd-14-2
image_family: freebsd-14-3
install_script:
- pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
compile_script:
@@ -148,7 +148,7 @@ FreeBSD_task:
FreeBSD_task:
name: FreeBSD-clang-openmp
freebsd_instance:
image_family: freebsd-14-2
image_family: freebsd-14-3
install_script:
- pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
- ln -s /usr/local/lib/gcc13/libgfortran.so.5.0.0 /usr/lib/libgfortran.so
10 changes: 8 additions & 2 deletions .github/workflows/apple_m.yml
@@ -44,7 +44,7 @@ jobs:
elif [ "$RUNNER_OS" == "macOS" ]; then
# It looks like "gfortran" isn't working correctly unless "gcc" is re-installed.
brew reinstall gcc
brew install coreutils cmake ccache
brew install coreutils ccache
brew install llvm
else
echo "::error::$RUNNER_OS not supported"
@@ -87,10 +87,16 @@ jobs:
echo "max_size = 300M" > ~/.ccache/ccache.conf
echo "compression = true" >> ~/.ccache/ccache.conf
ccache -s

- name: Add gfortran runtime to link path
if: matrix.build == 'make' && runner.os == 'macOS'
run: |
GFORTRAN_LIBDIR=$(gfortran -print-file-name=libgfortran.dylib | xargs dirname)
echo "Using gfortran runtime in $GFORTRAN_LIBDIR"
echo "LDFLAGS=-L/opt/homebrew/opt/llvm/lib -L$GFORTRAN_LIBDIR" >> $GITHUB_ENV

- name: Build OpenBLAS
run: |
export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
export CC="/opt/homebrew/opt/llvm/bin/clang"
case "${{ matrix.build }}" in
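Note on the new "Add gfortran runtime to link path" step: it asks gfortran where libgfortran.dylib lives, takes the containing directory, and persists an LDFLAGS entry through $GITHUB_ENV so that the later build step sees it. A minimal sketch of that pattern follows. The helper name `append_runtime_ldflags`, its arguments, the example path, and the temporary file standing in for $GITHUB_ENV are assumptions made for illustration and are not part of the workflow.

```shell
# Hypothetical helper mirroring the workflow step.
#   $1: path to the runtime library (in CI: output of
#       `gfortran -print-file-name=libgfortran.dylib`)
#   $2: file to append to (in CI: "$GITHUB_ENV")
#   $3: optional flags that should stay in front of the new -L entry
append_runtime_ldflags() {
  lib_path="$1"
  env_file="$2"
  existing="${3:-}"
  libdir=$(dirname "$lib_path")
  # ${existing:+$existing } expands to "<existing> " only when existing is non-empty,
  # so no stray leading space is written when there are no earlier flags.
  echo "LDFLAGS=${existing:+$existing }-L$libdir" >> "$env_file"
}

# Usage with a made-up path and a temp file instead of $GITHUB_ENV.
env_file=$(mktemp)
append_runtime_ldflags /opt/example/lib/gcc/current/libgfortran.dylib "$env_file" \
  "-L/opt/homebrew/opt/llvm/lib"
cat "$env_file"
```

The dynamic_arch.yml variant of this step uses the same `${VAR:+...}` expansion to keep whatever LDFLAGS already contains, while the apple_m.yml variant writes the Homebrew LLVM library path explicitly.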
3 changes: 2 additions & 1 deletion .github/workflows/arm64_graviton.yml
@@ -88,13 +88,14 @@ jobs:
run: |
case "${{ matrix.build }}" in
"make")
make -j$(nproc) DYNAMIC_ARCH=1 USE_OPENMP=0 FC="ccache ${{ matrix.fortran }}"
make -j$(nproc) DYNAMIC_ARCH=1 BUILD_BFLOAT16=1 USE_OPENMP=0 FC="ccache ${{ matrix.fortran }}"
;;
"cmake")
mkdir build && cd build
cmake -DDYNAMIC_ARCH=1 \
-DNOFORTRAN=0 \
-DBUILD_WITHOUT_LAPACK=0 \
-DBUILD_BFLOAT16=1 \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_Fortran_COMPILER=${{ matrix.fortran }} \
69 changes: 49 additions & 20 deletions .github/workflows/dynamic_arch.yml
@@ -1,6 +1,6 @@
name: continuous build

on: [push, pull_request]
on: [push, pull_request, workflow_dispatch]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -11,17 +11,24 @@ permissions:

jobs:
build:
if: "github.repository == 'OpenMathLib/OpenBLAS'"
if: "github.repository == 'OpenMathLib/OpenBLAS' || github.event_name == 'workflow_dispatch'"
runs-on: ${{ matrix.os }}

strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest]
os: [ubuntu-latest, macos-latest, ubuntu-24.04-arm]
cc: [gcc, clang, clang-21]
fortran: [gfortran, flang]
build: [cmake, make]
exclude:
- os: macos-latest
cc: gcc
- os: macos-latest
cc: clang-21
- os: macos-latest
fortran: flang
- os: ubuntu-24.04-arm
fortran: flang

steps:
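Note on the matrix change in this hunk: a GitHub Actions `exclude` entry removes every combination that matches all of the keys it lists, so an entry naming only `os` and `cc` drops both Fortran compilers and both build systems for that pair. The rough sketch below enumerates the resulting jobs. It assumes the matrix values and exclude entries shown in this hunk are the complete set, and the nested loop is only an illustration of the rule, not how the runner evaluates it.

```shell
count=0
for os in ubuntu-latest macos-latest ubuntu-24.04-arm; do
  for cc in gcc clang clang-21; do
    for fortran in gfortran flang; do
      for build in cmake make; do
        # Each pattern corresponds to one exclude entry from the workflow.
        case "$os/$cc/$fortran" in
          macos-latest/gcc/*|macos-latest/clang-21/*) continue ;;
          macos-latest/*/flang|ubuntu-24.04-arm/*/flang) continue ;;
        esac
        count=$((count + 1))
        echo "$os $cc $fortran $build"
      done
    done
  done
done
echo "total jobs: $count"
```

Under those assumptions the full product of 36 combinations shrinks to 20 jobs: 12 on ubuntu-latest, 2 on macos-latest, and 6 on ubuntu-24.04-arm.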
@@ -42,14 +49,27 @@ jobs:
- name: Install Dependencies
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
cat << EOF | sudo tee -a /etc/apt/apt.conf.d/01norecommend
APT::Install-Recommends "0";
APT::Install-Suggests "0";
EOF
sudo apt-get update
sudo apt-get install -y gfortran cmake ccache
wget http://security.ubuntu.com/ubuntu/pool/universe/n/ncurses/libtinfo5_6.3-2ubuntu0.1_amd64.deb
sudo apt install ./libtinfo5_6.3-2ubuntu0.1_amd64.deb
sudo apt-get install -y ccache
if [ "${{ matrix.cc }}" == "clang-21" ]; then
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 21
fi
if [ "${{ matrix.fortran }}" == "flang" ]; then
wget http://security.ubuntu.com/ubuntu/pool/universe/n/ncurses/libtinfo5_6.3-2ubuntu0.1_amd64.deb
sudo apt install ./libtinfo5_6.3-2ubuntu0.1_amd64.deb
else
sudo apt-get install -y ${{ matrix.fortran }}
fi
elif [ "$RUNNER_OS" == "macOS" ]; then
# It looks like "gfortran" isn't working correctly unless "gcc" is re-installed.
brew reinstall gcc
brew install coreutils cmake ccache
brew install coreutils ccache
else
echo "::error::$RUNNER_OS not supported"
exit 1
@@ -64,12 +84,12 @@ jobs:
# GNU make and cmake call the compilers differently. It looks like
# that causes the cache to mismatch. Keep the ccache for both build
# tools separate to avoid polluting each other.
key: ccache-${{ runner.os }}-${{ matrix.build }}-${{ matrix.fortran }}-${{ github.ref }}-${{ github.sha }}
key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build }}-${{ matrix.cc }}-${{ matrix.fortran }}-${{ github.ref }}-${{ github.sha }}
# Restore a matching ccache cache entry. Prefer same branch and same Fortran compiler.
restore-keys: |
ccache-${{ runner.os }}-${{ matrix.build }}-${{ matrix.fortran }}-${{ github.ref }}
ccache-${{ runner.os }}-${{ matrix.build }}-${{ matrix.fortran }}
ccache-${{ runner.os }}-${{ matrix.build }}
ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build }}-${{ matrix.cc }}-${{ matrix.fortran }}-${{ github.ref }}
ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build }}-${{ matrix.cc }}-${{ matrix.fortran }}
ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build }}-${{ matrix.cc }}

- name: Configure ccache
run: |
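Note on the cache key change: restore-keys are treated as prefixes and tried in order, so putting `runner.arch` and `matrix.cc` into both the key and the restore prefixes keeps x86-64 and arm64 runners (and different C compilers) from picking up each other's ccache contents. The sketch below is a simplified model of that fallback. The function `pick_cache` and the sample keys are hypothetical, and the real action also prefers the most recently created match, which this model ignores.

```shell
# Simplified model of restore-keys fallback (not the actions/cache implementation).
#   $1: whitespace-separated list of existing cache keys
#   remaining args: restore prefixes in priority order
pick_cache() {
  existing="$1"
  shift
  for prefix in "$@"; do
    for key in $existing; do
      case "$key" in
        "$prefix"*) printf '%s\n' "$key"; return 0 ;;
      esac
    done
  done
  return 1
}

# Sample keys following the new naming scheme (values are made up).
caches="ccache-Linux-X64-make-gcc-gfortran-refs/heads/develop-abc123
ccache-Linux-ARM64-make-gcc-gfortran-refs/heads/develop-def456"

pick_cache "$caches" \
  "ccache-Linux-ARM64-make-gcc-gfortran-refs/heads/feature" \
  "ccache-Linux-ARM64-make-gcc-gfortran" \
  "ccache-Linux-ARM64-make-gcc"
```

With the old scheme a prefix such as `ccache-Linux-make` would have matched caches from either architecture; with the architecture in the key, only the ARM64 entry is eligible here.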
@@ -90,6 +110,14 @@ jobs:
echo "compression = true" >> ~/.ccache/ccache.conf
ccache -s

- name: Add gfortran runtime to link path
if: matrix.build == 'make' && runner.os == 'macOS'
run: |
GFORTRAN_LIBDIR=$(gfortran -print-file-name=libgfortran.dylib | xargs dirname)
echo "Using gfortran runtime in $GFORTRAN_LIBDIR"
# Preserve whatever LDFLAGS may already contain
echo "LDFLAGS=${LDFLAGS:+$LDFLAGS }-L$GFORTRAN_LIBDIR" >> "$GITHUB_ENV"

- name: Build OpenBLAS
run: |
if [ "${{ matrix.fortran }}" = "flang" ]; then
@@ -102,7 +130,7 @@ jobs:
fi
case "${{ matrix.build }}" in
"make")
make -j$(nproc) DYNAMIC_ARCH=1 USE_OPENMP=0 FC="ccache ${{ matrix.fortran }}"
make -j$(nproc) DYNAMIC_ARCH=1 USE_OPENMP=0 CC="ccache ${{ matrix.cc }}" FC="ccache ${{ matrix.fortran }}"
;;
"cmake")
mkdir build && cd build
@@ -111,6 +139,7 @@ jobs:
-DBUILD_WITHOUT_LAPACK=0 \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=${{ matrix.cc }} \
-DCMAKE_Fortran_COMPILER=${{ matrix.fortran }} \
-DCMAKE_C_COMPILER_LAUNCHER=ccache \
-DCMAKE_Fortran_COMPILER_LAUNCHER=ccache \
@@ -134,13 +163,13 @@ jobs:
"make")
MAKE_FLAGS='DYNAMIC_ARCH=1 USE_OPENMP=0'
echo "::group::Tests in 'test' directory"
make -C test $MAKE_FLAGS FC="ccache ${{ matrix.fortran }}"
make -C test $MAKE_FLAGS CC="ccache ${{ matrix.cc }}" FC="ccache ${{ matrix.fortran }}"
echo "::endgroup::"
echo "::group::Tests in 'ctest' directory"
make -C ctest $MAKE_FLAGS FC="ccache ${{ matrix.fortran }}"
make -C ctest $MAKE_FLAGS CC="ccache ${{ matrix.cc }}" FC="ccache ${{ matrix.fortran }}"
echo "::endgroup::"
echo "::group::Tests in 'utest' directory"
make -C utest $MAKE_FLAGS FC="ccache ${{ matrix.fortran }}"
make -C utest $MAKE_FLAGS CC="ccache ${{ matrix.cc }}" FC="ccache ${{ matrix.fortran }}"
echo "::endgroup::"
;;
"cmake")
@@ -364,15 +393,15 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v3

- name: Install Dependencies
run: |
sudo apt-get update
sudo apt-get install -y gcc gfortran make

- name: Build OpenBLAS
run: |
make -j${nproc}
make -j${nproc}
make -j${nproc} lapack-test
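Note on `make -j${nproc}` in this hunk: `${nproc}` is a parameter expansion of a shell variable named nproc, whereas the other jobs in this workflow use `$(nproc)`, a command substitution that runs the nproc utility. If no such variable is set, the argument collapses to a bare `-j`, which GNU make treats as "no limit on the number of jobs". The short sketch below only demonstrates the difference between the two expansions; the variable names are illustrative.

```shell
unset nproc                      # make sure no variable of that name exists
from_variable="-j${nproc}"       # parameter expansion of an unset variable
from_command="-j$(nproc)"        # command substitution: runs the nproc utility
echo "variable form: $from_variable"
echo "command form:  $from_command"
```

On a machine where the nproc utility is available, the first line prints a bare `-j` and the second prints `-j` followed by the processor count.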


2 changes: 1 addition & 1 deletion .github/workflows/loongarch64_clang.yml
@@ -35,7 +35,7 @@ jobs:

- name: Install libffi6
run: |
wget http://ftp.ca.debian.org/debian/pool/main/libf/libffi/libffi6_3.2.1-9_amd64.deb
wget https://download.nvidia.com/cumulus/apt.cumulusnetworks.com/pool/upstream/libf/libffi/libffi6_3.2.1-9_amd64.deb
sudo dpkg -i libffi6_3.2.1-9_amd64.deb
- name: Install APT deps
27 changes: 20 additions & 7 deletions .github/workflows/riscv64_vector.yml
@@ -16,8 +16,8 @@ jobs:
env:
triple: riscv64-unknown-linux-gnu
riscv_gnu_toolchain: https://github.com/riscv-collab/riscv-gnu-toolchain
riscv_gnu_toolchain_version: 13.2.0
riscv_gnu_toolchain_nightly_download_path: /releases/download/2024.02.02/riscv64-glibc-ubuntu-22.04-llvm-nightly-2024.02.02-nightly.tar.gz
riscv_gnu_toolchain_version: 15.1.0
riscv_gnu_toolchain_nightly_download_path: /releases/download/2025.08.29/riscv64-glibc-ubuntu-22.04-llvm-nightly-2025.08.29-nightly.tar.xz
strategy:
fail-fast: false
matrix:
@@ -26,8 +26,8 @@ jobs:
opts: TARGET=RISCV64_ZVL128B BINARY=64 ARCH=riscv64
qemu_cpu: rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=128,elen=64
- target: RISCV64_ZVL256B
opts: TARGET=RISCV64_ZVL256B BINARY=64 ARCH=riscv64
qemu_cpu: rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64
opts: TARGET=RISCV64_ZVL256B BINARY=64 ARCH=riscv64 BUILD_BFLOAT16=1 BUILD_HFLOAT16=1
qemu_cpu: rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64,zfh=true,zvfh=true,zvfbfwma=true
- target: DYNAMIC_ARCH=1
opts: TARGET=RISCV64_GENERIC BINARY=64 ARCH=riscv64 DYNAMIC_ARCH=1
qemu_cpu: rv64,g=true,c=true,v=true,vext_spec=v1.0,vlen=256,elen=64
@@ -40,10 +40,13 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install autoconf automake autotools-dev ninja-build make \
libgomp1-riscv64-cross ccache
libgomp1-riscv64-cross ccache qemu-kvm qemu-user libc6-riscv64-cross
wget ${riscv_gnu_toolchain}/${riscv_gnu_toolchain_nightly_download_path}
tar -xvf $(basename ${riscv_gnu_toolchain_nightly_download_path}) -C /opt

wget https://gist.github.com/martin-frbg/bb630e0de34978e578eeb496b1538d4e/raw/7fd8d971f327f7a517b8f5f7989479ff2b36f71f/qemu-riscv64-10.1-ubuntu24 -P /opt/riscv/bin -o riscv64-qemu
mv /opt/riscv/bin/qemu-riscv64-10.1-ubuntu24 /opt/riscv/bin/qemu-riscv64
chmod +x /opt/riscv/bin/qemu-riscv64

- name: Compilation cache
uses: actions/cache@v3
with:
@@ -74,7 +77,7 @@ jobs:
run: |
export PATH="/opt/riscv/bin:$PATH"
make TARGET=${{ matrix.target }} CFLAGS="-DTARGET=${{ matrix.target }}" \
CC='${triple}-gcc' \
CC='ccache clang --rtlib=compiler-rt -target ${triple} --sysroot /opt/riscv/sysroot --gcc-toolchain=/opt/riscv/lib/gcc/riscv64-unknown-linux-gnu/${riscv_gnu_toolchain_version}/' \
AR='ccache ${triple}-ar' AS='ccache ${triple}-gcc' LD='ccache ${triple}-gcc' \
RANLIB='ccache ${triple}-ranlib' \
FC='ccache ${triple}-gfortran' ${{ matrix.opts }} \
@@ -98,6 +101,8 @@ jobs:
shell: bash
run: |
export PATH="/opt/riscv/bin:$PATH"
export LD_LIBRARY_PATH=/opt/riscv/sysroot/lib
sudo ln -s /opt/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1 /lib
export QEMU_CPU=${{ matrix.qemu_cpu }}
rm -rf ./test_out
mkdir -p ./test_out
@@ -134,6 +139,14 @@ jobs:
wait
while IFS= read -r -d $'\0' LOG; do cat $LOG ; FAILURES=1 ; done < <(grep -lZ FAIL ./test_out/*)
if [[ ! -z $FAILURES ]]; then echo "==========" ; echo "== FAIL ==" ; echo "==========" ; echo ; exit 1 ; fi
if [ "${{matrix.target}}" == "RISCV64_ZVL256B" ]; then
qemu-riscv64 test/test_sbgemm &
qemu-riscv64 test/test_sbgemv &
qemu-riscv64 test/test_shgemm &
qemu-riscv64 test/test_shgemv &
qemu-riscv64 test/test_bgemm
fi


- name: netlib tests
shell: bash
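Note on the RISCV64_ZVL256B test block in this hunk: four of the tests are started in the background with `&` and only test_bgemm runs in the foreground, so, judging from the lines shown here, the exit status of the block reflects test_bgemm alone. One possible way to collect the status of every background job is sketched below. The function `run_all` is hypothetical and not taken from the PR, and the commands passed to it are placeholders.

```shell
# Hypothetical helper: start every command in the background, then wait for
# each one individually so that any non-zero exit status is noticed.
run_all() {
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    # `wait <pid>` returns the exit status of that particular job.
    wait "$pid" || status=1
  done
  return "$status"
}

# Placeholder commands; in the workflow these would be the qemu-riscv64 test runs.
if run_all "true" "exit 3" "true"; then
  echo "all passed"
else
  echo "at least one job failed"
fi
```

Whether the workflow should fail on these tests is a decision for the maintainers; the sketch only shows the mechanism.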