Hello,
This is probably not a real OpenBLAS issue, more a request for information. On the SciPy side, we have been receiving sporadic reports that C translations of the old Fortran 77 code, otherwise identical to the originals, run substantially slower when the number of threads is not limited to 1.
scipy/scipy#22438
scipy/scipy#23161
scipy/scipy#23191
The code in question is here (not sure it matters but for reference)
https://github.com/scipy/scipy/blob/main/scipy/optimize/__lbfgsb.c
and the only BLAS/LAPACK calls made in this code are
DAXPY
DSCAL
DCOPY
DNRM2
DDOT
DPOTRF
DTRTRS
I am trying to understand which call might be affected, since I don't quite understand why setting OPENBLAS_NUM_THREADS=1 recovers the performance. If this setting is needed at all times, we should probably add some sort of guard on the SciPy side, since users won't even know it is required for comparable performance. And since we use these functions in other parts of SciPy, it would be nice to know when we are running into this behavior.
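For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness. It runs the same snippet in a fresh interpreter with and without OPENBLAS_NUM_THREADS set, since the variable has to be in place before OpenBLAS is loaded. The helper names (`run_with_blas_threads`, `compare`) and the Rosenbrock workload in `BENCH_SNIPPET` are assumptions made up for illustration, not taken from the linked reports, and the timings include interpreter start-up and import cost.

```python
import os
import subprocess
import sys
import time

# Hypothetical workload: repeated L-BFGS-B runs on the Rosenbrock function.
# Assumes NumPy and SciPy are installed in the child interpreter.
BENCH_SNIPPET = """
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der
x0 = np.full(50, 1.3)
for _ in range(200):
    minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")
"""


def run_with_blas_threads(snippet, n_threads):
    """Run `snippet` in a new Python process and return (seconds, stdout).

    n_threads=None removes OPENBLAS_NUM_THREADS so the library default applies;
    any other value is exported as OPENBLAS_NUM_THREADS for the child process.
    """
    env = dict(os.environ)
    if n_threads is None:
        env.pop("OPENBLAS_NUM_THREADS", None)
    else:
        env["OPENBLAS_NUM_THREADS"] = str(n_threads)
    start = time.perf_counter()
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return time.perf_counter() - start, proc.stdout


def compare(snippet=BENCH_SNIPPET, settings=(None, 1)):
    """Print the wall-clock time of `snippet` for each thread setting."""
    for n in settings:
        seconds, _ = run_with_blas_threads(snippet, n)
        label = "default" if n is None else n
        print(f"OPENBLAS_NUM_THREADS={label}: {seconds:.2f}s")
```

Calling `compare()` prints one line per setting. A large gap between the default and `1` on a given machine would be consistent with the reports above, though it would not by itself say which of the listed BLAS/LAPACK calls is responsible.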