QUERY: Reduced performance in certain architecture only to-be-regained by OPENBLAS_NUM_THREADS=1 #5383

@ilayn

Hello,

This is probably not a real OpenBLAS issue but rather a request for information. Over at SciPy, we have been receiving sporadic reports that C translations of the old Fortran 77 code, otherwise identical to the originals, run substantially slower when the thread count is not limited to 1.

scipy/scipy#22438
scipy/scipy#23161
scipy/scipy#23191

The code in question is here (not sure it matters, but for reference):

https://github.com/scipy/scipy/blob/main/scipy/optimize/__lbfgsb.c

and the only BLAS/LAPACK calls made in this code are

DAXPY
DSCAL
DCOPY
DNRM2
DDOT

DPOTRF
DTRTRS

I am trying to understand which call might be affected, since I don't quite understand why OPENBLAS_NUM_THREADS=1 recovers the performance. If this setting is needed at all times, we should probably add some sort of guard on the SciPy side, since users won't even know it is required for comparable performance. And since we use these functions in other parts of SciPy, it would be nice to know when we are running into this behavior.
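For anyone who wants to reproduce the comparison, here is a minimal sketch that runs the same command twice, once with OpenBLAS's default thread count and once with OPENBLAS_NUM_THREADS=1, and reports the wall-clock time of each. The helper name `run_with_openblas_threads` and the placeholder workload are my own assumptions for illustration and not part of SciPy or OpenBLAS. The variable is set for a child process so that it is in place before OpenBLAS is loaded.

```python
import os
import subprocess
import sys
import time


def run_with_openblas_threads(cmd, n_threads=None):
    """Run `cmd` in a child process and return (elapsed_seconds, stdout).

    If n_threads is given, OPENBLAS_NUM_THREADS is set for the child only.
    If it is None, the variable is removed so OpenBLAS uses its default.
    (Hypothetical helper, for illustration only.)
    """
    env = os.environ.copy()
    if n_threads is None:
        env.pop("OPENBLAS_NUM_THREADS", None)
    else:
        env["OPENBLAS_NUM_THREADS"] = str(n_threads)
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


if __name__ == "__main__":
    # Placeholder workload: swap in the script that shows the slowdown.
    workload = [sys.executable, "-c", "print('replace with the slow benchmark')"]
    for n in (None, 1):
        elapsed, _ = run_with_openblas_threads(workload, n)
        print(f"OPENBLAS_NUM_THREADS={n}: {elapsed:.3f} s")
```

The measured time includes interpreter start-up, so the workload should run long enough for that overhead to be negligible. For an in-process guard, the threadpoolctl package's `threadpool_limits` context manager may be an alternative worth evaluating.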
