I am benchmarking 2-D convolution in Octave, for example with a simple benchmark like this:
```octave
r = ones (1, 5e4);
tic; x1 = conv (r, r); time_row_conv = toc
```
On macOS (M4), the timing is 3.66 s with OpenBLAS and 0.1 s with Apple vecLib.
On x86_64 Linux (Ryzen 3950X), it also takes a couple of seconds, roughly the same as the reference NETLIB BLAS.
I will eventually try some other BLAS implementations for comparison.
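For reference, here is a minimal plain-C sketch of what the benchmark computes. It does not call any BLAS, and it is not necessarily how Octave's `conv` dispatches internally. Convolving two all-ones vectors of length n gives the triangular sequence 1, 2, ..., n, ..., 2, 1 of length 2n - 1. Written as one axpy-style update of length n per element of the second vector, n = 5e4 amounts to 5e4 updates and about 2.5e9 multiply-adds. The helper names `axpy_1d`, `conv_full_1d` and `ones_conv_is_triangle` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* y[0..n) += alpha * x[0..n), unit stride; plain-C stand-in for daxpy (hypothetical). */
static void axpy_1d(size_t n, double alpha, const double *x, double *y)
{
    for (size_t t = 0; t < n; t++)
        y[t] += alpha * x[t];
}

/* Full 1-D convolution c = conv(a, b) of length na + nb - 1,
   computed as one axpy per element of b: c[j .. j+na) += b[j] * a. */
static void conv_full_1d(const double *a, size_t na, const double *b, size_t nb,
                         double *c)
{
    assert(na > 0 && nb > 0);
    for (size_t t = 0; t < na + nb - 1; t++)
        c[t] = 0.0;
    for (size_t j = 0; j < nb; j++)
        axpy_1d(na, b[j], a, &c[j]);
}

/* Mirrors r = ones(1, n); x1 = conv(r, r) and checks that the result is
   the triangle 1, 2, ..., n, ..., 2, 1. Returns 1 on success, 0 otherwise. */
static int ones_conv_is_triangle(size_t n)
{
    assert(n > 0);
    double *r = malloc(n * sizeof *r);
    double *c = malloc((2 * n - 1) * sizeof *c);
    int ok = (r != NULL && c != NULL);
    if (ok) {
        for (size_t t = 0; t < n; t++)
            r[t] = 1.0;
        conv_full_1d(r, n, r, n, c);
        for (size_t k = 0; k < 2 * n - 1; k++) {
            size_t expect = (k < n) ? k + 1 : 2 * n - 1 - k;
            if (c[k] != (double)expect) {
                ok = 0;
                break;
            }
        }
    }
    free(r);
    free(c);
    return ok;
}
```

The triangle check is exact because every partial sum is a small integer, which double represents without rounding.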
The core of the `conv` code is essentially:
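```cpp
// a is ma x na, b is mb x nb, c is (ma-mb+1) x (na-nb+1); all column-major.
const F77_INT one = 1;           // unit stride for daxpy
const F77_INT len = ma - mb + 1; // Pre-calculate this value to avoid a temporary
for (F77_INT k = 0; k < na - nb + 1; k++) {
  for (F77_INT j = 0; j < nb; j++) {
    for (F77_INT i = 0; i < mb; i++) {
      double b_val = b[i + j*mb];
      // c(:, k) += b(i, j) * a(mb-1-i : mb-1-i+len-1, k+nb-1-j)
      daxpy_(&len, &b_val, &a[mb-i-1 + (k+nb-j-1)*ma], &one, &c[k*len], &one);
    }
  }
}
```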
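A self-contained C sketch of the same loop nest, useful for checking the indexing without linking a BLAS library. It assumes column-major storage and replaces `daxpy_` with a hypothetical plain-C helper `axpy_ref` (unit strides only). `conv2_valid_direct_at` and `conv2_valid_selfcheck` are also hypothetical names. The self-check compares the axpy formulation against a direct evaluation of each output entry on small integer-valued inputs, so the two should agree exactly.

```c
#include <assert.h>

/* y[0..n) += alpha * x[0..n); hypothetical stand-in for daxpy_ with incx = incy = 1. */
static void axpy_ref(int n, double alpha, const double *x, double *y)
{
    for (int t = 0; t < n; t++)
        y[t] += alpha * x[t];
}

/* Same loop nest as the kernel: a is ma x na, b is mb x nb, column-major.
   c is (ma-mb+1) x (na-nb+1), column-major, and must be zeroed by the caller. */
static void conv2_valid_axpy(int ma, int na, const double *a,
                             int mb, int nb, const double *b, double *c)
{
    assert(mb <= ma && nb <= na);
    const int len = ma - mb + 1;
    for (int k = 0; k < na - nb + 1; k++)
        for (int j = 0; j < nb; j++)
            for (int i = 0; i < mb; i++)
                axpy_ref(len, b[i + j * mb],
                         &a[mb - i - 1 + (k + nb - j - 1) * ma], &c[k * len]);
}

/* Direct evaluation of c(p,k) = sum_{i,j} b(i,j) * a(p+mb-1-i, k+nb-1-j). */
static double conv2_valid_direct_at(int ma, const double *a, int mb, int nb,
                                    const double *b, int p, int k)
{
    double s = 0.0;
    for (int j = 0; j < nb; j++)
        for (int i = 0; i < mb; i++)
            s += b[i + j * mb] * a[(p + mb - 1 - i) + (k + nb - 1 - j) * ma];
    return s;
}

/* Fills small integer-valued inputs, runs both versions and returns the
   largest absolute difference. Sizes are limited to the 64-element buffers. */
static double conv2_valid_selfcheck(int ma, int na, int mb, int nb)
{
    double a[64], b[64], c[64] = {0};
    assert(ma * na <= 64 && mb * nb <= 64);
    for (int t = 0; t < ma * na; t++)
        a[t] = (double)((t * 7) % 11) - 5.0;
    for (int t = 0; t < mb * nb; t++)
        b[t] = (double)((t * 3) % 5) - 2.0;
    conv2_valid_axpy(ma, na, a, mb, nb, b, c);
    const int len = ma - mb + 1;
    double err = 0.0;
    for (int k = 0; k < na - nb + 1; k++)
        for (int p = 0; p < len; p++) {
            double d = c[p + k * len] - conv2_valid_direct_at(ma, a, mb, nb, b, p, k);
            if (d < 0.0)
                d = -d;
            if (d > err)
                err = d;
        }
    return err;
}
```

For example, `conv2_valid_selfcheck(6, 5, 2, 3)` should return `0.0`, since all intermediate values are small integers and therefore exact in double precision.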
The profiler shows that the runtime is dominated by the `daxpy` calls.
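Reading the loop nest off directly gives the quantities below, with 0-based indices and $m_a, n_a, m_b, n_b$ standing for `ma`, `na`, `mb`, `nb`. These are each output entry, the number of `daxpy` calls with their length, and the total floating-point work:

$$
c_{p,k} = \sum_{j=0}^{n_b-1} \sum_{i=0}^{m_b-1} b_{i,j}\, a_{p+m_b-1-i,\; k+n_b-1-j},
\qquad 0 \le p \le m_a - m_b,\quad 0 \le k \le n_a - n_b
$$

$$
N_{\texttt{daxpy}} = (n_a - n_b + 1)\, n_b\, m_b \quad \text{calls, each of length } m_a - m_b + 1
$$

$$
\text{flops} = 2\,(m_a - m_b + 1)(n_a - n_b + 1)\, m_b\, n_b
$$

All of the arithmetic therefore passes through `daxpy`. When $m_a - m_b + 1$ is small, each call does little work relative to its fixed per-call cost, which is one plausible way a profile ends up dominated by `daxpy`.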
abhishek-iitmadras