Diagonal Matrix Mult is Slower than Dense

While building an MWE for my core functionality, I noticed that, when compiled with Reactant, a simple matrix-vector multiply using a `Diagonal` matrix is more than 5x slower than the dense matrix-vector multiply (roughly 246 ms vs. 44 ms median in the benchmarks below). One can of course convert this to a broadcasted vector multiply, whereby I can get Reactant to within 20% of my standard CPU performance, but it would be nice to keep the same syntax and have Reactant handle `Diagonal` types more seamlessly. My motivation is that my actual production code uses custom types that are low-rank representations of the matrices, with overloaded definitions of matrix multiplication that often involve diagonal components.

```julia
using Reactant, Random, LinearAlgebra, PrettyChairmarks

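# Computes out = Ainv * (x - V * (AinvVIinv' * x)) in place, using preallocated buffers.
function core_mul_dense(AinvVIinvtX,AinvVIinv,VAinvVIinvtX,Ainv,V,x,out)
    mul!(AinvVIinvtX,AinvVIinv',x)          # AinvVIinvtX = AinvVIinv' * x
    mul!(VAinvVIinvtX,V,AinvVIinvtX,-1,0)   # VAinvVIinvtX = -V * AinvVIinvtX
    VAinvVIinvtX .+= x                      # VAinvVIinvtX = x - V * AinvVIinvtX
    mul!(out,Ainv,VAinvVIinvtX)             # out = Ainv * VAinvVIinvtX (Ainv is Diagonal or dense)
    return
end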

rng = Xoshiro(122)
n = 8000
ln = 50
x = randn(rng,n)
AinvVIinv = randn(rng,n,ln)
V = randn(rng,n,ln)
Ax = randn(rng,n)
Ainv = Diagonal(Ax)
AinvDense = Matrix(Ainv)
Ainv_vec = reshape(Ax,:,1)
AinvVIinvtX = zeros(ln)
VAinvVIinvtX = zeros(n)
out = zeros(n);

## Diagonal matrix-vector multiply
@bs core_mul_dense(AinvVIinvtX,AinvVIinv,VAinvVIinvtX,Ainv,V,x,out) seconds=3

# Chairmarks.Benchmark: 9585 samples with 1 evaluation.
#  Range (min … max):  234.913 μs …   4.769 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     288.765 μs               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   298.449 μs ± 124.596 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

## Fully dense matrix-vector multiply
@bs core_mul_dense(AinvVIinvtX,AinvVIinv,VAinvVIinvtX,AinvDense,V,x,out) seconds=10

# Chairmarks.Benchmark: 202 samples with 1 evaluation.
#  Range (min … max):  44.806 ms … 56.304 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     46.955 ms              ┊ GC (median):    0.00%
#  Time  (mean ± σ):   47.470 ms ±  1.750 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

Reactant.set_default_backend("cpu")

xR = Reactant.ConcreteRArray(x)
AinvVIinvR = Reactant.ConcreteRArray(AinvVIinv)
VR = Reactant.ConcreteRArray(V)
AinvDenseR = Reactant.ConcreteRArray(AinvDense)
AinvVIinvtXR = Reactant.ConcreteRArray(AinvVIinvtX)
VAinvVIinvtXR = Reactant.ConcreteRArray(VAinvVIinvtX)
outR = Reactant.ConcreteRArray(out);

f = @compile sync=true core_mul_dense(AinvVIinvtXR,AinvVIinvR,VAinvVIinvtXR,AinvDenseR,VR,xR,outR)

## Reactant fully dense matrix-vector multiply
@bs f(AinvVIinvtXR,AinvVIinvR,VAinvVIinvtXR,AinvDenseR,VR,xR,outR) seconds=10

# Chairmarks.Benchmark: 204 samples with 1 evaluation.
#  Range (min … max):  42.615 ms … 60.736 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     44.233 ms              ┊ GC (median):    0.00%
#  Time  (mean ± σ):   46.662 ms ±  4.344 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

## Reactant diagonal matrix-vector multiply
AinvR = Reactant.to_rarray(Ainv);

f = @compile sync=true core_mul_dense(AinvVIinvtXR,AinvVIinvR,VAinvVIinvtXR,AinvR,VR,xR,outR)

@bs f(AinvVIinvtXR,AinvVIinvR,VAinvVIinvtXR,AinvR,VR,xR,outR) seconds=10

# Chairmarks.Benchmark: 37 samples with 1 evaluation.
#  Range (min … max):  239.017 ms … 394.568 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     246.455 ms               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   273.044 ms ±  49.257 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%
```
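
For reference, the broadcasted workaround mentioned above can be sketched roughly as follows. This is not the exact code I benchmarked: `core_mul_bcast` and `Ainv_diag` are hypothetical names introduced here, and the sketch assumes the diagonal is passed as a plain vector rather than the `n×1` `Ainv_vec` defined in the MWE. It runs on ordinary Julia arrays and only checks that the result matches the `Diagonal` version.

```julia
using LinearAlgebra, Random

# Hypothetical variant of core_mul_dense: takes the diagonal entries as a plain
# vector (Ainv_diag) and replaces the final mul! with an elementwise product.
function core_mul_bcast(AinvVIinvtX, AinvVIinv, VAinvVIinvtX, Ainv_diag, V, x, out)
    mul!(AinvVIinvtX, AinvVIinv', x)            # AinvVIinvtX = AinvVIinv' * x
    mul!(VAinvVIinvtX, V, AinvVIinvtX, -1, 0)   # VAinvVIinvtX = -V * AinvVIinvtX
    VAinvVIinvtX .+= x                          # VAinvVIinvtX = x - V * AinvVIinvtX
    out .= Ainv_diag .* VAinvVIinvtX            # equivalent to Diagonal(Ainv_diag) * VAinvVIinvtX
    return
end

# Small sizes so the check runs quickly; the MWE above uses n = 8000, ln = 50.
rng = Xoshiro(122)
n, ln = 200, 5
x = randn(rng, n)
AinvVIinv = randn(rng, n, ln)
V = randn(rng, n, ln)
Ax = randn(rng, n)

AinvVIinvtX = zeros(ln)
VAinvVIinvtX = zeros(n)
out = zeros(n)
core_mul_bcast(AinvVIinvtX, AinvVIinv, VAinvVIinvtX, Ax, V, x, out)

# Reference result computed with the Diagonal wrapper, for comparison.
ref = Diagonal(Ax) * (x - V * (AinvVIinv' * x))
@assert out ≈ ref
```

For the Reactant run, the diagonal vector would presumably be wrapped with `Reactant.ConcreteRArray` like the other arguments before calling `@compile`. The drawback is that the call site has to know the operator is diagonal, which is exactly what the custom low-rank types are meant to hide.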


Diagonal Matrix Mult is Slower than Dense #1483
