The Kronecker product between `Diagonal` and `CUDA.CUSPARSE.AbstractCuSparseMatrix` works: ```julia using LinearAlgebra using CUDA CUDA.allowscalar(false) Ms = cu(sprand(ComplexF64, 4, 4, 0.5)) Id = I(2) kron(Id, Ms) ``` Is it possible to also support Kronecker product between `Diagonal` and (dense) `AbstractGPUArray` ? Because currently it's using scalar indexing: ```julia Md = cu(rand(ComplexF64, 4, 4)) Id = I(2) kron(Id, Md) ``` I post it here because this should also help for other GPU backends, `Metal` etc.