
Unexpected timings lu vs precision #82

Open
@ctkelley

Description


Hi, I'm playing with RecursiveFactorization and am finding that single-precision factorizations are much faster relative to double than the cubic-scaling argument would predict, while Float16 comes out slower than predicted. I was hoping Apple silicon's hardware Float16 and Julia's support for it would show up better here, but RecursiveFactorization in Float16 is still far faster than OpenBLAS LU in Float16.
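For reference, that OpenBLAS comparison can be reproduced with something like the following (a minimal sketch; N = 2048 is an arbitrary choice, and for Float16 the LinearAlgebra.lu! call actually dispatches to Julia's generic fallback, since LAPACK has no Float16 routines):

using LinearAlgebra
using BenchmarkTools
using RecursiveFactorization

N = 2048                     # arbitrary size for the head-to-head
A = rand(Float16, N, N)

# Same pattern as rlu below: copy first so each timing includes a fresh factorization.
rf16(A) = RecursiveFactorization.lu!(copy(A), Vector{Int}(undef, size(A, 2)))
la16(A) = LinearAlgebra.lu!(copy(A))   # generic fallback path for Float16

println("RecursiveFactorization: ", @belapsed rf16($A))
println("LinearAlgebra.lu!:      ", @belapsed la16($A))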

I'd think that

time(Float64) = 2 x time(Float32) = 4 x time(Float16)

would be the case for larger problems, but I am not seeing that. Is there an algorithmic reason for this? Should I be setting parameters in the factorization to something other than the default values?

I ran into this while using this package for my own work; this post is a happy story.

If you have insight or advice, I'd appreciate it. The results come from an M2 Pro Mac and Julia 1.10-beta1.

I ran the script below to factor a random matrix of size N at three precisions and got this:

julia> speed_test(2048)
Time for Float64 = 3.22886e-01
Time for Float32 = 1.70550e-02
Time for Float16 = 5.77817e-02

julia> speed_test(4096)
Time for Float64 = 3.68761e+00
Time for Float32 = 2.47510e-01
Time for Float16 = 4.73640e-01

julia> speed_test(8192)
Time for Float64 = 3.87592e+01
Time for Float32 = 3.12564e+00
Time for Float16 = 6.68709e+00
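
To put numbers on the gap, the ratios implied by the N = 8192 timings above can be computed directly (values copied from the output; under the 2x/4x expectation the ideal ratios would be 2 and 4):

t64, t32, t16 = 3.87592e+01, 3.12564e+00, 6.68709e+00   # N = 8192 timings from above

println("Float64 / Float32 = ", t64 / t32, "   (ideal: 2)")
println("Float64 / Float16 = ", t64 / t16, "   (ideal: 4)")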

using Random
using BenchmarkTools
using RecursiveFactorization
using LinearAlgebra

function speed_test(N = 2048)
    Random.seed!(46071)
    A = rand(N, N)

    # Factor the same matrix at each precision and report the elapsed time.
    for T in [Float64, Float32, Float16]
        AT = T.(A)
        tlu = @belapsed rlu($AT)
        println("Time for $T = $tlu")
    end
end

# Copy the input so the factorization does not overwrite it, then call the
# recursive LU with a preallocated pivot vector.
function rlu(C)
    B = copy(C)
    CF = RecursiveFactorization.lu!(B, Vector{Int}(undef, size(B, 2)))
    return CF
end
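
As a sanity check that every precision is doing the full factorization, the backward error of the returned factors can be inspected. This sketch assumes rlu returns a standard LinearAlgebra.LU object with fields L, U, and p, which is what RecursiveFactorization.lu! is expected to produce.

# Sanity check (sketch): relative residual of the factors returned by rlu,
# assuming the result is a LinearAlgebra.LU with fields L, U, and p.
function check_lu(N = 512, T = Float32)
    A = T.(rand(N, N))
    F = rlu(A)
    resid = norm(F.L * F.U - A[F.p, :]) / norm(A)
    println("Relative residual for $T: $resid")
end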
