Improve performance by not keeping unnecessary refs #193

Open
@mhauru

Currently Libtask stores every variable in a TapedTask's code as a ref. This is because we must know the exact state execution was in when a produce statement caused us to yield control, so that we can continue from the same state on the next consume call. However, many of these refs are in fact unnecessary: if a variable is only used between two produce statements, we never need its value again after the latter produce. For instance, say you make a TapedTask out of

function f()
    a = 1
    b = 2*a
    produce(b)
    c = 3*b
    produce(c)
    return nothing
end

Currently a, b, and c are all kept as refs. This means that their values are kept in memory for as long as the task exists. Maybe more importantly, it also means that every bit of IR code that accesses any of them is bloated into several statements that reference and dereference the corresponding refs. For the variable a, however, this is all unnecessary: a is last read before the first produce, and when we continue execution after that produce only the value of b matters for the rest of the function. Likewise for c, which is created after the first produce and never read after the second.

There are many levels of sophistication at which we could try to analyse the IR to figure out which variables need to be turned into refs and which don't, but even quite a rudimentary analysis might yield large simplifications in the IR that Libtask produces, and thus great runtime performance gains.
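To make the "rudimentary analysis" concrete, here is a minimal sketch of a backward liveness pass over a toy, straight-line representation of f. Everything in it is an assumption made for illustration: the Stmt type and vars_needing_refs are hypothetical names, not part of Libtask, and the real transformation works on Julia's IR, where branches and loops would need a proper fixed-point dataflow analysis. The idea is just that a variable needs a ref only if it is still live immediately after some produce.

```julia
# Hypothetical toy IR (not Libtask's representation): one entry per
# statement of straight-line code, with no branches or loops.
struct Stmt
    def::Union{Symbol,Nothing}   # variable assigned by this statement, if any
    uses::Vector{Symbol}         # variables read by this statement
    is_produce::Bool             # true if the statement yields via produce
end

# Walk the statements backwards, tracking which variables are still needed
# later. Any variable live immediately after a produce must survive the
# yield, so it is the only kind of variable that needs a ref.
function vars_needing_refs(stmts::Vector{Stmt})
    live = Set{Symbol}()        # variables read by some later statement
    needs_ref = Set{Symbol}()
    for s in Iterators.reverse(stmts)
        if s.is_produce
            union!(needs_ref, live)   # live-out set of the produce
        end
        s.def === nothing || delete!(live, s.def)
        union!(live, s.uses)
    end
    return needs_ref
end

# The function f from above, written out by hand in the toy IR.
f_stmts = [
    Stmt(:a, Symbol[], false),       # a = 1
    Stmt(:b, [:a], false),           # b = 2*a
    Stmt(nothing, [:b], true),       # produce(b)
    Stmt(:c, [:b], false),           # c = 3*b
    Stmt(nothing, [:c], true),       # produce(c)
    Stmt(nothing, Symbol[], false),  # return nothing
]

refs = vars_needing_refs(f_stmts)
```

On this example refs contains only :b, matching the reasoning above: a and c are never read after a produce that follows their definition, so under this sketch's assumptions they could stay as ordinary variables.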

Tagging @willtebbutt since I mentioned this idea to him, and he thought it wasn't badly misguided.
