Description
Currently Libtask stores every variable in a TapedTask's code as a ref. This is because we must know the exact state the execution was in when a produce statement caused us to yield control, so that we can continue from the same state on the next consume call. However, many of these refs are in fact unnecessary: if a variable is only used between two produce statements, we won't ever need its value again after the latter produce. For instance, say you make a TapedTask out of
function f()
    a = 1
    b = 2*a
    produce(b)
    c = 3*b
    produce(c)
    return nothing
end
Currently a, b, and c are all kept as refs. This means that their values will be kept in memory as long as the task exists. Maybe more importantly, it also means that every bit of IR code that accesses any of them is bloated into several statements referencing and dereferencing the corresponding refs. However, for the variable a this is all unnecessary, since when we continue execution after the first produce, only the value of b matters for the rest of the function. Likewise for c, which is never read again once the second produce has yielded.
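To make the difference concrete, here is a hand-written sketch of the two variants. It is only an illustration, not the IR that Libtask actually generates: the function names are made up, and produce is replaced by pushing onto a vector so that the snippet runs without Libtask.

```julia
# Hand-written illustration only; NOT the IR Libtask generates.
# `push!(out, x)` stands in for `produce(x)` so this runs without Libtask.

# Every variable lives in a Ref: each read becomes a load, each write a store.
function f_all_refs!(out)
    a = Ref{Int}()
    b = Ref{Int}()
    c = Ref{Int}()
    a[] = 1
    b[] = 2 * a[]
    push!(out, b[])      # stands in for produce(b)
    c[] = 3 * b[]
    push!(out, c[])      # stands in for produce(c)
    return nothing
end

# Only `b` is defined before a produce and read after it, so only `b` gets a Ref.
function f_minimal_refs!(out)
    a = 1
    b = Ref(2 * a)
    push!(out, b[])      # stands in for produce(b)
    c = 3 * b[]
    push!(out, c)        # stands in for produce(c)
    return nothing
end
```

Both variants yield the same values; the second one just has fewer loads and stores, and a and c become ordinary locals.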
There are many levels of sophistication at which we could try to analyse the IR to figure out which variables need to be turned into refs and which don't, but even quite a rudimentary analysis might yield large simplifications in the IR that Libtask produces, and thus great runtime performance gains.
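As a rough idea of what the most rudimentary version could look like, below is a toy backward liveness pass over straight-line code. The Stmt type and the needs_ref function are hypothetical names invented for this sketch; they are not Libtask's or Julia's IR types. It also assumes no branches or loops, which a real implementation would have to handle with a proper dataflow fixed point.

```julia
# Toy statement representation; hypothetical, not Libtask's or Julia's IR types.
struct Stmt
    def::Union{Symbol,Nothing}   # variable assigned by this statement, if any
    uses::Vector{Symbol}         # variables read by this statement
    is_produce::Bool             # true if control may be yielded here
end

# Return the variables whose values must survive a produce, i.e. those that
# are still needed by some statement after it. Straight-line code only.
function needs_ref(stmts::Vector{Stmt})
    live = Set{Symbol}()         # variables read by some later statement
    result = Set{Symbol}()
    for s in Iterators.reverse(stmts)
        # Everything still live after a produce has to be stored across it.
        s.is_produce && union!(result, live)
        # A definition ends the live range (walking backwards) ...
        s.def === nothing || delete!(live, s.def)
        # ... and every use (re)starts one.
        union!(live, s.uses)
    end
    return result
end

# The function f from above, encoded by hand.
f_stmts = [
    Stmt(:a, Symbol[], false),      # a = 1
    Stmt(:b, [:a], false),          # b = 2*a
    Stmt(nothing, [:b], true),      # produce(b)
    Stmt(:c, [:b], false),          # c = 3*b
    Stmt(nothing, [:c], true),      # produce(c)
    Stmt(nothing, Symbol[], false), # return nothing
]

refs = needs_ref(f_stmts)           # only :b is live across a produce
```

For f this singles out b as the only variable that needs a ref, matching the reasoning above.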
Tagging @willtebbutt since I mentioned this idea to him, and he thought it wasn't badly misguided.