Description
Feature or enhancement
Proposal:
Thanks to Matt's work on borrowed LOAD_FAST, we can now eliminate reference counting trivially in the JIT.
Reference counting is expensive: Matt found that eliminating 90% of refcounts in LOAD_FAST gave a 2-3% speedup in the interpreter, so we expect a sizable speedup for the JIT as well.
The other problem is that reference counts block register allocation/TOS caching, because they force values to be spilled to the stack more often.
This issue has the potential to speed up JIT benchmarks by several percent.
How to contribute.
Now that reference tracking in the JIT optimizer is in place, the first thing we need to do is convert ops so that the decref is an explicit uop.
For escaping decref ops (i.e., ops whose decref could run the GC), we need to refactor them so that their decrefs are eliminated via specialization of pops. For example, take the following op (it is not actually an escaping op; this is purely for demonstration!):
```
macro(BINARY_OP_ADD_INT) =
    _GUARD_TOS_INT + _GUARD_NOS_INT + unused/5 + _BINARY_OP_ADD_INT;
```
becomes
```
macro(BINARY_OP_ADD_INT) =
    _GUARD_TOS_INT + _GUARD_NOS_INT + unused/5 + _BINARY_OP_ADD_INT + _POP_TOP_INT + _POP_TOP_INT;
```
Previously, _BINARY_OP_ADD_INT's stack effect was (left, right -- res). The new version should be (left, right -- res, left, right), which leaves left and right on the stack for the two _POP_TOP_INT uops to pop and decref.
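A toy C model can make the new stack effect concrete. This is not CPython code: `Obj`, `push`, `pop`, `binary_op_add_int` and `pop_top_int` are hypothetical stand-ins, and the caller supplies storage for the result instead of allocating it. The only point is the stack order: after the add, res sits below left and right, and the two pops that follow consume the inputs.

```c
#include <assert.h>

/* Toy object with a reference count (hypothetical, not CPython's PyObject). */
typedef struct {
    long value;
    int refcnt;
} Obj;

/* Toy evaluation stack of object pointers. */
#define STACK_SIZE 16
static Obj *stack[STACK_SIZE];
static int sp = 0;

static void push(Obj *o) {
    assert(sp < STACK_SIZE);
    stack[sp++] = o;
}

static Obj *pop(void) {
    assert(sp > 0);
    return stack[--sp];
}

/* New-style _BINARY_OP_ADD_INT: (left, right -- res, left, right).
   It computes res into res_slot but does NOT decref its inputs;
   it pushes them back so the following pops can handle the decrefs. */
static void binary_op_add_int(Obj *res_slot) {
    Obj *right = pop();
    Obj *left = pop();
    res_slot->value = left->value + right->value;
    res_slot->refcnt = 1;   /* res is a new, owned reference */
    push(res_slot);
    push(left);
    push(right);
}

/* Toy _POP_TOP_INT: pop one value and decref it. A real implementation
   would free the object when the count reaches zero; this only decrements. */
static void pop_top_int(void) {
    Obj *o = pop();
    o->refcnt--;
}
```

Running the macro sequence on a stack holding 2 and 3 leaves only res (value 5) on the stack, and both inputs' toy refcounts are decremented by the pops rather than by the add itself.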
So for all the decref specializations, we just need _POP_X variants! This avoids an explosion of uops and their decref variants: we simply specialize _POP_X to _POP_X_NO_DECREF in the JIT, which keeps things clean.
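The specialization step can be sketched as a tiny abstract-interpretation pass in C. Everything here is hypothetical (the `U_*` uop names, the `specialize_pops` function, and the single "borrowed" flag per stack slot); the real JIT optimizer tracks references in its own way. The pass walks a trace, mirrors each uop's stack effect on a stack of "is this reference borrowed?" flags, and rewrites a pop into its no-decref variant whenever the popped value is known to be borrowed, since a borrowed reference was never incref'd and so must not be decref'd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical uop names, for this sketch only. */
typedef enum {
    U_LOAD_FAST,             /* pushes an owned (strong) reference */
    U_LOAD_FAST_BORROW,      /* pushes a borrowed reference */
    U_BINARY_OP_ADD_INT,     /* (left, right -- res, left, right); res is owned */
    U_POP_TOP_INT,           /* pops and decrefs */
    U_POP_TOP_INT_NO_DECREF  /* pops without decref */
} Uop;

#define MAX_STACK 32

/* Rewrites U_POP_TOP_INT to U_POP_TOP_INT_NO_DECREF whenever the popped
   value is known to be borrowed. Returns the number of rewrites. */
static int specialize_pops(Uop *trace, size_t n) {
    bool borrowed[MAX_STACK];  /* abstract stack: true = borrowed reference */
    int sp = 0;
    int rewrites = 0;
    for (size_t i = 0; i < n; i++) {
        switch (trace[i]) {
        case U_LOAD_FAST:
            assert(sp < MAX_STACK);
            borrowed[sp++] = false;
            break;
        case U_LOAD_FAST_BORROW:
            assert(sp < MAX_STACK);
            borrowed[sp++] = true;
            break;
        case U_BINARY_OP_ADD_INT: {
            assert(sp >= 2 && sp < MAX_STACK);
            bool r = borrowed[sp - 1];
            bool l = borrowed[sp - 2];
            borrowed[sp - 2] = false;  /* res: new, owned reference */
            borrowed[sp - 1] = l;      /* left pushed back unchanged */
            borrowed[sp] = r;          /* right pushed back unchanged */
            sp++;
            break;
        }
        case U_POP_TOP_INT:
            assert(sp >= 1);
            if (borrowed[sp - 1]) {
                trace[i] = U_POP_TOP_INT_NO_DECREF;
                rewrites++;
            }
            sp--;
            break;
        case U_POP_TOP_INT_NO_DECREF:
            assert(sp >= 1);
            sp--;
            break;
        }
    }
    return rewrites;
}
```

For a trace that borrow-loads a, strong-loads b, adds, and pops three times, only the pop of a is rewritten: b and res are owned references and still need their decrefs.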
Please hold off on working on anything that might cause a stack overflow for now; we're still figuring out how to deal with that. This includes all BINARY_OP and similar ops.
These are open for contributors to take:
- POP_TOP @Fidget-Spinner
- STORE_FAST @Fidget-Spinner
- _BINARY_OP_X_FLOAT @Fidget-Spinner
- _BINARY_OP_X_INT @Fidget-Spinner
- _BINARY_OP_X_UNICODE @corona10
- _BINARY_OP_SUBSCR_X (except for BINARY_OP_SUBSCR_GETITEM)
- _COMPARE_OP_X @corona10
- _CALL_TYPE_1 @tomasr8
- _CALL_STR_1 @Zheaoli
- _CALL_TUPLE_1 @noamcohen97
- _CALL_BUILTIN_O
- _CALL_LEN @corona10
- _CALL_ISINSTANCE @noamcohen97
- _CALL_LIST_APPEND
- _CALL_METHOD_DESCRIPTOR_O
- _STORE_SUBSCR_DICT
- _STORE_SUBSCR_LIST_INT @corona10
- _LOAD_ATTR
- _LOAD_ATTR_INSTANCE_VALUE
- _LOAD_ATTR_MODULE
- _LOAD_ATTR_WITH_HINT
- _LOAD_ATTR_SLOT
- _LOAD_ATTR_CLASS
- _LOAD_ATTR_METHOD_WITH_VALUES
- _LOAD_ATTR_METHOD_NO_DICT
- _LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
- _LOAD_ATTR_NONDESCRIPTOR_NO_DICT
- _LOAD_ATTR_METHOD_LAZY_DICT
- _STORE_ATTR_INSTANCE_VALUE
- _STORE_ATTR_SLOT
- _STORE_ATTR_WITH_HINT
- _STORE_ATTR
Once that's done, we can think about further ops.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Linked PRs
- gh-134584: Decref elimination for float ops in the JIT #134588
- gh-134584: Specialize POP_TOP by reference and type in JIT #135761
- gh-134584: Eliminate redundant refcounting from _BINARY_OP_ADD_UNICODE #135817
- gh-134584: Eliminate redundant refcounting from _CALL_TYPE_1 #135818
- gh-134584: Eliminate redundant refcounting from _CALL_TUPLE_1 #135860
- gh-134584: Eliminate redundant refcounting from _STORE_SUBSCR_LIST_INT #135907
- gh-134584: Eliminate redundant refcounting from _CALL_BUILTIN_O #136056
- gh-134584: Eliminate redundant refcounting from _CALL_STR_1 #136070
- gh-134584: Eliminate redundant refcounting from _CALL_ISINSTANCE #136077
- gh-134584: Eliminate redundant refcounting from _CALL_LEN #136104