Eliminate redundant refcounting in the JIT #134584

Open
@Fidget-Spinner

Feature or enhancement

Proposal:

Thanks to Matt's work on borrowed LOAD_FAST, we can now eliminate reference counting trivially in the JIT.

Reference counting is expensive. Matt found that eliminating 90% of refcounts in LOAD_FAST meant a 2-3% speedup in the interpreter, so the speedup for the JIT should be significant too.

The other problem is that reference counts block register allocation/TOS caching, as they force spills to the stack more often.

This issue has the potential to speed up JIT benchmarks by several percent.

How to contribute.

Now that reference tracking in the JIT optimizer is in place, the first thing we need to do is to convert ops to make the decref an explicit uop.

For escaping decref ops (i.e., ops whose decref could run the GC), we would need to refactor them so that their decrefs are eliminated via specialization of pops. For example, the following op (which is not an escaping op, it is purely for demonstration!):

macro(BINARY_OP_ADD_INT) =
    _GUARD_TOS_INT + _GUARD_NOS_INT + unused/5 + _BINARY_OP_ADD_INT;

becomes

macro(BINARY_OP_ADD_INT) =
    _GUARD_TOS_INT + _GUARD_NOS_INT + unused/5 + _BINARY_OP_ADD_INT + _POP_TOP_INT + _POP_TOP_INT;

Previously _BINARY_OP_ADD_INT's stack effect looked like this: (left, right -- res). The new version should look like (left, right -- res, left, right).
So for all the decref specializations, we would just need a _POP_X variant for each of them! This means no explosion of uops and their decref variants. We just specialize _POP_X to _POP_X_NO_DECREF in the JIT, keeping things clean.
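
To make the idea concrete, here is a toy model in Python. It is only a sketch of the approach and not how the optimizer is actually written: `Obj`, `Slot`, `Machine`, `specialize_pops` and the uop name `_POP_TOP_INT_NO_DECREF` are all made up for illustration (the last one just follows the `_POP_X_NO_DECREF` pattern above). It assumes a stack slot marked as borrowed never needs a decref when popped.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    value: int
    refcnt: int = 1

@dataclass
class Slot:
    obj: Obj
    borrowed: bool = False  # True: this slot does not own a reference

class Machine:
    def __init__(self, stack):
        self.stack = list(stack)
        self.decrefs = 0  # how many refcount decrements were executed

def binary_op_add_int(m):
    # New stack effect: (left, right -- res, left, right). No decref in here.
    right = m.stack.pop()
    left = m.stack.pop()
    res = Slot(Obj(left.obj.value + right.obj.value))
    m.stack += [res, left, right]

def pop_top_int(m):
    # Explicit decref uop: pops the top slot and drops its reference.
    slot = m.stack.pop()
    slot.obj.refcnt -= 1
    m.decrefs += 1

def pop_top_int_no_decref(m):
    # Hypothetical specialized form: pops without touching the refcount.
    m.stack.pop()

UOPS = {
    "_BINARY_OP_ADD_INT": binary_op_add_int,
    "_POP_TOP_INT": pop_top_int,
    "_POP_TOP_INT_NO_DECREF": pop_top_int_no_decref,
}

def specialize_pops(trace, borrowed_on_entry):
    """Rewrite pops of borrowed slots using an abstract stack of bool flags."""
    abstract = list(borrowed_on_entry)
    out = []
    for name in trace:
        if name == "_BINARY_OP_ADD_INT":
            right = abstract.pop()
            left = abstract.pop()
            abstract += [False, left, right]  # res is a new, owned reference
            out.append(name)
        elif name == "_POP_TOP_INT":
            top_is_borrowed = abstract.pop()
            out.append("_POP_TOP_INT_NO_DECREF" if top_is_borrowed else name)
        else:
            raise ValueError(f"unknown uop: {name}")
    return out

def run(trace, stack):
    m = Machine(stack)
    for name in trace:
        UOPS[name](m)
    return m

trace = ["_BINARY_OP_ADD_INT", "_POP_TOP_INT", "_POP_TOP_INT"]

# Both operands borrowed (e.g. they came from a borrowed LOAD_FAST).
a, b = Obj(2), Obj(3)
entry = [Slot(a, borrowed=True), Slot(b, borrowed=True)]
optimized = specialize_pops(trace, [s.borrowed for s in entry])
m = run(optimized, entry)

# Both operands owned: the pops must keep their decrefs.
c, d = Obj(2), Obj(3)
owned_entry = [Slot(c), Slot(d)]
unoptimized = specialize_pops(trace, [s.borrowed for s in owned_entry])
m_owned = run(unoptimized, owned_entry)
```

In the borrowed case both pops become the no-decref form and the refcounts of `a` and `b` are never touched. In the owned case the trace is left alone and each operand is decref'd once. Either way `_BINARY_OP_ADD_INT` itself stays the same, so the only thing that needs a second variant is the pop.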

Please hold off on working on anything that might cause a stack overflow for now; we're still figuring out how to deal with that. This means all BINARY_OP and similar ops.

These are open for contributors to take:

  • POP_TOP @Fidget-Spinner
  • STORE_FAST @Fidget-Spinner
  • _BINARY_OP_X_FLOAT @Fidget-Spinner
  • _BINARY_OP_X_INT @Fidget-Spinner
  • _BINARY_OP_X_UNICODE @corona10
  • _BINARY_OP_SUBSCR_X (except for BINARY_OP_SUBSCR_GETITEM)
  • _COMPARE_OP_X @corona10
  • _CALL_TYPE_1 @tomasr8
  • _CALL_STR_1 @Zheaoli
  • _CALL_TUPLE_1 @noamcohen97
  • _CALL_BUILTIN_O
  • _CALL_LEN @corona10
  • _CALL_ISINSTANCE @noamcohen97
  • _CALL_LIST_APPEND
  • _CALL_METHOD_DESCRIPTOR_O
  • _STORE_SUBSCR_DICT
  • _STORE_SUBSCR_LIST_INT @corona10
  • _LOAD_ATTR
  • _LOAD_ATTR_INSTANCE_VALUE
  • _LOAD_ATTR_MODULE
  • _LOAD_ATTR_WITH_HINT
  • _LOAD_ATTR_SLOT
  • _LOAD_ATTR_CLASS
  • _LOAD_ATTR_METHOD_WITH_VALUES
  • _LOAD_ATTR_METHOD_NO_DICT
  • _LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
  • _LOAD_ATTR_NONDESCRIPTOR_NO_DICT
  • _LOAD_ATTR_METHOD_LAZY_DICT
  • _STORE_ATTR_INSTANCE_VALUE
  • _STORE_ATTR_SLOT
  • _STORE_ATTR_WITH_HINT
  • _STORE_ATTR

Once that's done, we can think about further ops.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Metadata

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), topic-JIT, type-feature (A feature request or enhancement)
