-
Notifications
You must be signed in to change notification settings - Fork 5k
JIT: Graph-based loop inversion #116017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
JIT: Graph-based loop inversion #116017
Conversation
Rewrite loop inversion to be graph based and to use the new loop representation.
… from old inversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request implements graph-based loop inversion in the JIT, aiming to improve IR transformation behavior and diagnostics.
- In fgopt.cpp, the logic now returns true to indicate IR modifications in edge cases, and error handling for stmt cloning has been shifted to an assertion.
- In fgdiagnostic.cpp, a more descriptive debug message is added for loop exits with non-loop predecessors.
- In compiler.h, the API is updated to use a more specific function (optTryInvertWhileLoop) with a refined parameter type.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/coreclr/jit/fgopt.cpp | Updated control flow for IR modification detection and error handling |
src/coreclr/jit/fgdiagnostic.cpp | Enhanced loop diagnostic messaging |
src/coreclr/jit/compiler.h | Updated API for loop inversion with adjusted parameter type |
Comments suppressed due to low confidence (2)
src/coreclr/jit/fgopt.cpp:2713
- Changing the response for non-RelOp conditions from returning false to returning true alters the behavior of IR modifications. Verify that this change is intentional and that downstream code correctly handles the modified IR state.
if (!condTree->OperIsCompare())
src/coreclr/jit/compiler.h:7123
- The API change from 'optInvertWhileLoop(BasicBlock* block)' to 'optTryInvertWhileLoop(FlowGraphNaturalLoop* loop)' requires ensuring that all call sites and semantic expectations are updated accordingly.
bool optTryInvertWhileLoop(FlowGraphNaturalLoop* loop);
In this iteration, I've kept the current phase ordering. Here are some metrics from win-x64 Base:
Diff:
Diffs had large size increases locally, most of it coming from the size cost of loop inversion. |
It's worth noting that switching over to the graph-based implementation isn't enough to close the gap for array vs list iteration de-abstraction:
For this case, we need flowgraph simplification to run before loop inversion so that one of the loop condition's branches becomes an exit branch. So we're going to need #115850 at some point. |
cc @dotnet/jit-contrib, @AndyAyersMS @jakobbotsch PTAL. Diffs are still large, though it seems less to do with runaway cloning this time. Both tweaking and moving loop inversion incurs large diffs, so I'm not sure how we want to proceed, if at all... |
Looking ahead, here are the diffs on win-x64 for moving inversion past flow opts, with this PR as the baseline: Diffs are based on 2,721,962 contexts (1,064,836 MinOpts, 1,657,126 FullOpts). MISSED contexts: base: 1,259 (0.05%), diff: 1,305 (0.05%) Overall (+1,107,814 bytes)
FullOpts (+1,107,814 bytes)
Much of these size increases are driven by loop cloning (like in #115850). With cloning and unrolling disabled: Diffs are based on 2,720,931 contexts (1,064,836 MinOpts, 1,656,095 FullOpts). MISSED contexts: base: 2,297 (0.08%), diff: 2,336 (0.09%) Base JIT options: JitCloneLoops=0;JitNoUnroll=1 Diff JIT options: JitCloneLoops=0;JitNoUnroll=1 Overall (+195,450 bytes)
FullOpts (+195,450 bytes)
|
As for this PR, diffs without cloning/unrolling are mostly unchanged: Diffs are based on 2,720,993 contexts (1,064,836 MinOpts, 1,656,157 FullOpts). MISSED contexts: base: 1,341 (0.05%), diff: 2,297 (0.08%) Base JIT options: JitCloneLoops=0;JitNoUnroll=1 Diff JIT options: JitCloneLoops=0;JitNoUnroll=1 Overall (+1,214,442 bytes)
FullOpts (+1,214,442 bytes)
|
What I am unsure about is if all these inversions are doing good things or not. One thing you can do is try to quirk the new inversions away and then slowly remove the quirks one by one to have a better chance of evaluating if things are looking good. |
@AndyAyersMS here are the diffs for with/without size restrictions. It doesn't bring the size increases of this PR down by all that much, but the PerfScore improvements look promising: Diffs are based on 2,721,532 contexts (1,064,836 MinOpts, 1,656,696 FullOpts). MISSED contexts: base: 1,259 (0.05%), diff: 1,748 (0.06%) Overall (-7,948 bytes)
FullOpts (-7,948 bytes)
|
These numbers look much smaller than the numbers above. Is this just the delta vs no size restrictions? |
Can you try a few different values for the size restriction? It could be most of the loops we're currently missing tend to be smaller (eg Curious what we see at say 100, 200, 300, or maybe can you plot the distribution of sizes? |
Yes, those diffs are against graph-based loop inversion without size restrictions.
Sure, let me get something together. |
@AndyAyersMS I tried the sizes you suggested, and there seems to be a noticeable drop-off between 200 and 100 nodes (about 100KB shaved off), at least in I think we now have an aspnet collection; if you'd like, I can collect the same metrics for it. |
Sure, if it's not too much trouble. Also for the above maybe look at even smaller sizes? (say 80, 60)? |
Succeeds #113709 (GitHub won't let me reopen it). Part of #107749 and #108901. Fixes #50204.