JIT: Move loop inversion to after loop recognition #115850
base: main
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull Request Overview
This PR moves the loop inversion phase to after loop recognition, adds immediate block compaction/removal for newly altered test blocks, and triggers a DFS rebuild with fresh loop analysis when any loops were inverted.
- Add single-predecessor block compaction/removal in `optInvertWhileLoop`
- Recompute the DFS tree and re-run loop finding after any loop inversions
- Relocate the `PHASE_INVERT_LOOPS` call in the compilation pipeline
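The second bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: `FlowGraph`, `LoopInfo`, `findLoops`, `invertWhileLoop`, and `invertAndReanalyze` are hypothetical names made up for this example and are not the JIT's data structures or APIs. It shows why a cached loop analysis goes stale once inversion rewrites the flow graph: the back edge now targets a different block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy flow graph (hypothetical): succs[b] lists the successors of block b.
// Block 0 is the entry.
using FlowGraph = std::vector<std::vector<int>>;

// Hypothetical stand-in for a cached, DFS-based loop analysis.
struct LoopInfo
{
    std::vector<int> headers; // targets of DFS back edges, in discovery order
};

namespace
{
enum class Mark { Unvisited, OnStack, Done };

void visit(const FlowGraph& g, int block, std::vector<Mark>& marks, std::vector<int>& headers)
{
    marks[block] = Mark::OnStack;
    for (int succ : g[block])
    {
        if (marks[succ] == Mark::Unvisited)
            visit(g, succ, marks, headers);
        else if (marks[succ] == Mark::OnStack)
            headers.push_back(succ); // edge to a block still on the DFS stack: a back edge
    }
    marks[block] = Mark::Done;
}
} // namespace

// Rebuild the DFS from the entry block and record every back-edge target.
LoopInfo findLoops(const FlowGraph& g)
{
    LoopInfo info;
    std::vector<Mark> marks(g.size(), Mark::Unvisited);
    if (!g.empty())
        visit(g, 0, marks, info.headers);
    return info;
}

// Toy inversion: if `pre` jumps unconditionally to a two-way `test` block,
// give `pre` its own copy of the test by copying the test's successors.
bool invertWhileLoop(FlowGraph& g, int pre, int test)
{
    if (g[pre].size() != 1 || g[pre][0] != test || g[test].size() != 2)
        return false;
    g[pre] = g[test];
    return true;
}

// Analyze, try to invert, and redo the analysis only if the graph changed.
LoopInfo invertAndReanalyze(FlowGraph& g, int pre, int test)
{
    LoopInfo loops = findLoops(g);
    if (invertWhileLoop(g, pre, test))
        loops = findLoops(g); // the earlier result described the old graph
    return loops;
}
```

For the graph `{{1}, {2, 3}, {1}, {}}` (preheader 0, test 1, body 2, exit 3), the first analysis reports block 1 as the loop header. After the toy inversion the back edge runs from the test to the body, so the fresh analysis reports block 2 instead.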
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `optimizer.cpp` | Inserted block compaction/removal and DFS invalidation |
| `compiler.cpp` | Moved the loop inversion phase to a later point in `compCompile` |
Comments suppressed due to low confidence (1)
src/coreclr/jit/compiler.cpp:4668
- Add targeted tests that verify the new phase ordering and ensure that both block compaction and removal occur as expected after loop inversion.
DoPhase(this, PHASE_INVERT_LOOPS, &Compiler::optInvertLoops);
The diffs will be hard to parse for this, so I'm looking more at metrics. Here are some metric diffs for Base:
Diff:
The metrics show that we're inverting fewer loops overall, but there are plenty of cases where we invert new loops, thus unblocking other loop opts; in particular, we're doing a lot more cloning. We find fewer loops overall because loop inversion no longer introduces new cycles before loop recognition runs. PerfScore diffs are overwhelmingly negative in non-PGO collections. This might be heuristic-derived profile weights for cloned loops inflating PerfScores, and/or something else...
Assuming the diffs are largely cloning related, it appears that extra cloning is pretty costly. It is hard to know how much of it is really beneficial. I wish we had better heuristics.
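For readers less familiar with why cloning is costly, here is a rough source-level sketch of the shape loop cloning produces. It is written in C++ purely as an analogy: `sumFirstN` is a hypothetical function, and the JIT performs this transformation on its IR rather than on source code. The loop body exists twice, once behind an up-front check and once with per-iteration checks, so code size roughly doubles for that loop.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical example: sum the first n elements of v.
int sumFirstN(const std::vector<int>& v, std::size_t n)
{
    int sum = 0;
    if (n <= v.size())
    {
        // Fast copy: the single check above covers every access,
        // so the body needs no per-iteration bounds check.
        for (std::size_t i = 0; i < n; i++)
            sum += v[i];
    }
    else
    {
        // Slow copy: same loop, but each access is still checked.
        // std::vector::at throws std::out_of_range on a bad index.
        for (std::size_t i = 0; i < n; i++)
            sum += v.at(i);
    }
    return sum;
}
```

Whether the duplicated body pays for itself depends on how often the fast copy actually runs, which is the judgment the cloning heuristics have to make.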
Right. Because of this, I've decided to flip the order of my changes and enable graph-based loop inversion first, under the existing phase ordering. Locally, the diffs are slightly easier to triage. Once that's in, hopefully it'll be easier to triage the diffs on this PR and see if there's anything actionable.
Prerequisite to #113709. I expect diffs to go both ways: In some cases, loop canonicalization unlocks pattern-based loop inversion, whereas in other cases, we now recognize fewer loops due to loop inversion no longer introducing new cycles pre-canonicalization.
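As background, loop inversion in general turns a top-tested loop into a guarded bottom-tested loop. The sketch below shows the two shapes at the source level in C++; `sumWhile` and `sumInverted` are hypothetical names for this illustration, and the JIT applies the transformation to basic blocks, not to source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Top-tested shape: the condition sits at the loop entry, and each
// iteration ends with a jump back up to that test.
int sumWhile(const std::vector<int>& v)
{
    int sum = 0;
    std::size_t i = 0;
    while (i < v.size())
    {
        sum += v[i];
        i++;
    }
    return sum;
}

// Inverted shape: a duplicated copy of the test guards entry, and the
// loop itself tests at the bottom, so each iteration takes one branch.
int sumInverted(const std::vector<int>& v)
{
    int sum = 0;
    std::size_t i = 0;
    if (i < v.size())
    {
        do
        {
            sum += v[i];
            i++;
        } while (i < v.size());
    }
    return sum;
}
```

Both functions return the same result for any input, including an empty vector, where the guard skips the loop entirely. The duplicated test is the piece that adds new flow edges to the graph.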