JIT: Graph-based loop inversion #116017

amanasifkhalid · 2025-05-27T15:51:52Z

Succeeds #113709 (GitHub won't let me reopen it). Part of #107749 and #108901. Fixes #50204.

Rewrite loop inversion to be graph based and to use the new loop representation.
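Loop inversion turns a top-tested `while` loop into a guarded, bottom-tested `do`/`while` loop by duplicating the loop condition. The following C++ sketch illustrates the two shapes at the source level. The JIT performs the transformation on its flow graph, not on source code, and the names `SumWhile` and `SumInverted` are made up for this example.

```cpp
#include <cassert>

// Top-tested shape: every iteration runs the conditional branch at the top
// plus an unconditional jump from the end of the body back to the test.
int SumWhile(int n)
{
    int sum = 0;
    int i   = 0;
    while (i < n)
    {
        sum += i;
        i++;
    }
    return sum;
}

// Inverted shape: one copy of the condition guards entry (the zero-trip test),
// and the other copy sits at the bottom, so each iteration runs only one
// conditional branch and no unconditional jump.
int SumInverted(int n)
{
    int sum = 0;
    int i   = 0;
    if (i < n) // zero-trip test: first copy of the condition
    {
        do
        {
            sum += i;
            i++;
        } while (i < n); // bottom test: second copy of the condition
    }
    return sum;
}
```

Both functions compute the same result for every `n`. The cost of the inverted form is the second copy of the condition, which is the size cost of inversion that the diffs in this thread reflect.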


Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull Request Overview

This pull request implements graph-based loop inversion in the JIT, aiming to improve IR transformation behavior and diagnostics.

In fgopt.cpp, the logic now returns true to indicate IR modifications in edge cases, and error handling for stmt cloning has been shifted to an assertion.
In fgdiagnostic.cpp, a more descriptive debug message is added for loop exits with non-loop predecessors.
In compiler.h, the API is updated to use a more specific function (optTryInvertWhileLoop) with a refined parameter type.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| src/coreclr/jit/fgopt.cpp | Updated control flow for IR modification detection and error handling |
| src/coreclr/jit/fgdiagnostic.cpp | Enhanced loop diagnostic messaging |
| src/coreclr/jit/compiler.h | Updated API for loop inversion with adjusted parameter type |

Comments suppressed due to low confidence (2)

src/coreclr/jit/fgopt.cpp:2713

Changing the response for non-RelOp conditions from returning false to returning true alters the behavior of IR modifications. Verify that this change is intentional and that downstream code correctly handles the modified IR state.

`if (!condTree->OperIsCompare())`

src/coreclr/jit/compiler.h:7123

The API change from `optInvertWhileLoop(BasicBlock* block)` to `optTryInvertWhileLoop(FlowGraphNaturalLoop* loop)` requires ensuring that all call sites and semantic expectations are updated accordingly.

`bool optTryInvertWhileLoop(FlowGraphNaturalLoop* loop);`
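The new signature takes a natural loop from the flow graph instead of a single block. The toy C++ sketch below illustrates roughly what a graph-based inversion does. It clones the header's test into a new block and retargets each back edge to that clone. After the rewrite, the original header runs only on entry and acts as the zero-trip test, while the clone becomes the bottom test.

The `Block`, `Loop`, `TryInvertLoop`, and `PredCount` names are hypothetical and far simpler than RyuJIT's `BasicBlock` and `FlowGraphNaturalLoop`. Profile weights, statement cloning, and size checks are all omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR (not RyuJIT): a block ends in a two-way conditional
// branch, an unconditional jump, or a return.
enum class Kind { Cond, Jump, Return };

struct Block
{
    Kind kind;
    int  trueTarget  = -1; // Cond: taken target; Jump: the only target
    int  falseTarget = -1; // Cond only
    int  condId      = -1; // stands in for the condition's IR
};

struct Loop
{
    int              header;  // block ending in the loop test
    std::vector<int> latches; // blocks that jump back to the header
};

// Toy inversion. Assumes the header's true edge stays in the loop and its
// false edge exits, and that every latch is an unconditional jump; the toy
// does not verify the first assumption.
bool TryInvertLoop(std::vector<Block>& blocks, const Loop& loop)
{
    // Copy the header: push_back below may reallocate and invalidate references.
    const Block header = blocks[loop.header];
    if (header.kind != Kind::Cond)
    {
        return false; // no two-way test to duplicate
    }

    // The clone carries the same condition and the same two successors.
    blocks.push_back(header);
    const int bottomTest = static_cast<int>(blocks.size()) - 1;

    // Retarget every back edge from the old header to the new bottom test.
    for (int latch : loop.latches)
    {
        Block& b = blocks[latch];
        assert(b.kind == Kind::Jump && b.trueTarget == loop.header);
        b.trueTarget = bottomTest;
    }
    return true;
}

// Count the edges into `target` in the toy flow graph.
int PredCount(const std::vector<Block>& blocks, int target)
{
    int count = 0;
    for (const Block& b : blocks)
    {
        if (b.kind == Kind::Return)
            continue;
        if (b.trueTarget == target)
            count++;
        if (b.kind == Kind::Cond && b.falseTarget == target)
            count++;
    }
    return count;
}
```

Consider a graph `entry -> header`, `header -> {body, exit}`, `body -> header`. After inversion, the header keeps only its entry predecessor. The body's back edge now goes to the cloned test instead.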


amanasifkhalid · 2025-05-27T17:08:31Z

In this iteration, I've kept the current phase ordering. Here are some metrics from win-x64 benchmarks.run_pgo (we don't have aspnet collected yet):

| Metric | Base | Diff |
|------------------------|------:|--------------:|
| Loops found            | 79281 | 74986 (-4835) |
| Loops inverted         | 33526 | 39581 (+6055) |
| Loops cloned           | 2925  | 3023 (+98)    |
| Loops unrolled         | 287   | 287 (+0)      |
| Loops IV widened       | 8556  | 8495 (-61)    |
| Widened IVs            | 8559  | 8499 (-60)    |
| Unused IVs removed     | 15687 | 15647 (-40)   |
| Downward-counted loops | 7484  | 7474 (-10)    |
| Loops strength-reduced | 6401  | 6389 (-12)    |
| RBO                    | 90909 | 90173 (-736)  |
| Jumps threaded         | 21734 | 21705 (-29)   |

Locally, the diffs showed large size increases, most of which come from the size cost of loop inversion itself.

amanasifkhalid · 2025-05-27T18:54:16Z

It's worth noting that switching over to the graph-based implementation isn't enough to close the gap for array vs list iteration de-abstraction:

| Method                                      | Mean     | Error   | StdDev  |
|-------------------------------------------- |---------:|--------:|--------:|
| foreach_member_array                        | 186.0 ns | 0.65 ns | 0.58 ns |
| foreach_member_array_via_local              | 185.6 ns | 0.54 ns | 0.50 ns |
| foreach_member_array_via_interface          | 184.0 ns | 0.20 ns | 0.18 ns |
| foreach_member_array_via_interface_property | 185.5 ns | 0.18 ns | 0.17 ns |
| foreach_member_list                         | 326.6 ns | 0.69 ns | 0.61 ns |
| foreach_member_list_via_interface           | 331.0 ns | 1.55 ns | 1.45 ns |

For this case, we need flowgraph simplification to run before loop inversion so that one of the loop condition's branches becomes an exit branch. So we're going to need #115850 at some point.
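The requirement above can be expressed as a simple predicate: the loop test is a candidate for inversion only if exactly one of its two successors leaves the loop. The following C++ helper over a set of block numbers is hypothetical and is not the JIT's actual check. It shows why the `List<T>` case misses out when both successors of the condition are still inside the loop.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper (not a RyuJIT API): returns true when exactly one
// successor of the loop test is outside the loop, i.e. the test has a real
// exit branch that inversion can move to the bottom of the loop.
bool HeaderTestExitsLoop(const std::set<int>& loopBlocks, int trueTarget, int falseTarget)
{
    const bool trueInLoop  = loopBlocks.count(trueTarget) != 0;
    const bool falseInLoop = loopBlocks.count(falseTarget) != 0;
    return trueInLoop != falseInLoop; // exactly one successor exits
}
```

If both successors are in the loop, the predicate fails. An earlier flow-graph simplification pass would then have to reshape the graph before inversion could apply.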

amanasifkhalid · 2025-05-27T18:59:22Z

cc @dotnet/jit-contrib, @AndyAyersMS @jakobbotsch PTAL. Diffs are still large, though it seems less to do with runaway cloning this time. Both tweaking and moving loop inversion incur large diffs, so I'm not sure how we want to proceed, if at all...

amanasifkhalid · 2025-05-28T17:31:07Z

> For this case, we need flowgraph simplification to run before loop inversion

Looking ahead, here are the diffs on win-x64 for moving inversion past flow opts, with this PR as the baseline:

Diffs are based on 2,721,962 contexts (1,064,836 MinOpts, 1,657,126 FullOpts).

MISSED contexts: base: 1,259 (0.05%), diff: 1,305 (0.05%)

Overall (+1,107,814 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,276,568 | +4,300 | +3.97% |
| benchmarks.run_pgo.windows.x64.checked.mch | 65,200,060 | +14,981 | -0.28% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,715,990 | +4,168 | +4.76% |
| coreclr_tests.run.windows.x64.checked.mch | 418,485,548 | +663,013 | +65.98% |
| libraries.crossgen2.windows.x64.checked.mch | 38,576,226 | +128,569 | +21.34% |
| libraries.pmi.windows.x64.checked.mch | 58,333,870 | +17,067 | -0.75% |
| libraries_tests.run.windows.x64.Release.mch | 378,600,645 | +27,051 | +0.40% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 155,300,136 | +237,977 | +15.20% |
| realworld.run.windows.x64.checked.mch | 11,739,437 | +2,315 | -4.38% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,086,720 | +8,373 | +5.64% |

FullOpts (+1,107,814 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,275,866 | +4,300 | +3.97% |
| benchmarks.run_pgo.windows.x64.checked.mch | 45,557,709 | +14,981 | -0.28% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,715,312 | +4,168 | +4.76% |
| coreclr_tests.run.windows.x64.checked.mch | 129,252,446 | +663,013 | +65.98% |
| libraries.crossgen2.windows.x64.checked.mch | 38,574,575 | +128,569 | +21.34% |
| libraries.pmi.windows.x64.checked.mch | 58,221,085 | +17,067 | -0.75% |
| libraries_tests.run.windows.x64.Release.mch | 174,050,484 | +27,051 | +0.40% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 144,253,898 | +237,977 | +15.20% |
| realworld.run.windows.x64.checked.mch | 11,514,558 | +2,315 | -4.38% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,085,577 | +8,373 | +5.64% |

Much of this size increase is driven by loop cloning (as in #115850). With cloning and unrolling disabled:

Diffs are based on 2,720,931 contexts (1,064,836 MinOpts, 1,656,095 FullOpts).

MISSED contexts: base: 2,297 (0.08%), diff: 2,336 (0.09%)

Base JIT options: JitCloneLoops=0;JitNoUnroll=1

Diff JIT options: JitCloneLoops=0;JitNoUnroll=1

Overall (+195,450 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,174,624 | +3,509 | +4.30% |
| benchmarks.run_pgo.windows.x64.checked.mch | 64,589,923 | +13,983 | -0.26% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,615,950 | +3,316 | +5.21% |
| coreclr_tests.run.windows.x64.checked.mch | 417,753,606 | +59,262 | +60.78% |
| libraries.crossgen2.windows.x64.checked.mch | 38,491,688 | +1,743 | +10.48% |
| libraries.pmi.windows.x64.checked.mch | 58,019,372 | +6,301 | -2.29% |
| libraries_tests.run.windows.x64.Release.mch | 375,006,014 | +22,383 | +0.26% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 154,938,477 | +78,238 | +12.88% |
| realworld.run.windows.x64.checked.mch | 11,662,301 | +2,535 | -5.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,049,163 | +4,180 | +0.32% |

FullOpts (+195,450 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,173,922 | +3,509 | +4.30% |
| benchmarks.run_pgo.windows.x64.checked.mch | 44,947,572 | +13,983 | -0.26% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,615,272 | +3,316 | +5.21% |
| coreclr_tests.run.windows.x64.checked.mch | 128,520,504 | +59,262 | +60.78% |
| libraries.crossgen2.windows.x64.checked.mch | 38,490,037 | +1,743 | +10.48% |
| libraries.pmi.windows.x64.checked.mch | 57,906,587 | +6,301 | -2.29% |
| libraries_tests.run.windows.x64.Release.mch | 170,455,853 | +22,383 | +0.26% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 143,892,239 | +78,238 | +12.88% |
| realworld.run.windows.x64.checked.mch | 11,437,422 | +2,535 | -5.00% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,048,020 | +4,180 | +0.32% |

amanasifkhalid · 2025-05-28T18:34:14Z

As for this PR, diffs without cloning/unrolling are mostly unchanged:

Diffs are based on 2,720,993 contexts (1,064,836 MinOpts, 1,656,157 FullOpts).

MISSED contexts: base: 1,341 (0.05%), diff: 2,297 (0.08%)

Base JIT options: JitCloneLoops=0;JitNoUnroll=1

Diff JIT options: JitCloneLoops=0;JitNoUnroll=1

Overall (+1,214,442 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,155,711 | +19,901 | +13.17% |
| benchmarks.run_pgo.windows.x64.checked.mch | 64,284,082 | +309,378 | +0.41% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,597,856 | +18,700 | +12.80% |
| coreclr_tests.run.windows.x64.checked.mch | 417,633,534 | +120,314 | +2.42% |
| libraries.crossgen2.windows.x64.checked.mch | 38,466,716 | +25,360 | +11.35% |
| libraries.pmi.windows.x64.checked.mch | 57,978,442 | +45,539 | +11.43% |
| libraries_tests.run.windows.x64.Release.mch | 374,462,113 | +612,497 | -1.68% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 154,907,244 | +37,758 | +12.10% |
| realworld.run.windows.x64.checked.mch | 11,653,598 | +11,805 | +14.89% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,035,973 | +13,190 | +9.75% |

FullOpts (+1,214,442 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,155,009 | +19,901 | +13.17% |
| benchmarks.run_pgo.windows.x64.checked.mch | 44,641,731 | +309,378 | +0.41% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,597,178 | +18,700 | +12.80% |
| coreclr_tests.run.windows.x64.checked.mch | 128,400,432 | +120,314 | +2.42% |
| libraries.crossgen2.windows.x64.checked.mch | 38,465,065 | +25,360 | +11.35% |
| libraries.pmi.windows.x64.checked.mch | 57,865,657 | +45,539 | +11.43% |
| libraries_tests.run.windows.x64.Release.mch | 169,911,952 | +612,497 | -1.68% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 143,861,006 | +37,758 | +12.10% |
| realworld.run.windows.x64.checked.mch | 11,428,719 | +11,805 | +14.89% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,034,830 | +13,190 | +9.75% |

jakobbotsch · 2025-05-28T19:40:46Z

> cc @dotnet/jit-contrib, @AndyAyersMS @jakobbotsch PTAL. Diffs are still large, though it seems less to do with runaway cloning this time. Both tweaking and moving loop inversion incur large diffs, so I'm not sure how we want to proceed, if at all...

What I am unsure about is whether all these inversions are doing good things. One thing you can do is quirk the new inversions away, then remove the quirks one by one. That gives you a better chance of evaluating whether things are looking good.

amanasifkhalid · 2025-05-29T23:16:03Z

@AndyAyersMS here are the diffs with vs. without size restrictions. The restriction doesn't bring this PR's size increases down by all that much, but the PerfScore improvements look promising:

Diffs are based on 2,721,532 contexts (1,064,836 MinOpts, 1,656,696 FullOpts).

MISSED contexts: base: 1,259 (0.05%), diff: 1,748 (0.06%)

Overall (-7,948 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,261,732 | -635 | -2.76% |
| benchmarks.run_pgo.windows.x64.checked.mch | 64,935,836 | -19,213 | -0.92% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,700,772 | -207 | -1.47% |
| coreclr_tests.run.windows.x64.checked.mch | 418,314,137 | +48,834 | -3.52% |
| libraries.crossgen2.windows.x64.checked.mch | 38,576,614 | -5,031 | -7.25% |
| libraries.pmi.windows.x64.checked.mch | 58,182,489 | -2,011 | -8.17% |
| libraries_tests.run.windows.x64.Release.mch | 377,926,977 | -33,954 | -0.68% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 155,122,002 | -245 | +1.40% |
| realworld.run.windows.x64.checked.mch | 11,672,433 | +1,776 | -0.61% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,086,720 | +2,738 | -10.45% |

FullOpts (-7,948 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---:|---:|---:|
| benchmarks.run.windows.x64.checked.mch | 12,261,030 | -635 | -2.76% |
| benchmarks.run_pgo.windows.x64.checked.mch | 45,293,485 | -19,213 | -0.92% |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 11,700,094 | -207 | -1.47% |
| coreclr_tests.run.windows.x64.checked.mch | 129,081,035 | +48,834 | -3.52% |
| libraries.crossgen2.windows.x64.checked.mch | 38,574,963 | -5,031 | -7.25% |
| libraries.pmi.windows.x64.checked.mch | 58,069,704 | -2,011 | -8.17% |
| libraries_tests.run.windows.x64.Release.mch | 173,376,816 | -33,954 | -0.68% |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 144,075,764 | -245 | +1.40% |
| realworld.run.windows.x64.checked.mch | 11,447,554 | +1,776 | -0.61% |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,085,577 | +2,738 | -10.45% |

AndyAyersMS · 2025-05-30T01:30:39Z

> @AndyAyersMS here are the diffs with vs. without size restrictions. The restriction doesn't bring this PR's size increases down by all that much, but the PerfScore improvements look promising:

These numbers look much smaller than the numbers above. Is this just the delta vs no size restrictions?

AndyAyersMS · 2025-05-30T01:32:35Z

Can you try a few different values for the size restriction? Perhaps most of the loops we're currently missing tend to be smaller (e.g. foreach)?

I'm curious what we see at, say, 100, 200, or 300. Or could you plot the distribution of sizes?

amanasifkhalid · 2025-05-30T03:26:59Z

> Is this just the delta vs no size restrictions?

Yes, those diffs are against graph-based loop inversion without size restrictions.

> I'm curious what we see at, say, 100, 200, or 300. Or could you plot the distribution of sizes?

Sure, let me get something together.

amanasifkhalid · 2025-05-30T19:38:01Z

@AndyAyersMS I tried the sizes you suggested, and there seems to be a noticeable drop-off between 200 and 100 nodes (about 100KB shaved off), at least in benchmarks.run_pgo:

I think we now have an aspnet collection; if you'd like, I can collect the same metrics for it.

AndyAyersMS · 2025-05-30T20:27:55Z

> I think we now have an aspnet collection; if you'd like, I can collect the same metrics for it.

Sure, if it's not too much trouble.

Also, for the above, maybe look at even smaller sizes (say, 80 or 60)?
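One way to run this sweep is to record a size for each candidate loop and count how many loops stay within each limit. The C++ sketch below is hypothetical. It assumes the size restriction compares a per-loop IR node count against a fixed limit, and `InvertedCountByLimit` is an illustrative name, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical experiment helper: for each candidate limit, count how many
// loops (given their measured node counts) would still pass a "size <= limit"
// check and therefore remain eligible for inversion.
std::map<int, std::size_t> InvertedCountByLimit(const std::vector<int>& loopSizes,
                                                const std::vector<int>& limits)
{
    std::map<int, std::size_t> result;
    for (int limit : limits)
    {
        std::size_t count = 0;
        for (int size : loopSizes)
        {
            if (size <= limit) // loop stays within the size budget
            {
                count++;
            }
        }
        result[limit] = count;
    }
    return result;
}
```

The per-limit counts form a cumulative distribution of loop sizes. A sharp jump between two adjacent limits shows where many candidate loops cluster.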

jakobbotsch and others added 23 commits October 29, 2024 17:09

- JIT: Make loop inversion graph based (a0e3e4d)
  Rewrite loop inversion to be graph based and to use the new loop representation.
- Remove debug code (5df0255)
- Fix release build (b75aea7)
- Avoid inverting already-inverted loops, duplicate weight manipulation from old inversion (d75f60e)
- Add a couple of quirks (ddc0ea2)
- Run jit-format (99a4321)
- Add a metric for loops inverted (757d14c)
- Merge branch 'main' of github.com:dotnet/runtime into port-loop-inversion (2c82f09)
- Reuse preheader for zero-trip test (e0074ff)
- Compact latch block if possible (1931c5a)
- Run jit-format (06854ab)
- Remove quirks (6c24c53)
- Merge branch 'main' of github.com:dotnet/runtime into port-loop-inversion (f59243f)
- Merge from main; fix profile maintenance (1437856)
- Leave fgRenumberBlocks in for now (b79fc7a)
- Simplify profile maintenance (8a9a547)
- Remove fgRenumberBlocks call (fe411d2)
- Merge branch 'main' into loop-inversion-graph-based (41a6246)
- Fix fgOptimizeBranch (7d1e7c8)
- Fix another spot in fgOptimizeBranch (0f12f5d)
- Merge branch 'main' into loop-inversion-graph-based (effd81e)
- Merge from main (af6f31c)
- Move phase back (d7ec36d)

Copilot AI review requested due to automatic review settings May 27, 2025 15:51

dotnet-policy-service bot assigned amanasifkhalid May 27, 2025

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 27, 2025

Copilot AI reviewed May 27, 2025


Merge branch 'main' into loop-inversion-graph-based (5a4cec5)

amanasifkhalid requested a review from Copilot May 27, 2025 15:54

Copilot AI reviewed May 27, 2025


amanasifkhalid mentioned this pull request May 29, 2025 in JIT: Always compute loop iteration estimate in loop inversion if we have PGO data #116104 (open)

Add size restriction (7becde0)

build-analysis bot mentioned this pull request May 30, 2025 in dotnet/dnceng#3008: The Operation will be canceled. The next steps may not contain expected logs. (open)
