Skip to content

JIT: Always compute loop iteration estimate in loop inversion if we have PGO data #116104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented May 29, 2025

Ensure loop inversion always comes up with a loop iteration estimate better than BB_LOOP_WEIGHT_SCALE if we have PGO data.

@Copilot Copilot AI review requested due to automatic review settings May 29, 2025 17:37
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 29, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the loop inversion logic to skip inverting loops that are expected to iterate only a few times, based on profile weight data.

  • Simplifies the iteration count estimation by using the likely weight of the test block and the called count.
  • Removes the previous, more complex handling of profile weights and loop entry estimation.

const bool haveProfileWeights = fgIsUsingProfileWeights();
if (haveProfileWeights)
{
loopIterations = bTest->GetTrueEdge()->getLikelyWeight() / BasicBlock::getCalledCount(this);
Copy link
Preview

Copilot AI May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a check to ensure that BasicBlock::getCalledCount(this) does not return zero before performing the division to avoid a potential division by zero error.

Suggested change
loopIterations = bTest->GetTrueEdge()->getLikelyWeight() / BasicBlock::getCalledCount(this);
const auto calledCount = BasicBlock::getCalledCount(this);
if (calledCount == 0)
{
JITDUMP("BasicBlock::getCalledCount returned zero, skipping division.\n");
return false;
}
loopIterations = bTest->GetTrueEdge()->getLikelyWeight() / calledCount;

Copilot uses AI. Check for mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BasicBlock::getCalledCount is guaranteed to not return zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old computation was loop-entry relative, and I think that is the right think of the iteration count.

Eg for (int i = 0; i < 11; i++)'s iteration count should always be 11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted this to the previous computation (should be a no-diff change now), but out of curiosity, why shouldn't we give nested loops additional weight?

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented May 29, 2025

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show large size decreases (with libraries_tests being the outlier), as well as some size increases from RBO being pessimized by less branch duplication. I'm not sure what the cutoff for inversion should be, so if these diffs seem too big, I can reduce it a bit.

It's worth noting that I'm not cutting off any loops when we don't have PGO data. Since inversion currently runs before optSetBlockWeights, I don't think I can make any assumptions about loop iteration counts here.

@AndyAyersMS
Copy link
Member

I think this is a tricky one to get right.

  • If a loop has low average iteration count it can still have instances with high iteration counts.
  • If a loop has low iteration counts the method with the loop may be called frequently, or the loop may be inside another loop with high iteration counts, etc.

@amanasifkhalid
Copy link
Member Author

or the loop may be inside another loop with high iteration counts

In this case, wouldn't we compute a high iteration count for the nested loop too (assuming the parent loop doesn't conditionally execute the child loop)?

I agree that this approach isn't sensitive to the other cases you mentioned. The loop inversion diffs that inspired this change didn't necessarily involve loops with low iteration counts; rather, they were loops that are more likely to fall through than loop, or otherwise weren't likely to run more than once per method call. It feels a bit crude, but I could take a safer approach and skip loops that don't iterate at least twice on average -- in other words, it has to behave like a loop on average to be inverted.

@AndyAyersMS
Copy link
Member

or the loop may be inside another loop with high iteration counts

In this case, wouldn't we compute a high iteration count for the nested loop too (assuming the parent loop doesn't conditionally execute the child loop)?

Ah, I should have looked more closely. You are computing a method-entry relative count, not a loop-entry relative count... I just assumed "iteration count" meant the latter.

So yes what you are doing would handle the nested case ok.

I'd like to see what a size-based heuristic looks like. I think that is perhaps less prone to mis-estimating importance or potential benefit from inversion (?).

@amanasifkhalid
Copy link
Member Author

I'd like to see what a size-based heuristic looks like.

I was thinking of reusing the size heuristic you added for loop cloning: If a loop is too big to likely benefit from cloning, then it's probably not tight enough to benefit from inversion. Does that seem like a reasonable starting point?

I don't think we can easily separate out the size heuristic change from #116017, since we need the loop data structures computed to easily compute the loop size. I can push a change to that PR with the size restriction and see how the diffs change.

@AndyAyersMS
Copy link
Member

I'd like to see what a size-based heuristic looks like.

I was thinking of reusing the size heuristic you added for loop cloning: If a loop is too big to likely benefit from cloning, then it's probably not tight enough to benefit from inversion. Does that seem like a reasonable starting point?

Sure, using the same size threshold seems reasonable.

@amanasifkhalid
Copy link
Member Author

Based on my trial and error with different size limits for loop inversion (comment), I think we're unlikely to pursue a loop iteration heuristic for now. I'm going to remove the heuristic portion and just make this into a refactor of the loop iteration computation, so that we're at least always doing it.

@amanasifkhalid amanasifkhalid changed the title JIT: Don't invert loops with low iterations counts JIT: Always compute loop iteration estimate in loop inversion if we have PGO data May 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants