-
Notifications
You must be signed in to change notification settings - Fork 5.1k
JIT: Always compute loop iteration estimate in loop inversion if we have PGO data #116104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
JIT: Always compute loop iteration estimate in loop inversion if we have PGO data #116104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the loop inversion logic to skip inverting loops that are expected to iterate only a few times, based on profile weight data.
- Simplifies the iteration count estimation by using the likely weight of the test block and the called count.
- Removes the previous, more complex handling of profile weights and loop entry estimation.
src/coreclr/jit/optimizer.cpp
Outdated
const bool haveProfileWeights = fgIsUsingProfileWeights(); | ||
if (haveProfileWeights) | ||
{ | ||
loopIterations = bTest->GetTrueEdge()->getLikelyWeight() / BasicBlock::getCalledCount(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a check to ensure that BasicBlock::getCalledCount(this) does not return zero before performing the division to avoid a potential division by zero error.
loopIterations = bTest->GetTrueEdge()->getLikelyWeight() / BasicBlock::getCalledCount(this); | |
const auto calledCount = BasicBlock::getCalledCount(this); | |
if (calledCount == 0) | |
{ | |
JITDUMP("BasicBlock::getCalledCount returned zero, skipping division.\n"); | |
return false; | |
} | |
loopIterations = bTest->GetTrueEdge()->getLikelyWeight() / calledCount; |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BasicBlock::getCalledCount
is guaranteed to not return zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old computation was loop-entry relative, and I think that is the right think of the iteration count.
Eg for (int i = 0; i < 11; i++)
's iteration count should always be 11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted this to the previous computation (should be a no-diff change now), but out of curiosity, why shouldn't we give nested loops additional weight?
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show large size decreases (with It's worth noting that I'm not cutting off any loops when we don't have PGO data. Since inversion currently runs before |
I think this is a tricky one to get right.
|
In this case, wouldn't we compute a high iteration count for the nested loop too (assuming the parent loop doesn't conditionally execute the child loop)? I agree that this approach isn't sensitive to the other cases you mentioned. The loop inversion diffs that inspired this change didn't necessarily involve loops with low iteration counts; rather, they were loops that are more likely to fall through than loop, or otherwise weren't likely to run more than once per method call. It feels a bit crude, but I could take a safer approach and skip loops that don't iterate at least twice on average -- in other words, it has to behave like a loop on average to be inverted. |
Ah, I should have looked more closely. You are computing a method-entry relative count, not a loop-entry relative count... I just assumed "iteration count" meant the latter. So yes what you are doing would handle the nested case ok. I'd like to see what a size-based heuristic looks like. I think that is perhaps less prone to mis-estimating importance or potential benefit from inversion (?). |
I was thinking of reusing the size heuristic you added for loop cloning: If a loop is too big to likely benefit from cloning, then it's probably not tight enough to benefit from inversion. Does that seem like a reasonable starting point? I don't think we can easily separate out the size heuristic change from #116017, since we need the loop data structures computed to easily compute the loop size. I can push a change to that PR with the size restriction and see how the diffs change. |
Sure, using the same size threshold seems reasonable. |
Based on my trial and error with different size limits for loop inversion (comment), I think we're unlikely to pursue a loop iteration heuristic for now. I'm going to remove the heuristic portion and just make this into a refactor of the loop iteration computation, so that we're at least always doing it. |
Ensure loop inversion always comes up with a loop iteration estimate better than
BB_LOOP_WEIGHT_SCALE
if we have PGO data.