Description
CPU2017/548 slowed down after a0d699a, so that change was partially reverted by d16ecad.
I could reproduce the slowdown on neoverse-v1 with -Ofast -flto -mcpu=neoverse-v1 -fuse-ld=lld -mmlir -force-no-alias=false/true.
It looks like the digits_2 function does not pass the function specialization threshold on the Latency bonus.
Without noalias:
FnSpecialization: Specialization bonus {Latency = 2424 (71%)}
With noalias:
FnSpecialization: Specialization bonus {Latency = 1098 (34%)}
FnSpecialization: No possible specializations found for _QMbrute_forcePdigits_2
The current threshold is 40%.
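To make the comparison concrete, here is a minimal sketch of the threshold check applied to the two reported percentages. It is a simplified model, not the pass's actual code: the helper name `passes_latency_threshold`, the constant name, and the use of `>=` for the comparison are assumptions made for illustration.

```python
# Threshold taken from the 40% figure quoted in this report.
# The constant name is hypothetical.
MIN_LATENCY_SAVINGS_PERCENT = 40


def passes_latency_threshold(bonus_percent, threshold_percent=MIN_LATENCY_SAVINGS_PERCENT):
    """Return True if the latency bonus percentage meets the threshold.

    Assumption: a bonus exactly equal to the threshold is accepted.
    """
    return bonus_percent >= threshold_percent


# Percentages copied from the FnSpecialization debug output in this report.
reported = {"without noalias": 71, "with noalias": 34}

for label, percent in reported.items():
    verdict = "specialize" if passes_latency_threshold(percent) else "reject"
    print(f"{label}: {percent}% -> {verdict}")
```

Under this model the 71% bonus clears the threshold while the 34% bonus does not, which matches the "No possible specializations found" message in the noalias case.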
It looks like a large portion of the latency bonus comes from the multiple mod(row, 3) expressions, which are not CSEd without noalias, so function specialization computes a large bonus for them when row is a constant. With noalias, there is a single srem instruction that computes mod(row, 3) at the beginning of the function.
I will investigate further to see whether function specialization fails to account for some bonuses.