[CodeGen] Add 2 subtarget hooks canLowerToZeroCycleReg[Move|Zeroing] #148428

Open
wants to merge 4 commits into
base: users/tomershafir/spr/main.codegen-add-2-subtarget-hooks-canlowertozerocycleregmovezeroing
42 changes: 42 additions & 0 deletions llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
@@ -185,6 +185,48 @@ class LLVM_ABI TargetSubtargetInfo : public MCSubtargetInfo {
    return false;
  }

  /// Returns true if CopyMI can be lowered to a zero cycle register move.
Contributor

Can you avoid adding new hooks for this? Isn't this inferable from the sched model? Plus, plenty of places already treat copies as essentially free anyway (e.g. isTransient).

Contributor Author

The patch series makes the register coalescer cooperate with the post-RA AArch64InstrInfo::copyPhysReg. AArch64InstrInfo::copyPhysReg contains the logic that lowers copies to zero cycle instructions depending on subtarget features. Here, we replicate almost all of that logic for use at the higher level of the register coalescer, so that it can check carefully when to prevent remat. The sched model would similarly have to depend on subtarget features (unless we try to generalize each part of the logic and combine it in the sched model itself, which could be an unnecessary complication). Currently the sched model is old and doesn't have the needed logic.

Also, this patch series specifically targets the register coalescer, and not other places where copies are considered free.
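
To make the cooperation described in this reply concrete, here is a minimal, self-contained sketch of how a coalescer-like client might consult the two hooks before deciding to rematerialize a copy. Everything in it is a hypothetical stand-in: `Copy`, `SubtargetHooks`, `ZeroCycleSubtarget`, `ZeroReg`, and `shouldRematerialize` are illustrative names rather than LLVM types, and the policy shown is an assumption, since the coalescer side of the series is not part of this diff.

```cpp
#include <cassert>

// Hypothetical register number standing in for a zero register (WZR/XZR).
const unsigned ZeroReg = 0;

// Hypothetical, heavily simplified model of a COPY instruction.
struct Copy {
  unsigned Dest;
  unsigned Src;
};

// Models the two new hooks; as in the diff, the base class answers false.
struct SubtargetHooks {
  virtual ~SubtargetHooks() = default;
  virtual bool canLowerToZeroCycleRegMove(const Copy &) const { return false; }
  virtual bool canLowerToZeroCycleRegZeroing(const Copy &) const {
    return false;
  }
};

// A made-up subtarget on which every copy is free: copies from the zero
// register count as zeroing, all other copies count as register moves.
struct ZeroCycleSubtarget : SubtargetHooks {
  bool canLowerToZeroCycleRegMove(const Copy &C) const override {
    return C.Src != ZeroReg;
  }
  bool canLowerToZeroCycleRegZeroing(const Copy &C) const override {
    return C.Src == ZeroReg;
  }
};

// Assumed client policy: keep the copy (skip rematerialization) whenever the
// subtarget reports that it will later be lowered to a zero cycle instruction.
inline bool shouldRematerialize(const SubtargetHooks &ST, const Copy &C) {
  return !ST.canLowerToZeroCycleRegMove(C) &&
         !ST.canLowerToZeroCycleRegZeroing(C);
}
```

With the default hooks nothing changes for existing targets, because both queries return false and the client keeps its previous behavior. Only a subtarget that overrides them opts in.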

  /// Otherwise, returns false.
  ///
  /// Lowering to zero cycle register moves depends on the microarchitecture
  /// for the specific architectural registers and instructions supported.
  /// Thus, currently it is applied after register allocation,
  /// when `ExpandPostRAPseudos` pass calls `TargetInstrInfo::lowerCopy`
  /// which in turn calls `TargetInstrInfo::copyPhysReg`.
  ///
  /// Subtargets can override this method to classify lowering candidates.
  /// Note that this cannot be defined in tablegen because it operates at
  /// a higher level.
  ///
  /// NOTE: Subtargets must maintain consistency between the logic here and
  /// on lowering.
  virtual bool canLowerToZeroCycleRegMove(const MachineInstr &CopyMI,
                                          const Register &DestReg,
                                          const Register &SrcReg) const {
    return false;
  }

  /// Returns true if CopyMI can be lowered to a zero cycle register zeroing.
  /// Otherwise, returns false.
  ///
  /// Lowering to zero cycle register zeroing depends on the microarchitecture
  /// for the specific architectural registers and instructions supported.
  /// Thus, currently it takes place after register allocation,
  /// when `ExpandPostRAPseudos` pass calls `TargetInstrInfo::lowerCopy`
  /// which in turn calls `TargetInstrInfo::copyPhysReg`.
  ///
  /// Subtargets can override this method to classify lowering candidates.
  /// Note that this cannot be defined in tablegen because it operates at
  /// a higher level.
  ///
  /// NOTE: Subtargets must maintain consistency between the logic here and
  /// on lowering.
  virtual bool canLowerToZeroCycleRegZeroing(const MachineInstr &CopyMI,
                                             const Register &DestReg,
                                             const Register &SrcReg) const {
    return false;
  }

  /// True if the subtarget should run MachineScheduler after aggressive
  /// coalescing.
  ///
80 changes: 80 additions & 0 deletions llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -673,3 +673,83 @@ bool AArch64Subtarget::isX16X17Safer() const {
bool AArch64Subtarget::enableMachinePipeliner() const {
  return getSchedModel().hasInstrSchedModel();
}

bool AArch64Subtarget::isRegInClass(const MachineInstr &MI, const Register &Reg,
                                    const TargetRegisterClass *TRC) const {
  if (Reg.isPhysical()) {
    return TRC->contains(Reg);
  }
  const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
  return TRC->hasSubClassEq(MRI.getRegClass(Reg));
}

/// NOTE: must maintain consistency with `AArch64InstrInfo::copyPhysReg`.
bool AArch64Subtarget::canLowerToZeroCycleRegMove(
    const MachineInstr &CopyMI, const Register &DestReg,
    const Register &SrcReg) const {
  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR32allRegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::GPR32allRegClass) &&
      DestReg != AArch64::WZR) {
    if (DestReg == AArch64::WSP || SrcReg == AArch64::WSP ||
        SrcReg != AArch64::WZR || !hasZeroCycleZeroingGP()) {
      return hasZeroCycleRegMoveGPR64() || hasZeroCycleRegMoveGPR32();
    }
    return false;
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR64allRegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::GPR64allRegClass) &&
      DestReg != AArch64::XZR) {
    if (DestReg == AArch64::SP || SrcReg == AArch64::SP ||
        SrcReg != AArch64::XZR || !hasZeroCycleZeroingGP()) {
      return hasZeroCycleRegMoveGPR64();
    }
    return false;
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR128RegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::FPR128RegClass)) {
    return isNeonAvailable() && hasZeroCycleRegMoveFPR128();
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR64RegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::FPR64RegClass)) {
    return hasZeroCycleRegMoveFPR64();
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR32RegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::FPR32RegClass)) {
    return hasZeroCycleRegMoveFPR32() || hasZeroCycleRegMoveFPR64();
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR16RegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::FPR16RegClass)) {
    return hasZeroCycleRegMoveFPR32() || hasZeroCycleRegMoveFPR64();
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR8RegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::FPR8RegClass)) {
    return hasZeroCycleRegMoveFPR32() || hasZeroCycleRegMoveFPR64();
  }

  return false;
}

/// NOTE: must maintain consistency with `AArch64InstrInfo::copyPhysReg`.
bool AArch64Subtarget::canLowerToZeroCycleRegZeroing(
    const MachineInstr &CopyMI, const Register &DestReg,
    const Register &SrcReg) const {
  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR32allRegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::GPR32allRegClass) &&
      DestReg != AArch64::WZR) {
    return AArch64::WZR == SrcReg && hasZeroCycleZeroingGP();
  }

  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR64allRegClass) &&
      isRegInClass(CopyMI, SrcReg, &AArch64::GPR64allRegClass) &&
      DestReg != AArch64::XZR) {
    return AArch64::XZR == SrcReg && hasZeroCycleZeroingGP();
  }

  return false;
}
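
The 32-bit GPR branches of the two functions above can be easier to review as a standalone truth-table model. The sketch below transcribes only those branches, with the register-class checks assumed to have already passed. `Features`, `Reg`, `canMoveGPR32`, and `canZeroGPR32` are hypothetical names introduced for illustration; the three booleans stand in for `hasZeroCycleRegMoveGPR32()`, `hasZeroCycleRegMoveGPR64()`, and `hasZeroCycleZeroingGP()`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the subtarget feature queries used in the diff.
struct Features {
  bool ZCMGPR32; // models hasZeroCycleRegMoveGPR32()
  bool ZCMGPR64; // models hasZeroCycleRegMoveGPR64()
  bool ZCZGP;    // models hasZeroCycleZeroingGP()
};

// Simplified 32-bit operand kinds: two ordinary registers, WSP, and WZR.
enum class Reg { W0, W1, WSP, WZR };

// Mirrors the GPR32 branch of canLowerToZeroCycleRegMove, assuming both
// operands already passed the GPR32all register-class checks.
inline bool canMoveGPR32(const Features &F, Reg Dest, Reg Src) {
  if (Dest == Reg::WZR)
    return false;
  if (Dest == Reg::WSP || Src == Reg::WSP || Src != Reg::WZR || !F.ZCZGP)
    return F.ZCMGPR64 || F.ZCMGPR32;
  return false;
}

// Mirrors the GPR32 branch of canLowerToZeroCycleRegZeroing under the same
// assumption.
inline bool canZeroGPR32(const Features &F, Reg Dest, Reg Src) {
  if (Dest == Reg::WZR)
    return false;
  return Src == Reg::WZR && F.ZCZGP;
}
```

In this model, a copy from WZR into an ordinary register is classified as zeroing when zero cycle zeroing is available and as a move otherwise, so with all three features enabled the two predicates never both hold for such a copy. The model also shows that the 64-bit move feature alone is enough to classify a 32-bit copy as a zero cycle move.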
13 changes: 13 additions & 0 deletions llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -120,6 +120,12 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
  /// Initialize properties based on the selected processor family.
  void initializeProperties(bool HasMinSize);

  /// Returns true if Reg is virtual and its register class is TRC or a
  /// subclass of TRC, or if Reg is physical and is a member of TRC.
  /// Otherwise, returns false.
  bool isRegInClass(const MachineInstr &MI, const Register &Reg,
                    const TargetRegisterClass *TRC) const;

public:
  /// This constructor initializes the data members to match that
  /// of the specified triple.
@@ -163,6 +169,13 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
  bool enableMachinePipeliner() const override;
  bool useDFAforSMS() const override { return false; }

  bool canLowerToZeroCycleRegMove(const MachineInstr &CopyMI,
                                  const Register &DestReg,
                                  const Register &SrcReg) const override;
  bool canLowerToZeroCycleRegZeroing(const MachineInstr &CopyMI,
                                     const Register &DestReg,
                                     const Register &SrcReg) const override;

  /// Returns ARM processor family.
  /// Avoid this function! CPU specifics should be kept local to this class
  /// and preferably modeled with SubtargetFeatures or properties in