Skip to content

[clang][CodeGen] Set dead_on_return on indirect pointer arguments #148159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

antoniofrighetto
Copy link
Contributor

Let Clang emit dead_on_return attribute on indirect pointer arguments, namely, large aggregates that the ABI mandates be passed by value, but lowered to an indirect argument. Writes to such arguments are not observable by the caller after the callee returns.

This should desirably enable further MemCpyOpt/DSE optimizations.

Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.

Let Clang emit `dead_on_return` attribute on indirect pointer
arguments, namely, large aggregates that the ABI mandates be
passed by value, but lowered to an indirect argument. Writes
to such arguments are not observable by the caller after the
callee returns.

This should desirably enable further MemCpyOpt/DSE optimizations.

Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:RISC-V backend:PowerPC backend:SystemZ backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. coroutines C++20 coroutines clang:openmp OpenMP related changes to Clang labels Jul 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-systemz

Author: Antonio Frighetto (antoniofrighetto)

Changes

Let Clang emit dead_on_return attribute on indirect pointer arguments, namely, large aggregates that the ABI mandates be passed by value, but lowered to an indirect argument. Writes to such arguments are not observable by the caller after the callee returns.

This should desirably enable further MemCpyOpt/DSE optimizations.

Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.


Patch is 721.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148159.diff

107 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+4)
  • (modified) clang/test/CodeGen/64bit-swiftcall.c (+1-1)
  • (modified) clang/test/CodeGen/AArch64/byval-temp.c (+8-8)
  • (modified) clang/test/CodeGen/AArch64/pure-scalable-args-empty-union.c (+1-1)
  • (modified) clang/test/CodeGen/AArch64/pure-scalable-args.c (+21-21)
  • (modified) clang/test/CodeGen/AArch64/struct-coerce-using-ptr.cpp (+2-2)
  • (modified) clang/test/CodeGen/AArch64/sve-acle-__ARM_FEATURE_SVE_VECTOR_OPERATORS.c (+3-3)
  • (modified) clang/test/CodeGen/AArch64/sve-acle-__ARM_FEATURE_SVE_VECTOR_OPERATORS.cpp (+1-1)
  • (modified) clang/test/CodeGen/LoongArch/bitint.c (+3-3)
  • (modified) clang/test/CodeGen/PowerPC/ppc64-vector.c (+1-1)
  • (modified) clang/test/CodeGen/RISCV/riscv-abi.cpp (+4-4)
  • (modified) clang/test/CodeGen/RISCV/riscv-vector-callingconv-llvm-ir.c (+5-5)
  • (modified) clang/test/CodeGen/RISCV/riscv-vector-callingconv-llvm-ir.cpp (+5-5)
  • (modified) clang/test/CodeGen/RISCV/riscv32-abi.c (+37-37)
  • (modified) clang/test/CodeGen/RISCV/riscv32-vararg.c (+1-1)
  • (modified) clang/test/CodeGen/RISCV/riscv64-abi.c (+9-9)
  • (modified) clang/test/CodeGen/RISCV/riscv64-vararg.c (+1-1)
  • (modified) clang/test/CodeGen/SystemZ/systemz-abi-vector.c (+26-26)
  • (modified) clang/test/CodeGen/SystemZ/systemz-abi.c (+19-19)
  • (modified) clang/test/CodeGen/SystemZ/systemz-inline-asm.c (+1-1)
  • (modified) clang/test/CodeGen/X86/cx-complex-range.c (+1-1)
  • (modified) clang/test/CodeGen/X86/x86_32-arguments-win32.c (+7-7)
  • (modified) clang/test/CodeGen/X86/x86_64-arguments-win32.c (+1-1)
  • (modified) clang/test/CodeGen/aapcs64-align.cpp (+2-2)
  • (modified) clang/test/CodeGen/arm-aapcs-vfp.c (+1-1)
  • (modified) clang/test/CodeGen/arm-abi-vector.c (+3-3)
  • (modified) clang/test/CodeGen/arm-swiftcall.c (+1-1)
  • (modified) clang/test/CodeGen/arm64-abi-vector.c (+7-7)
  • (modified) clang/test/CodeGen/arm64-arguments.c (+13-13)
  • (modified) clang/test/CodeGen/arm64-microsoft-arguments.cpp (+1-1)
  • (modified) clang/test/CodeGen/armv7k-abi.c (+1-1)
  • (modified) clang/test/CodeGen/atomic-arm64.c (+1-1)
  • (modified) clang/test/CodeGen/attr-noundef.cpp (+4-3)
  • (modified) clang/test/CodeGen/cx-complex-range.c (+18-18)
  • (modified) clang/test/CodeGen/ext-int-cc.c (+22-22)
  • (modified) clang/test/CodeGen/isfpclass.c (+1-1)
  • (modified) clang/test/CodeGen/math-libcalls-tbaa-indirect-args.c (+43-43)
  • (modified) clang/test/CodeGen/mingw-long-double.c (+3-3)
  • (modified) clang/test/CodeGen/ms_abi.c (+2-2)
  • (modified) clang/test/CodeGen/pass-by-value-noalias.c (+2-2)
  • (modified) clang/test/CodeGen/ptrauth-in-c-struct.c (+2-2)
  • (modified) clang/test/CodeGen/regcall.c (+5-5)
  • (modified) clang/test/CodeGen/regcall2.c (+1-1)
  • (modified) clang/test/CodeGen/regcall4.c (+5-5)
  • (modified) clang/test/CodeGen/sparcv9-abi.c (+2-2)
  • (modified) clang/test/CodeGen/vectorcall.c (+23-23)
  • (modified) clang/test/CodeGen/win-fp128.c (+1-1)
  • (modified) clang/test/CodeGen/win64-i128.c (+2-2)
  • (modified) clang/test/CodeGen/windows-swiftcall.c (+1-1)
  • (modified) clang/test/CodeGenCXX/aarch64-mangle-sve-vectors.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/aix-alignment.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/amdgcn-func-arg.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/arm-cc.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/arm-swiftcall.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/attr-target-mv-inalloca.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/blocks.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/copy-initialization.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/cxx1z-copy-omission.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/debug-info.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/derived-to-base-conv.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/empty-nontrivially-copyable.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/fastcall.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/homogeneous-aggregates.cpp (+7-7)
  • (modified) clang/test/CodeGenCXX/inalloca-lambda.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/inalloca-overaligned.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/inalloca-vector.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/inheriting-constructor.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/member-function-pointer-calls.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/microsoft-abi-arg-order.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/microsoft-abi-byval-thunks.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/microsoft-abi-member-pointers.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/microsoft-abi-sret-and-byval.cpp (+13-13)
  • (modified) clang/test/CodeGenCXX/microsoft-abi-unknown-arch.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/ms-property.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/nrvo.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/pass-by-value-noalias.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/powerpc-byval.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/ptrauth-qualifier-struct.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/regparm.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/temporaries.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/trivial_abi.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/uncopyable-args.cpp (+16-16)
  • (modified) clang/test/CodeGenCXX/wasm-args-returns.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/windows-x86-swiftcall.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/x86_32-arguments.cpp (+2-2)
  • (modified) clang/test/CodeGenCoroutines/coro-params.cpp (+2-2)
  • (modified) clang/test/CodeGenObjC/nontrivial-c-struct-exception.m (+1-1)
  • (modified) clang/test/CodeGenObjC/pass-by-value-noalias.m (+2-2)
  • (modified) clang/test/CodeGenObjC/weak-in-c-struct.m (+3-3)
  • (modified) clang/test/CodeGenObjCXX/objc-struct-cxx-abi.mm (+9-9)
  • (modified) clang/test/CodeGenObjCXX/property-dot-copy-elision.mm (+3-3)
  • (modified) clang/test/CodeGenObjCXX/property-objects.mm (+2-2)
  • (modified) clang/test/CodeGenObjCXX/ptrauth-struct-cxx-abi.mm (+2-2)
  • (modified) clang/test/Headers/stdarg.cpp (+2-2)
  • (modified) clang/test/OpenMP/for_firstprivate_codegen.cpp (+25-25)
  • (modified) clang/test/OpenMP/parallel_firstprivate_codegen.cpp (+128-128)
  • (modified) clang/test/OpenMP/sections_firstprivate_codegen.cpp (+17-17)
  • (modified) clang/test/OpenMP/single_firstprivate_codegen.cpp (+17-17)
  • (modified) clang/test/OpenMP/target_teams_distribute_firstprivate_codegen.cpp (+61-61)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_firstprivate_codegen.cpp (+168-168)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+56-56)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp (+20-20)
  • (modified) clang/test/OpenMP/teams_distribute_firstprivate_codegen.cpp (+65-65)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_firstprivate_codegen.cpp (+90-90)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+28-28)
  • (modified) clang/test/OpenMP/teams_distribute_simd_firstprivate_codegen.cpp (+20-20)
  • (modified) clang/test/OpenMP/teams_firstprivate_codegen.cpp (+60-60)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index c8c3d6b20c496..ac5909daf8b89 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2852,8 +2852,12 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
       if (AI.getInReg())
         Attrs.addAttribute(llvm::Attribute::InReg);
 
+      // Depending on the ABI, this is either a byval or a dead_on_return
+      // argument.
       if (AI.getIndirectByVal())
         Attrs.addByValAttr(getTypes().ConvertTypeForMem(ParamType));
+      else
+        Attrs.addAttribute(llvm::Attribute::DeadOnReturn);
 
       auto *Decl = ParamType->getAsRecordDecl();
       if (CodeGenOpts.PassByValueIsNoAlias && Decl &&
diff --git a/clang/test/CodeGen/64bit-swiftcall.c b/clang/test/CodeGen/64bit-swiftcall.c
index 7f8aa02d97ce1..448bca7acbca3 100644
--- a/clang/test/CodeGen/64bit-swiftcall.c
+++ b/clang/test/CodeGen/64bit-swiftcall.c
@@ -239,7 +239,7 @@ TEST(struct_big_1)
 // CHECK-LABEL: define {{.*}} void @return_struct_big_1(ptr dead_on_unwind noalias writable sret
 
 // Should not be byval.
-// CHECK-LABEL: define {{.*}} void @take_struct_big_1(ptr{{( %.*)?}})
+// CHECK-LABEL: define {{.*}} void @take_struct_big_1(ptr dead_on_return{{( %.*)?}})
 
 /*****************************************************************************/
 /********************************* TYPE MERGING ******************************/
diff --git a/clang/test/CodeGen/AArch64/byval-temp.c b/clang/test/CodeGen/AArch64/byval-temp.c
index 0ee0312b2362d..5033b6cf5ac03 100644
--- a/clang/test/CodeGen/AArch64/byval-temp.c
+++ b/clang/test/CodeGen/AArch64/byval-temp.c
@@ -30,10 +30,10 @@ void example(void) {
 // Then, memcpy `l` to the temporary stack space.
 // CHECK-O0-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %[[byvaltemp]], ptr align 8 %[[l]], i64 64, i1 false)
 // Finally, call using a pointer to the temporary stack space.
-// CHECK-O0-NEXT: call void @pass_large(ptr noundef %[[byvaltemp]])
+// CHECK-O0-NEXT: call void @pass_large(ptr dead_on_return noundef %[[byvaltemp]])
 // Now, do the same for the second call, using the second temporary alloca.
 // CHECK-O0-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %[[byvaltemp1]], ptr align 8 %[[l]], i64 64, i1 false)
-// CHECK-O0-NEXT: call void @pass_large(ptr noundef %[[byvaltemp1]])
+// CHECK-O0-NEXT: call void @pass_large(ptr dead_on_return noundef %[[byvaltemp1]])
 // CHECK-O0-NEXT: ret void
 //
 // At O3, we should have lifetime markers to help the optimizer re-use the temporary allocas.
@@ -58,7 +58,7 @@ void example(void) {
 // Then, memcpy `l` to the temporary stack space.
 // CHECK-O3-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %[[byvaltemp]], ptr align 8 %[[l]], i64 64, i1 false)
 // Finally, call using a pointer to the temporary stack space.
-// CHECK-O3-NEXT: call void @pass_large(ptr noundef %[[byvaltemp]])
+// CHECK-O3-NEXT: call void @pass_large(ptr dead_on_return noundef %[[byvaltemp]])
 //
 // The lifetime of the temporary used to pass a pointer to the struct ends here.
 // CHECK-O3-NEXT: call void @llvm.lifetime.end.p0(i64 64, ptr %[[byvaltemp]])
@@ -66,7 +66,7 @@ void example(void) {
 // Now, do the same for the second call, using the second temporary alloca.
 // CHECK-O3-NEXT: call void @llvm.lifetime.start.p0(i64 64, ptr %[[byvaltemp1]])
 // CHECK-O3-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %[[byvaltemp1]], ptr align 8 %[[l]], i64 64, i1 false)
-// CHECK-O3-NEXT: call void @pass_large(ptr noundef %[[byvaltemp1]])
+// CHECK-O3-NEXT: call void @pass_large(ptr dead_on_return noundef %[[byvaltemp1]])
 // CHECK-O3-NEXT: call void @llvm.lifetime.end.p0(i64 64, ptr %[[byvaltemp1]])
 //
 // Mark the end of the lifetime of `l`.
@@ -88,12 +88,12 @@ void example_BitInt(void) {
 // CHECK-O0-NEXT:    [[LOADEDV:%.*]] = trunc i256 [[TMP0]] to i129
 // CHECK-O0-NEXT:    [[STOREDV:%.*]] = sext i129 [[LOADEDV]] to i256
 // CHECK-O0-NEXT:    store i256 [[STOREDV]], ptr [[INDIRECT_ARG_TEMP]], align 16
-// CHECK-O0-NEXT:    call void @pass_large_BitInt(ptr noundef [[INDIRECT_ARG_TEMP]])
+// CHECK-O0-NEXT:    call void @pass_large_BitInt(ptr dead_on_return noundef [[INDIRECT_ARG_TEMP]])
 // CHECK-O0-NEXT:    [[TMP1:%.*]] = load i256, ptr [[L]], align 16
 // CHECK-O0-NEXT:    [[LOADEDV1:%.*]] = trunc i256 [[TMP1]] to i129
 // CHECK-O0-NEXT:    [[STOREDV1:%.*]] = sext i129 [[LOADEDV1]] to i256
 // CHECK-O0-NEXT:    store i256 [[STOREDV1]], ptr [[INDIRECT_ARG_TEMP1]], align 16
-// CHECK-O0-NEXT:    call void @pass_large_BitInt(ptr noundef [[INDIRECT_ARG_TEMP1]])
+// CHECK-O0-NEXT:    call void @pass_large_BitInt(ptr dead_on_return noundef [[INDIRECT_ARG_TEMP1]])
 // CHECK-O0-NEXT:    ret void
 //
 // CHECK-O3-LABEL: define dso_local void @example_BitInt(
@@ -108,13 +108,13 @@ void example_BitInt(void) {
 // CHECK-O3-NEXT:    call void @llvm.lifetime.start.p0(i64 32, ptr [[INDIRECT_ARG_TEMP]]) 
 // CHECK-O3-NEXT:    [[STOREDV:%.*]] = sext i129 [[LOADEDV]] to i256
 // CHECK-O3-NEXT:    store i256 [[STOREDV]], ptr [[INDIRECT_ARG_TEMP]], align 16, !tbaa [[TBAA6]]
-// CHECK-O3-NEXT:    call void @pass_large_BitInt(ptr noundef [[INDIRECT_ARG_TEMP]])
+// CHECK-O3-NEXT:    call void @pass_large_BitInt(ptr dead_on_return noundef [[INDIRECT_ARG_TEMP]])
 // CHECK-O3-NEXT:    call void @llvm.lifetime.end.p0(i64 32, ptr [[INDIRECT_ARG_TEMP]]) 
 // CHECK-O3-NEXT:    [[TMP1:%.*]] = load i256, ptr [[L]], align 16, !tbaa [[TBAA6]]
 // CHECK-O3-NEXT:    [[LOADEDV1:%.*]] = trunc i256 [[TMP1]] to i129
 // CHECK-O3-NEXT:    call void @llvm.lifetime.start.p0(i64 32, ptr [[INDIRECT_ARG_TEMP1]]) 
 // CHECK-O3-NEXT:    [[STOREDV1:%.*]] = sext i129 [[LOADEDV1]] to i256
 // CHECK-O3-NEXT:    store i256 [[STOREDV1]], ptr [[INDIRECT_ARG_TEMP1]], align 16, !tbaa [[TBAA6]]
-// CHECK-O3-NEXT:    call void @pass_large_BitInt(ptr noundef [[INDIRECT_ARG_TEMP1]])
+// CHECK-O3-NEXT:    call void @pass_large_BitInt(ptr dead_on_return noundef [[INDIRECT_ARG_TEMP1]])
 // CHECK-O3-NEXT:    call void @llvm.lifetime.end.p0(i64 32, ptr [[INDIRECT_ARG_TEMP1]]) 
 // CHECK-O3-NEXT:    call void @llvm.lifetime.end.p0(i64 32, ptr [[L]]) 
diff --git a/clang/test/CodeGen/AArch64/pure-scalable-args-empty-union.c b/clang/test/CodeGen/AArch64/pure-scalable-args-empty-union.c
index 546910068c78a..804e14a2ea34b 100644
--- a/clang/test/CodeGen/AArch64/pure-scalable-args-empty-union.c
+++ b/clang/test/CodeGen/AArch64/pure-scalable-args-empty-union.c
@@ -19,7 +19,7 @@ void f0(S0 *p) {
   use0(*p);
 }
 // CHECK-C:   declare void @use0(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)
-// CHECK-CXX: declare void @use0(ptr noundef)
+// CHECK-CXX: declare void @use0(ptr dead_on_return noundef)
 
 #ifdef __cplusplus
 
diff --git a/clang/test/CodeGen/AArch64/pure-scalable-args.c b/clang/test/CodeGen/AArch64/pure-scalable-args.c
index fecd370d09be3..48988f7a1722b 100644
--- a/clang/test/CodeGen/AArch64/pure-scalable-args.c
+++ b/clang/test/CodeGen/AArch64/pure-scalable-args.c
@@ -92,7 +92,7 @@ void test_argpass_simple(PST *p) {
 // CHECK-AAPCS-NEXT: ret void
 
 // CHECK-AAPCS:  declare void @argpass_simple_callee(<vscale x 16 x i1>, <vscale x 2 x double>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @argpass_simple_callee(ptr noundef)
+// CHECK-DARWIN: declare void @argpass_simple_callee(ptr dead_on_return noundef)
 
 // Boundary case of using the last available Z-reg, PST expanded.
 //   0.0  -> d0-d3
@@ -107,7 +107,7 @@ void test_argpass_last_z(PST *p) {
     argpass_last_z_callee(.0, .0, .0, .0, *p);
 }
 // CHECK-AAPCS:  declare void @argpass_last_z_callee(double noundef, double noundef, double noundef, double noundef, <vscale x 16 x i1>, <vscale x 2 x double>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @argpass_last_z_callee(double noundef, double noundef, double noundef, double noundef, ptr noundef)
+// CHECK-DARWIN: declare void @argpass_last_z_callee(double noundef, double noundef, double noundef, double noundef, ptr dead_on_return noundef)
 
 
 // Like the above, but using a tuple type to occupy some registers.
@@ -123,7 +123,7 @@ void test_argpass_last_z_tuple(PST *p, svfloat64x4_t x) {
   argpass_last_z_tuple_callee(x, *p);
 }
 // CHECK-AAPCS:  declare void @argpass_last_z_tuple_callee(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 16 x i1>, <vscale x 2 x double>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @argpass_last_z_tuple_callee(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, ptr noundef)
+// CHECK-DARWIN: declare void @argpass_last_z_tuple_callee(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, ptr dead_on_return noundef)
 
 
 // Boundary case of using the last available P-reg, PST expanded.
@@ -139,7 +139,7 @@ void test_argpass_last_p(PST *p) {
     argpass_last_p_callee(svpfalse(), svpfalse_c(), *p);
 }
 // CHECK-AAPCS:  declare void @argpass_last_p_callee(<vscale x 16 x i1>, target("aarch64.svcount"), <vscale x 16 x i1>, <vscale x 2 x double>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @argpass_last_p_callee(<vscale x 16 x i1>, target("aarch64.svcount"), ptr noundef)
+// CHECK-DARWIN: declare void @argpass_last_p_callee(<vscale x 16 x i1>, target("aarch64.svcount"), ptr dead_on_return noundef)
 
 
 // Not enough Z-regs, push PST to memory and pass a pointer, Z-regs and
@@ -157,7 +157,7 @@ void test_argpass_no_z(PST *p, double dummy, svmfloat8_t u, int8x16_t v, mfloat8
     void argpass_no_z_callee(svmfloat8_t, int8x16_t, mfloat8x16_t, double, double, int, PST, int, double, svbool_t);
     argpass_no_z_callee(u, v, w, .0, .0, 1, *p, 2, 3.0, svptrue_b64());
 }
-// CHECK: declare void @argpass_no_z_callee(<vscale x 16 x i8>, <16 x i8> noundef, <16 x i8>, double noundef, double noundef, i32 noundef, ptr noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
+// CHECK: declare void @argpass_no_z_callee(<vscale x 16 x i8>, <16 x i8> noundef, <16 x i8>, double noundef, double noundef, i32 noundef, ptr dead_on_return noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
 
 
 // Like the above, using a tuple to occupy some registers.
@@ -173,7 +173,7 @@ void test_argpass_no_z_tuple_f64(PST *p, float dummy, svfloat64x4_t x) {
                                      double, svbool_t);
   argpass_no_z_tuple_f64_callee(x, .0, 1, *p, 2, 3.0, svptrue_b64());
 }
-// CHECK: declare void @argpass_no_z_tuple_f64_callee(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, double noundef, i32 noundef, ptr noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
+// CHECK: declare void @argpass_no_z_tuple_f64_callee(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x double>, double noundef, i32 noundef, ptr dead_on_return noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
 
 
 // Likewise, using a different tuple.
@@ -189,7 +189,7 @@ void test_argpass_no_z_tuple_mfp8(PST *p, float dummy, svmfloat8x4_t x) {
                                       double, svbool_t);
   argpass_no_z_tuple_mfp8_callee(x, .0, 1, *p, 2, 3.0, svptrue_b64());
 }
-// CHECK: declare void @argpass_no_z_tuple_mfp8_callee(<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, double noundef, i32 noundef, ptr noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
+// CHECK: declare void @argpass_no_z_tuple_mfp8_callee(<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, double noundef, i32 noundef, ptr dead_on_return noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
 
 
 // Not enough Z-regs (consumed by a HFA), PST passed indirectly
@@ -204,8 +204,8 @@ void test_argpass_no_z_hfa(HFA *h, PST *p) {
     void argpass_no_z_hfa_callee(double, HFA, int, PST, int, svbool_t);
     argpass_no_z_hfa_callee(.0, *h, 1, *p, 2, svptrue_b64());
 }
-// CHECK-AAPCS:  declare void @argpass_no_z_hfa_callee(double noundef, [4 x float] alignstack(8), i32 noundef, ptr noundef, i32 noundef, <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @argpass_no_z_hfa_callee(double noundef, [4 x float], i32 noundef, ptr noundef, i32 noundef, <vscale x 16 x i1>)
+// CHECK-AAPCS:  declare void @argpass_no_z_hfa_callee(double noundef, [4 x float] alignstack(8), i32 noundef, ptr dead_on_return noundef, i32 noundef, <vscale x 16 x i1>)
+// CHECK-DARWIN: declare void @argpass_no_z_hfa_callee(double noundef, [4 x float], i32 noundef, ptr dead_on_return noundef, i32 noundef, <vscale x 16 x i1>)
 
 // Not enough Z-regs (consumed by a HVA), PST passed indirectly
 //   0.0  -> d0
@@ -219,8 +219,8 @@ void test_argpass_no_z_hva(HVA *h, PST *p) {
     void argpass_no_z_hva_callee(double, HVA, int, PST, int, svbool_t);
     argpass_no_z_hva_callee(.0, *h, 1, *p, 2, svptrue_b64());
 }
-// CHECK-AAPCS:  declare void @argpass_no_z_hva_callee(double noundef, [4 x <16 x i8>] alignstack(16), i32 noundef, ptr noundef, i32 noundef, <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @argpass_no_z_hva_callee(double noundef, [4 x <16 x i8>], i32 noundef, ptr noundef, i32 noundef, <vscale x 16 x i1>)
+// CHECK-AAPCS:  declare void @argpass_no_z_hva_callee(double noundef, [4 x <16 x i8>] alignstack(16), i32 noundef, ptr dead_on_return noundef, i32 noundef, <vscale x 16 x i1>)
+// CHECK-DARWIN: declare void @argpass_no_z_hva_callee(double noundef, [4 x <16 x i8>], i32 noundef, ptr dead_on_return noundef, i32 noundef, <vscale x 16 x i1>)
 
 // Not enough P-regs, PST passed indirectly, Z-regs and P-regs still available.
 //   true -> p0-p2
@@ -233,7 +233,7 @@ void test_argpass_no_p(PST *p) {
     void argpass_no_p_callee(svbool_t, svbool_t, svbool_t, int, PST, int, double, svbool_t);
     argpass_no_p_callee(svptrue_b8(), svptrue_b16(), svptrue_b32(), 1, *p, 2, 3.0, svptrue_b64());
 }
-// CHECK: declare void @argpass_no_p_callee(<vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16 x i1>, i32 noundef, ptr noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
+// CHECK: declare void @argpass_no_p_callee(<vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16 x i1>, i32 noundef, ptr dead_on_return noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
 
 
 // Like above, using a tuple to occupy some registers.
@@ -250,7 +250,7 @@ void test_argpass_no_p_tuple(PST *p, svbool_t u, svboolx2_t v) {
                                  svbool_t);
   argpass_no_p_tuple_callee(v, u, 1, *p, 2, 3.0, svptrue_b64());
 }
-// CHECK: declare void @argpass_no_p_tuple_callee(<vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16 x i1>, i32 noundef, ptr noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
+// CHECK: declare void @argpass_no_p_tuple_callee(<vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16 x i1>, i32 noundef, ptr dead_on_return noundef, i32 noundef, double noundef, <vscale x 16 x i1>)
 
 
 // HFAs go back-to-back to memory, afterwards Z-regs not available, PST passed indirectly.
@@ -263,8 +263,8 @@ void test_after_hfa(HFA *h, PST *p) {
     void after_hfa_callee(double, double, double, double, double, HFA, PST, HFA, svbool_t);
     after_hfa_callee(.0, .0, .0, .0, .0, *h, *p, *h, svpfalse());
 }
-// CHECK-AAPCS:  declare void @after_hfa_callee(double noundef, double noundef, double noundef, double noundef, double noundef, [4 x float] alignstack(8), ptr noundef, [4 x float] alignstack(8), <vscale x 16 x i1>)
-// CHECK-DARWIN: declare void @after_hfa_callee(double noundef, double noundef, double noundef, double noundef, double noundef, [4 x float], ptr noundef, [4 x float], <vscale x 16 x i1>)
+// CHECK-AAPCS:  declare void @after_hfa_callee(double noundef, double noundef, double noundef, double noundef, double noundef, [4 x float] alignstack(8), ptr dead_on_return noundef, [4 x float] alignstack(8), <vscale x 16 x i1>)
+// CHECK-DARWIN: declare void @after_hfa_callee(double noundef, double noundef, double noundef, double noundef, double noundef, [4 x float], ptr dead_on_return noundef, [4 x float], <vscale x 16 x i1>)
 
 // Small PST, not enough registers, passed indirectly, unlike other small
 // aggregates.
@@ -277,7 +277,7 @@ void test_small_pst(SmallPST *p, SmallAgg *s) {
     void small_pst_callee(SmallAgg, double, double, double, double, double, double, double, double, double, SmallPST, double);
     small_pst_callee(*s, .0, .0, .0, .0, .0, .0, .0, .0, 1.0, *p, 2.0);
 }
-// CHECK-AAPCS:  declare void @small_pst_callee([2 x i64], double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, ptr noundef, double noundef)
+// CHECK-AAPCS:  declare void @small_pst_callee([2 x i64], double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, ptr dead_on_return noundef, double noundef)
 // CHECK-DARWIN: declare void @small_pst_callee([2 x i64], double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, double noundef, i128, double noundef)
 
 
@@ -326,12 +326,12 @@ void test_pass_variadic(PST *p, PST *q) {
     pass_variadic_callee(*p, *q);
 }
 // CHECK-AAPCS: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(96) %byval-temp, ptr noundef nonnull align 16 dereferenceable(96) %q, i64 96, i1 false)
-// CHECK-AAPCS: call void (<vscale x 16 x i1>, <vscale x 2 x double>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i1>, ...) @pass_variadic_callee(<vscale x 16 x i1> %1, <vscale x 2 x double> %cast.scalable1, <vscale x 4 x float> %cast.scalable2, <vscale x 4 x float> %cast.scalable3, <vscale x 16 x i8> %cast.scalable4, <vscale x 16 x i1> %12, ptr noundef nonnull %byval-temp)
+// CHECK-AAPCS: call void (<vscale x 16 x i1>, <vscale x 2 x double>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i1>, ...) @pass_variadic_callee(<vscale x 16 x i1> %1, <vscale x 2 x double> %cast.scalable1, <vscale x 4 x float> %cast.scalable2, <vscale x 4 x float> %cast.scalable3, <vscale x 16 x i8> %cast.scalable4, <vscale x 16 x i1> %12, ptr dead_on_return noundef nonnull %byval-temp)
 
 // CHECK-DARWIN: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(96) %byval-temp, ptr noundef nonnull align 16 dereferenceable(96) %p, i64 96, i1 false)
 // CHECK-DARWIN: call void @llvm.lifetime.start.p0(i64 96, ptr nonnull %byval-temp1)
 // CHECK-DARWIN: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(96) %byval-temp1, ptr noundef nonnull align 16 dereferenceable(96) %q, i64 96, i1 false)
-// CHECK-DARWIN: call void (ptr, ...) @pass_variadic_callee(ptr noundef nonnull %byval-temp, ptr noundef nonnull %byval-temp1)
+// CHECK-DARWIN: call void (ptr, ...) @pass_variadic_callee(ptr dead_on_return noundef nonnull %byval-temp, ptr dead_on_return noundef nonnull %byval-temp1)
 
 
 // Test passing a small PST, still passed indirectly, despite being <= 128 bits
@@ -340,7 +340,7 @@ void test_small_pst_variadic(SmallPST *p) {
     small_pst_variadic_callee(0, *p);
 }
 // CHECK-AAPCS: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(16) %byval-temp, ptr noundef nonnull align 16 dereferenceable(16) %p, i64 16, i1 false)
-// CHECK-AAPCS: call void (i32, ...) @small_pst_variadic_callee(i32 noundef 0, ptr noundef nonnull %byval-temp)
+// CHECK-AAPCS: call void (i32, ...) @small_pst_variadic_callee(i32 noundef 0, ptr dead_on_return noundef nonnull %byval-temp)
 
 // CHECK-DARWIN: %0 = load i128, ptr %p, align 16
 // CHECK-DARWIN: tail call void (i32, ...) @small_pst_variadic_callee(i32 noundef 0, i128 %0)
@@ -467,7 +467,7 @@ void test_tuple_reg_count(svfloat32_t x, svfloat32x2_t y) {
                                    svfloat32_t, svfloat32_t, svfloat32_t, svfloat32x2_t);
   test_tuple_reg_count_callee(x, x, x, x, x, x, x, y);
 }
-// CHECK-AAPCS: declare void @test_tuple_reg_count_callee(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, ptr noundef)
+// C...
[truncated]

@antoniofrighetto
Copy link
Contributor Author

Experiencing some ASan failures, I suspect we are adding the attribute on arguments where we shouldn't (possibly C++ destructors).

@rjmccall
Copy link
Contributor

Yeah, this is absolutely not correct if the type has a non-trivial destructor that the caller has to run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:PowerPC backend:RISC-V backend:SystemZ backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category coroutines C++20 coroutines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants