This source:
target triple = "x86_64-unknown-linux-gnu"

define <2 x i64> @vec_entrypoint(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #4 {
  %r = call <2 x i64> @vec_callee1(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys)
  ret <2 x i64> %r
}

define internal <2 x i64> @vec_callee1(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #0 {
  %t1 = call <2 x i64> @vec_callee2(<2 x i64> %a, <2 x i64> %keys)
  ret <2 x i64> %t1
}

define internal <2 x i64> @vec_callee2(<2 x i64> %a, <2 x i64> %b) #3 {
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %b, i8 0)
  ret <2 x i64> %r
}

attributes #0 = { "target-cpu"="x86-64" }
attributes #3 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2" }
attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }
Emits the following suboptimal code:
vec_entrypoint:
        movaps  xmm1, xmm2
        jmp     vec_callee1
vec_callee1:
        jmp     vec_callee2
vec_callee2:
        pclmulqdq       xmm0, xmm1, 0
        ret
The middle function, vec_callee1, doesn't have any target features enabled, and that seems to break inlining: neither call is inlined, so the entry point tail-calls through both callees. Inlining should remain possible, however, since there can be no ABI change across the functions.
Passing the vectors through pointers rather than using vector argument and return types generates the expected code:
ptr_entrypoint: # @ptr_entrypoint
        movdqa  xmm0, xmmword ptr [rsi]
        pclmulqdq       xmm0, xmmword ptr [rcx], 0
        movdqa  xmmword ptr [rdi], xmm0
        ret
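For reference, the pointer-based variant probably looks something like the sketch below. This is a reconstruction from the assembly above, not a copy of the IR in the repro: the names ptr_callee1 and ptr_callee2, the parameter order (out, a, b, keys), and the 16-byte alignment are assumptions inferred from the registers and the movdqa instructions. The attribute groups are the same as in the vector version.

```llvm
; Hypothetical reconstruction of the pointer-based variant.
; Only @ptr_entrypoint is a name taken from the output above;
; @ptr_callee1 and @ptr_callee2 are assumed by analogy with the vector case.
target triple = "x86_64-unknown-linux-gnu"

define void @ptr_entrypoint(ptr %out, ptr %a, ptr %b, ptr %keys) #4 {
  call void @ptr_callee1(ptr %out, ptr %a, ptr %b, ptr %keys)
  ret void
}

; Middle function: no explicit target features, as in the vector case.
define internal void @ptr_callee1(ptr %out, ptr %a, ptr %b, ptr %keys) #0 {
  call void @ptr_callee2(ptr %out, ptr %a, ptr %keys)
  ret void
}

; Innermost function: vectors only exist inside the body, never in the
; signature, so no vector value crosses a call boundary between functions.
define internal void @ptr_callee2(ptr %out, ptr %a, ptr %b) #3 {
  %va = load <2 x i64>, ptr %a, align 16
  %vb = load <2 x i64>, ptr %b, align 16
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %va, <2 x i64> %vb, i8 0)
  store <2 x i64> %r, ptr %out, align 16
  ret void
}

declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8)

attributes #0 = { "target-cpu"="x86-64" }
attributes #3 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2" }
attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }
```

The only difference from the vector version is that the calls between the three functions pass ptr values and return void, which suggests the inlining decision depends on vector types appearing in the call signatures of the middle function.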
Repro: https://llvm.godbolt.org/z/96jnrh6rP
This issue was discovered at rust-lang/rust#139029