Missed optimization: passing vectors with different (compatible) target features prevents inlining #142321

Open
@tgross35

This source:

target triple = "x86_64-unknown-linux-gnu"

define <2 x i64> @vec_entrypoint(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #4 {
  %r = call <2 x i64> @vec_callee1(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys)
  ret <2 x i64> %r
}

define internal <2 x i64> @vec_callee1(<2 x i64> %a, <2 x i64> %b, <2 x i64> %keys) #0 {
  %t1 = call <2 x i64> @vec_callee2(<2 x i64> %a, <2 x i64> %keys)
  ret <2 x i64> %t1
}

define internal <2 x i64> @vec_callee2(<2 x i64> %a, <2 x i64> %b) #3 {
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %b, i8 0)
  ret <2 x i64> %r
}

attributes #0 = { "target-cpu"="x86-64" }
attributes #3 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2" }
attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }

Emits the following suboptimal code:

vec_entrypoint:
        movaps  xmm1, xmm2
        jmp     vec_callee1

vec_callee1:
        jmp     vec_callee2

vec_callee2:
        pclmulqdq       xmm0, xmm1, 0
        ret

The middle function vec_callee1 has no target features enabled, and that seems to prevent inlining. Inlining should remain possible, however: none of these feature sets changes how a <2 x i64> is passed, so there can be no ABI change across the functions.

Passing the vectors through pointers, rather than as vector arguments and a vector return value, generates the expected code:

ptr_entrypoint:                         # @ptr_entrypoint
        movdqa  xmm0, xmmword ptr [rsi]
        pclmulqdq       xmm0, xmmword ptr [rcx], 0
        movdqa  xmmword ptr [rdi], xmm0
        ret
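
For reference, a pointer-based variant of the same call chain could look roughly like the sketch below. This is an assumption reconstructed from the assembly above (an output pointer in rdi, inputs in rsi and rcx); the names ptr_callee1 and ptr_callee2 are hypothetical, and the exact IR behind the repro link may differ. The attribute groups are the same as in the vector version.

```llvm
target triple = "x86_64-unknown-linux-gnu"

declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8)

; Assumed signature: result is written through %out instead of being returned.
define void @ptr_entrypoint(ptr %out, ptr %a, ptr %b, ptr %keys) #4 {
  call void @ptr_callee1(ptr %out, ptr %a, ptr %b, ptr %keys)
  ret void
}

; Middle function with no target features, as in the vector version.
define internal void @ptr_callee1(ptr %out, ptr %a, ptr %b, ptr %keys) #0 {
  call void @ptr_callee2(ptr %out, ptr %a, ptr %keys)
  ret void
}

; Only pointers cross the call boundaries; the vectors are loaded here.
define internal void @ptr_callee2(ptr %out, ptr %a, ptr %b) #3 {
  %va = load <2 x i64>, ptr %a, align 16
  %vb = load <2 x i64>, ptr %b, align 16
  %r = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %va, <2 x i64> %vb, i8 0)
  store <2 x i64> %r, ptr %out, align 16
  ret void
}

attributes #0 = { "target-cpu"="x86-64" }
attributes #3 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2" }
attributes #4 = { "target-cpu"="x86-64" "target-features"="+pclmul,+sse,+sse2,+sse3" }
```

In this form no vector type appears in any non-intrinsic call signature, which is presumably why the feature mismatch does not get in the way of inlining.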

Repro: https://llvm.godbolt.org/z/96jnrh6rP
This issue was discovered at rust-lang/rust#139029
