Add aiter qknorm and rope fusion kernel #844
Open
Purpose
TL;DR: To enable qknorm and rope fusion, set `VLLM_ROCM_USE_AITER_TRITON_ROPE=0` and launch vLLM with `--compilation-config '{"pass_config": {"enable_qk_norm_rope_fusion": "true"}}'`.
This PR adds the qknorm and rope fusion kernel from aiter. It is controlled through `VLLM_ROCM_USE_AITER_FUSED_QK_NORM_ROPE`. The flag defaults to false for now, because vLLM already has a native CUDA kernel for qknorm and rope fusion, and on MI300X both end-to-end and kernel tests show that the aiter kernel performs similarly to, and even slightly worse than, the vLLM counterpart.
To enable the fusion, the following compilation config needs to be set: `--compilation-config '{"pass_config": {"enable_qk_norm_rope_fusion": "true"}}'`. For the fusion to work, however, aiter's triton rope needs to be disabled, because the pattern matcher cannot find the compiled triton kernel. This PR therefore also disables `VLLM_ROCM_USE_AITER_TRITON_ROPE` by default to allow aggressive ops fusion.
Test Plan
Qwen/Qwen3-30B-A3B-Instruct-2507
server
benchmark
lm_eval
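As a minimal sketch of how the flags described under Purpose fit together, the helper below assembles the environment variables and the `--compilation-config` argument for one configuration. The `vllm serve` entry point, the helper names `build_launch` and `as_shell`, and the `CONFIGS` labels are assumptions made for illustration, not the commands actually used for this PR.

```python
import json
import shlex

MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507"  # model named in the test plan


def build_launch(model, triton_rope, aiter_fused, enable_fusion):
    """Return (env, argv) for one flag combination (hypothetical helper)."""
    env = {
        "VLLM_ROCM_USE_AITER_TRITON_ROPE": str(int(triton_rope)),
        "VLLM_ROCM_USE_AITER_FUSED_QK_NORM_ROPE": str(int(aiter_fused)),
    }
    # The PR text passes the value as the JSON string "true"/"false";
    # that form is mirrored here rather than a JSON boolean.
    compilation_config = {
        "pass_config": {
            "enable_qk_norm_rope_fusion": "true" if enable_fusion else "false"
        }
    }
    # Assumption: the server is started with `vllm serve <model>`.
    argv = ["vllm", "serve", model,
            "--compilation-config", json.dumps(compilation_config)]
    return env, argv


def as_shell(env, argv):
    """Render the configuration as a single copy-pasteable shell line."""
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


# Flag combinations described in this PR; the labels are illustrative only.
CONFIGS = {
    "no_fusion": dict(triton_rope=True, aiter_fused=False, enable_fusion=False),
    "vllm_fusion": dict(triton_rope=False, aiter_fused=False, enable_fusion=True),
    "aiter_fusion": dict(triton_rope=False, aiter_fused=True, enable_fusion=True),
}

for name, flags in CONFIGS.items():
    env, argv = build_launch(MODEL, **flags)
    print(f"# {name}")
    print(as_shell(env, argv))
```

Building the JSON with `json.dumps` and quoting with `shlex.join` avoids hand-escaping the nested quotes in the compilation config.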
Test Result
Using old default w/o fusion
`VLLM_ROCM_USE_AITER_TRITON_ROPE=1` `VLLM_ROCM_USE_AITER_FUSED_QK_NORM_ROPE=0` `--compilation-config '{"pass_config": {"enable_qk_norm_rope_fusion": "false"}}'`
Fusion with vLLM's fusion kernel (new default)
`VLLM_ROCM_USE_AITER_TRITON_ROPE=0` `VLLM_ROCM_USE_AITER_FUSED_QK_NORM_ROPE=0` `--compilation-config '{"pass_config": {"enable_qk_norm_rope_fusion": "true"}}'`
Fusion with aiter's fusion kernel
`VLLM_ROCM_USE_AITER_TRITON_ROPE=0` `VLLM_ROCM_USE_AITER_FUSED_QK_NORM_ROPE=1` `--compilation-config '{"pass_config": {"enable_qk_norm_rope_fusion": "true"}}'`
lm_eval w/ aiter fusion kernel