[Perf] Use fused ops npu_top_k_top_p #1308
Conversation
```python
    k: torch.Tensor,
) -> torch.Tensor:
    if p is not None and k is not None:
        return torch_npu.npu_top_k_top_p(logits, p, k)
```
Which torch_npu version supports this call?
since rc2 b050
Please update the commit message with the public torch_npu version, e.g. from: https://mirrors.huaweicloud.com/ascend/repos/pypi/torch-npu/
This API was introduced in torch_npu-2.5.1.post1.dev20250619.
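Since only sufficiently new torch_npu wheels ship the fused op, a capability guard like this sketch (the constant name is mine, not the PR's code) keeps older builds on the fallback path:

```python
import torch_npu

# Hypothetical guard: use the fused kernel only when this torch_npu build
# exposes it (available since 2.5.1.post1.dev20250619).
HAS_FUSED_TOP_K_TOP_P = hasattr(torch_npu, "npu_top_k_top_p")
```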
fixed
```python
    logits: torch.Tensor,
    p: torch.Tensor,
    k: torch.Tensor,
```
Hmm... so the argument order of _apply_top_k_top_p was wrong. I suggest keeping the same order as upstream: https://github.com/vllm-project/vllm/blob/9a3b88328f7e434cac35b90ee463de6689f9a833/vllm/model_executor/layers/sampler.py#L398
Please change L98 as well.
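For reference, a sketch of the upstream v0 signature the link points to (argument order recalled from the referenced sampler.py; consult the pinned commit for the authoritative definition):

```python
import torch

# Upstream vLLM v0 sampler takes p before k; the review asks this PR to match.
def _apply_top_k_top_p(
    logits: torch.Tensor,
    p: torch.Tensor,
    k: torch.Tensor,
) -> torch.Tensor:
    ...
```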
fixed
```python
        mock_npu_op.assert_called_once_with(logits, p, k)


if __name__ == "__main__":
```
no need for this
fixed
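With the `__main__` runner dropped, the test reduces to something like this sketch (the module path and public function name are assumptions based on this PR's diff, not the merged code):

```python
from unittest import mock

import torch

# Assumed import path for the patched sampler module.
from vllm_ascend.patch.worker.patch_common import patch_sampler


def test_fused_op_is_called():
    logits = torch.randn(4, 32)
    p = torch.full((4,), 0.9)
    k = torch.full((4,), 10, dtype=torch.int32)
    # Patch the fused kernel so the test runs without NPU hardware attached.
    with mock.patch("torch_npu.npu_top_k_top_p") as mock_npu_op:
        patch_sampler.apply_top_k_top_p(logits, p=p, k=k)
        mock_npu_op.assert_called_once_with(logits, p, k)
```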
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1308      +/-   ##
==========================================
+ Coverage   27.39%   27.55%   +0.16%
==========================================
  Files          56       57       +1
  Lines        6191     6238      +47
==========================================
+ Hits         1696     1719      +23
- Misses       4495     4519      +24
```
…huaweicloud.com/ascend/repos/pypi/torch-npu/ Signed-off-by: Pr0Wh1teGivee <[email protected]>
Yikun left a comment:
LGTM, please address comments in a new PR.
```python
import vllm_ascend.patch.worker.patch_common.patch_sampler
importlib.reload(vllm_ascend.patch.worker.patch_common.patch_sampler)
```
Please refer to this [1] for the patch test, and make TestTopKTopPSamplerOptimize extend TestBase.
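A minimal skeleton of that suggestion, assuming TestBase lives under tests/ut as in the repo's other unit tests:

```python
# Skeleton only; the TestBase import path is assumed from the repo layout.
from tests.ut.base import TestBase


class TestTopKTopPSamplerOptimize(TestBase):
    def test_npu_op_is_used(self):
        # Body as in the mock-based sketch above: patch
        # torch_npu.npu_top_k_top_p and assert it receives (logits, p, k).
        ...
```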
```python
    return logits


def _apply_top_k_top_p(
```
I tend to rename this _apply_top_k_top_p to apply_top_k_top_p to avoid confusion and keep it consistent with:
https://github.com/vllm-project/vllm/blob/c53fec1fcb27aca9475e55c2d1e74c532f5f0364/vllm/v1/sample/ops/topk_topp_sampler.py#L165
### What this PR does / why we need it?
Use the fused op torch_npu.npu_top_k_top_p(logits, p, k) when p and k are not None; otherwise fall back to the original implementation. The replacement takes place automatically when `VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1`. This patch uses `npu_top_k_top_p`, which requires torch_npu>=2.5.1.post1.dev20250619.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested with DeepSeek R1; UT passed.

Signed-off-by: Pr0Wh1teGivee <[email protected]>
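Put together, the gated dispatch described above behaves roughly like this sketch (`_apply_top_k_top_p_unfused` is a placeholder name for the original implementation, not the patch's actual helper):

```python
import torch
import torch_npu


def apply_top_k_top_p(
    logits: torch.Tensor,
    p: torch.Tensor,
    k: torch.Tensor,
) -> torch.Tensor:
    if p is not None and k is not None:
        # Single fused NPU kernel instead of the usual sort + cumsum +
        # masking sequence.
        return torch_npu.npu_top_k_top_p(logits, p, k)
    # Placeholder for the original unfused implementation.
    return _apply_top_k_top_p_unfused(logits, p, k)
```

Opting in is just an environment setting before launching vLLM, e.g. `VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE=1 vllm serve ...`; without it, the original sampler path is untouched.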
A follow-up PR later fixed a 310P regression introduced here:

### What this PR does / why we need it?
Fixed a 310P failure when using the sampler feature. The root cause: torch_npu.npu_top_k_top_p uses the operator aclnnApplyTopKTopP, and aclnnApplyTopKTopP currently does not support 310P. The first PR with the issue is #1308.

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@207b750

Signed-off-by: leo-pony <[email protected]>