
Commit 980c240

Optimize topksoftmax WARPS_PER_TB for higher occupancy and remove redundant precision conversion (#652)
* apply clang-format
* optimize

---------

Co-authored-by: Cu Cui <cu.cui@alumni.uni-heidelberg.de>
1 parent 9aa4cbb commit 980c240

2 files changed

Lines changed: 357 additions & 304 deletions

aiter/fused_moe.py

Lines changed: 1 addition & 1 deletion
@@ -911,7 +911,7 @@ def fused_topk(
         topk_weights,
         topk_ids,
         token_expert_indicies,
-        gating_output.float(),  # TODO(woosuk): Optimize this.
+        gating_output,
         renormalize,
     )
     del token_expert_indicies  # Not used. Will be used in the future.
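
For readers unfamiliar with the call being changed, the following is a minimal pure-Python sketch of what a top-k softmax routing step typically computes: a softmax over the expert logits of each token, selection of the `topk` largest probabilities, and optional renormalization. The function name `topk_softmax_reference` and its list-based interface are hypothetical and are not part of aiter; the real kernel runs on the GPU and is not shown in this diff. The sketch upcasts each value with `float(x)` internally, which illustrates why a caller-side `gating_output.float()` can be redundant if the kernel already accumulates in higher precision (an assumption based on the commit title, not on the kernel source).

```python
import math


def topk_softmax_reference(gating_output, topk, renormalize):
    """Hypothetical reference for top-k softmax routing (not the aiter kernel).

    gating_output: list of per-token lists of expert logits.
    Returns (topk_weights, topk_ids), one list per token.
    """
    topk_weights, topk_ids = [], []
    for row in gating_output:
        # Upcast inside the routine, so callers need not convert beforehand.
        row = [float(x) for x in row]
        # Numerically stable softmax: subtract the row maximum before exp.
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Indices of the topk largest probabilities, highest first.
        ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topk]
        weights = [probs[i] for i in ids]
        if renormalize:
            # Rescale the selected weights so they sum to 1 for this token.
            selected_sum = sum(weights)
            weights = [w / selected_sum for w in weights]
        topk_weights.append(weights)
        topk_ids.append(ids)
    return topk_weights, topk_ids
```

With `renormalize=True` the selected weights for each token sum to 1; with `renormalize=False` they keep their raw softmax values, so their sum is below 1 whenever fewer than all experts are selected.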
