[Kernel] Remove cumsum in groupedmatmul #987

Merged: 1 commit merged on Jun 6, 2025

Conversation

@hahazhky (Contributor) commented May 28, 2025:

What this PR does / why we need it?

Remove the cumsum operator in the MoE path to improve performance.

Does this PR introduce any user-facing change?

How was this patch tested?

It should be tested on a case with the mc2 operator and graph mode enabled.

  gate_up_out_list = torch_npu.npu_grouped_matmul(
      x=[expand_x],
      weight=[w1],
      split_item=2,
-     group_list_type=0,
+     group_list_type=1,
Collaborator:

Maybe we could add some comments to explain why group_list_type=1 is used here, e.g., to avoid the cumulative calculation of the group list.

Contributor Author (@hahazhky):

fixed
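For reference, a minimal sketch (with made-up token counts, not the PR's actual tensors) of the difference between the two group-list conventions: group_list_type=0 expects cumulative offsets, which needs an extra cumsum over the per-expert token counts, while group_list_type=1 accepts the raw counts directly, so the cumsum can be dropped.

    import torch

    # Hypothetical number of tokens routed to each of four experts.
    tokens_per_expert = torch.tensor([3, 5, 2, 6], dtype=torch.int64)

    # group_list_type=0: cumulative offsets, produced by an extra cumsum kernel.
    group_list_cumulative = torch.cumsum(tokens_per_expert, dim=0)  # tensor([ 3,  8, 10, 16])

    # group_list_type=1: per-group counts, passed through with no cumsum at all.
    group_list_counts = tokens_per_expert  # tensor([3, 5, 2, 6])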


    if VLLM_ENABLE_FIX_ROUTE:
        uniform_group_list = hidden_states.shape[0] * all_to_all_group_size * top_k // moe_expert_num
        group_list = torch.Tensor([uniform_group_list] * w1.shape[0]).long().npu()
Collaborator:

Suggested change:
- group_list = torch.Tensor([uniform_group_list] * w1.shape[0]).long().npu()
+ group_list = torch.Tensor([uniform_group_list] * w1.shape[0]).long().to(hidden_states.device)

Contributor Author (@hahazhky):

fixed
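As an aside, a minimal sketch of the design point behind this suggestion (hypothetical shapes; hidden_states stands in for any tensor that already lives on the target device): deriving the device from an existing tensor keeps the code backend-agnostic, whereas a hard-coded .npu() only works on NPU builds.

    import torch

    # Hypothetical stand-in for activations already on the target device
    # (CPU here, NPU in the real code path).
    hidden_states = torch.randn(4, 8)

    uniform_group_list = 2
    num_groups = 4

    # Device-agnostic construction: follows hidden_states wherever it lives,
    # instead of hard-coding .npu().
    group_list = torch.tensor([uniform_group_list] * num_groups,
                              dtype=torch.long,
                              device=hidden_states.device)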

    uniform_topk_list = [
        (i + rank) % moe_expert_num for i in range(rank * step, (rank + 1) * step)
    ]
    topk_ids = torch.Tensor(uniform_topk_list).int().view(hidden_states.shape[0], -1).npu()
Collaborator:

Suggested change:
- topk_ids = torch.Tensor(uniform_topk_list).int().view(hidden_states.shape[0], -1).npu()
+ topk_ids = torch.Tensor(uniform_topk_list).int().view(hidden_states.shape[0], -1).to(hidden_states.device)

Contributor Author (@hahazhky):

fixed

@hahazhky changed the title from "add fix routing for performance test" to "[Feature] add fix routing for performance test" on May 28, 2025
@hahazhky changed the title from "[Feature] add fix routing for performance test" to "[Core][Kernel] add fix routing for performance test" on May 28, 2025
@MengqingCao (Collaborator):
overall lgtm now, thanks!

@hahazhky (Contributor Author):
@wangxiyuan @Yikun @ganyi1996ppo hello, please review this PR; @MengqingCao has already replied LGTM.

@@ -50,6 +51,14 @@ def fused_experts_with_mc2(
) -> torch.Tensor:
    global_bs = 0
    moe_expert_num = len(expert_map)

    rank = torch.distributed.get_rank()
    if VLLM_ENABLE_FIX_ROUTE:
Collaborator:

A performance test under the fix-route scenario seems pointless; do we really need this env variable in the repo?

@ganyi1996ppo (Collaborator) commented May 30, 2025:

Looks good except for the fix-routing part; do we really need to add this? Performance may improve with it enabled, but that performance data means nothing in a real production environment, right? @hahazhky

@hahazhky (Contributor Author) commented Jun 3, 2025:

> Looks good except for the fix-routing part; do we really need to add this? Performance may improve with it enabled, but that performance data means nothing in a real production environment, right? @hahazhky

Yes, we cannot enable this in a real production environment, but we can use it as a performance-debugging option to check the upper limit; as training continues, the actual time cost will approach this performance.

github-actions bot commented Jun 3, 2025:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Yikun changed the title from "[Core][Kernel] add fix routing for performance test" to "[Core][Kernel] Remove cumsum in MOE and add fix routing ENV for performance test" on Jun 4, 2025
Comment on lines 72 to 73
"VLLM_ENABLE_FIX_ROUTE":
lambda: bool(int(os.getenv("VLLM_ENABLE_FIX_ROUTE", '0'))),
Collaborator:

Suggested change:
- "VLLM_ENABLE_FIX_ROUTE":
-     lambda: bool(int(os.getenv("VLLM_ENABLE_FIX_ROUTE", '0'))),
+ "VLLM_ASCEND_ENABLE_FIX_ROUTE":
+     lambda: bool(int(os.getenv("VLLM_ASCEND_ENABLE_FIX_ROUTE", '0'))),

Contributor Author (@hahazhky):

deleted
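For context, a minimal standalone sketch of the boolean-env-flag pattern the suggestion refers to (the flag itself was later removed from this PR, and the names here are illustrative only): an unset variable or "0" parses to False, "1" to True.

    import os

    # Registry mapping flag names to lazy parsers, mirroring the diff above.
    env_variables = {
        "VLLM_ASCEND_ENABLE_FIX_ROUTE":
        lambda: bool(int(os.getenv("VLLM_ASCEND_ENABLE_FIX_ROUTE", '0'))),
    }

    # Reading the flag: unset or "0" -> False, "1" -> True.
    enable_fix_route = env_variables["VLLM_ASCEND_ENABLE_FIX_ROUTE"]()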

@@ -61,3 +61,22 @@ def test_models_distributed_DeepSeek():
        distributed_executor_backend="mp",
    ) as vllm_model:
        vllm_model.generate_greedy(example_prompts, max_tokens)


def test_models_distributed_fix_route_DeepSeek():
Collaborator:

We should unset the env variable after the test executes, e.g. @patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_TOPK_OPTIMZE": "1"}).

Suggested change:
- def test_models_distributed_fix_route_DeepSeek():
+ @patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_TOPK_OPTIMZE": "1"})
+ def test_models_distributed_fix_route_DeepSeek():

[1] https://docs.python.org/3/library/unittest.mock.html#unittest.mock.patch.dict

Contributor Author (@hahazhky):

fixed
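For reference, a minimal standalone sketch of the patch.dict pattern suggested above: the decorator sets the variable for the duration of the test and restores os.environ afterwards (the test body here is illustrative, not the PR's actual test).

    import os
    from unittest.mock import patch

    @patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_TOPK_OPTIMZE": "1"})
    def test_env_flag_is_scoped_to_this_test():
        # Inside the decorated test the variable is visible;
        # patch.dict removes it again once the test returns.
        assert os.environ["VLLM_ASCEND_ENABLE_TOPK_OPTIMZE"] == "1"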

@@ -50,6 +51,14 @@ def fused_experts_with_mc2(
) -> torch.Tensor:
    global_bs = 0
    moe_expert_num = len(expert_map)

    rank = torch.distributed.get_rank()
    if VLLM_ENABLE_FIX_ROUTE:
@Yikun (Collaborator) commented Jun 4, 2025:

Please check whether VLLM_ENABLE_FIX_ROUTE is still needed after afc4c0c (enable_force_load_balance).

Contributor Author (@hahazhky):

Fix route is no longer needed; the relevant code has been deleted.

@hahazhky force-pushed the main branch 3 times, most recently from a00ba7b to 936189e, on June 4, 2025 15:31
@hahazhky changed the title from "[Core][Kernel] Remove cumsum in MOE and add fix routing ENV for performance test" to "[Kernel] Remove cumsum in groupedmatmul" on Jun 4, 2025
@Yikun (Collaborator) left a comment:


LGTM except inline comments

"deepseek-ai/DeepSeek-V2-Lite",
dtype=dtype,
tensor_parallel_size=8,
enable_expert_parallel=True,
Collaborator:

Could this change be more general, e.g. @pytest.mark.parametrize("enable_expert_parallel", [True, False])?
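A minimal sketch of that suggestion (VllmRunner details elided; the body is a placeholder, not the PR's actual test): pytest generates one test case per value of enable_expert_parallel.

    import pytest

    @pytest.mark.parametrize("enable_expert_parallel", [True, False])
    def test_models_distributed_DeepSeek(enable_expert_parallel):
        # Placeholder body: the real test would construct VllmRunner with
        # enable_expert_parallel=enable_expert_parallel and run generation.
        assert enable_expert_parallel in (True, False)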

    with VllmRunner(
        "deepseek-ai/DeepSeek-V2-Lite",
        dtype=dtype,
        tensor_parallel_size=8,
Collaborator:

CI fails because only 4 cards are available, but you used tp=8.

@hahazhky force-pushed the main branch 3 times, most recently from 1ccff57 to ac92d4b, on June 5, 2025 06:21

github-actions bot commented Jun 5, 2025:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@hahazhky force-pushed the main branch 2 times, most recently from 588cd4b to db1d487, on June 5, 2025 11:16
@hahazhky force-pushed the main branch 2 times, most recently from 9a7bcaa to f057135, on June 5, 2025 14:18
@wangxiyuan added the ready (read for review) label on Jun 6, 2025
@wangxiyuan merged commit 0b12c2a into vllm-project:main on Jun 6, 2025
30 of 31 checks passed
Labels: module:ops, ready (read for review)
7 participants