[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B #5502

MrZ20 · 2025-12-30T02:45:23Z

What this PR does / why we need it?

This PR adds online Disaggregated Prefill/Decode performance and accuracy tests for the Qwen3-235B-A22B and Qwen3-VL-235B-A22B-Instruct models to the Nightly test suite.

These test configurations simulate the deployment of large MoE and vision-language models in a dual-node (32-NPU) environment, using Mooncake to transfer the KV cache efficiently from the Prefill node to the Decode node.

Test Configuration

Qwen3-235B-A22B

Model: Qwen/Qwen3-235B-A22B
Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
Architecture: Disaggregated Prefill & Decode
- Node 0 (Producer/Prefill): DP2 + TP8 + EP + FLASHCOMM1 + FUSED_MC2.
- Node 1 (Consumer/Decode): DP4 + TP4 + EP + FLASHCOMM1 + FUSED_MC2 + FULL_DECODE_ONLY.
Benchmarks:
- Performance: vllm-ascend/GSM8K-in3500-bs2800.
- Accuracy: vllm-ascend/gsm8k-lite.

Qwen3-VL-235B-A22B-Instruct

Model: Qwen/Qwen3-VL-235B-A22B-Instruct
Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
Architecture: Disaggregated Prefill & Decode
- Node 0 (Producer/Prefill): DP2 + TP8 + EP.
- Node 1 (Consumer/Decode): DP4 + TP4 + EP + FULL_DECODE_ONLY.
Benchmarks:
- Performance: vllm-ascend/textvqa-perf-1080p.
- Accuracy: vllm-ascend/textvqa-lite.
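As a quick consistency check on the two layouts above, the DP × TP product on each node should equal the 16 NPUs that node provides. The sketch below assumes one NPU per (DP rank, TP rank) pair. `NodeLayout` and `check_layouts` are hypothetical helpers for illustration and are not part of the nightly test configs.

```python
from dataclasses import dataclass

# Node size taken from the PR description (A3, 16 NPUs per node).
NPUS_PER_NODE = 16


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical description of one node's parallel layout."""
    role: str  # "prefill" (producer) or "decode" (consumer)
    dp: int    # data-parallel replicas on this node
    tp: int    # tensor-parallel degree of each replica

    @property
    def npus(self) -> int:
        # Assumption: every (DP rank, TP rank) pair occupies exactly one NPU.
        return self.dp * self.tp


LAYOUTS = {
    "Qwen3-235B-A22B": [NodeLayout("prefill", dp=2, tp=8), NodeLayout("decode", dp=4, tp=4)],
    "Qwen3-VL-235B-A22B-Instruct": [NodeLayout("prefill", dp=2, tp=8), NodeLayout("decode", dp=4, tp=4)],
}


def check_layouts(layouts, npus_per_node=NPUS_PER_NODE):
    """Raise if any node's DP x TP does not fill it exactly; return total NPUs per model."""
    totals = {}
    for model, nodes in layouts.items():
        for node in nodes:
            if node.npus != npus_per_node:
                raise ValueError(
                    f"{model} {node.role}: DP{node.dp} x TP{node.tp} = {node.npus}, "
                    f"expected {npus_per_node}"
                )
        totals[model] = sum(node.npus for node in nodes)
    return totals


if __name__ == "__main__":
    for model, total in check_layouts(LAYOUTS).items():
        print(f"{model}: {total} NPUs")
```

Running it reports 32 NPUs for each model, matching the two-node, 32-NPU setup; a layout such as DP3 + TP4 would raise instead.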

Does this PR introduce any user-facing change?

How was this patch tested?

Nightly test action on CI:
https://github.com/vllm-project/vllm-ascend/actions/runs/20734804044/job/59529925424?pr=5442

Results are as follows:

Qwen3-235B-A22B (52m13s)

Accuracy test

dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
gsm8k      7cd45e     accuracy  gen                      100.00

Perf test

╒══════════════════════════╤═════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤═════════════════╤══════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max            │ Median         │ P75            │ P90            │ P99             │  N   │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪═════════════════╪══════╡
│ E2EL                     │ total   │ 437430.8974 ms │ 85116.1062 ms  │ 719816.0369 ms │ 467343.1327 ms │ 523147.7618 ms │ 532042.4372 ms │ 536807.6622 ms  │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ TTFT                     │ total   │ 347313.8722 ms │ 626.5273 ms    │ 627989.6803 ms │ 376224.5384 ms │ 433057.0308 ms │ 440635.9248 ms │ 445003.5799 ms  │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ TPOT                     │ total   │ 60.1181 ms     │ 56.364 ms      │ 61.4858 ms     │ 60.2265 ms     │ 60.8376 ms     │ 61.2495 ms     │ 61.3442 ms      │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ ITL                      │ total   │ 60.095 ms      │ 0.0084 ms      │ 586.0958 ms    │ 60.0517 ms     │ 65.9529 ms     │ 74.8002 ms     │ 108.4795 ms     │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ InputTokens              │ total   │ 3654.3079      │ 3108.0         │ 4280.0         │ 3629.0         │ 3728.0         │ 3842.1         │ 4079.0          │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ OutputTokens             │ total   │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0          │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ OutputTokenThroughput    │ total   │ 4.1372 token/s │ 2.0839 token/s │ 17.623 token/s │ 3.2096 token/s │ 3.2831 token/s │ 6.1399 token/s │ 16.5253 token/s │ 2800 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧═════════════════╧══════╛
╒══════════════════════════╤═════════╤═══════════════════╕
│ Common Metric            │ Stage   │ Value             │
╞══════════════════════════╪═════════╪═══════════════════╡
│ Benchmark Duration       │ total   │ 2005953.3298 ms   │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Requests           │ total   │ 2800              │
├──────────────────────────┼─────────┼───────────────────┤
│ Failed Requests          │ total   │ 0                 │
├──────────────────────────┼─────────┼───────────────────┤
│ Success Requests         │ total   │ 2800              │
├──────────────────────────┼─────────┼───────────────────┤
│ Concurrency              │ total   │ 610.5857          │
├──────────────────────────┼─────────┼───────────────────┤
│ Max Concurrency          │ total   │ 700               │
├──────────────────────────┼─────────┼───────────────────┤
│ Request Throughput       │ total   │ 1.3958 req/s      │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Input Tokens       │ total   │ 10232062          │
├──────────────────────────┼─────────┼───────────────────┤
│ Prefill Token Throughput │ total   │ 10.5216 token/s   │
├──────────────────────────┼─────────┼───────────────────┤
│ Total generated tokens   │ total   │ 4200000           │
├──────────────────────────┼─────────┼───────────────────┤
│ Input Token Throughput   │ total   │ 5100.8475 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Output Token Throughput  │ total   │ 2093.7676 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Token Throughput   │ total   │ 7194.615 token/s  │
╘══════════════════════════╧═════════╧═══════════════════╛
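Most of the Common Metric rows can be re-derived from the run totals, which is a cheap way to check that a nightly run's numbers are self-consistent. The sketch below uses the Qwen3-235B-A22B values above. It assumes the benchmark divides token and request counts by the wall-clock Benchmark Duration, and that Concurrency is the average number of in-flight requests (Little's law: mean E2EL × requests / duration). These formulas are inferred from the reported numbers, not taken from the benchmark tool's source, and `derive_common_metrics` is a hypothetical helper.

```python
def derive_common_metrics(duration_ms, requests, input_tokens, output_tokens_per_request, avg_e2el_ms):
    """Recompute throughput and concurrency from run totals (hypothetical helper)."""
    seconds = duration_ms / 1000.0
    output_tokens = requests * output_tokens_per_request  # OutputTokens is fixed per request
    return {
        "total_generated_tokens": output_tokens,
        "request_throughput": requests / seconds,
        "input_token_throughput": input_tokens / seconds,
        "output_token_throughput": output_tokens / seconds,
        "total_token_throughput": (input_tokens + output_tokens) / seconds,
        # Little's law: average requests in flight = total time in system / wall-clock time.
        "concurrency": avg_e2el_ms * requests / duration_ms,
    }


if __name__ == "__main__":
    # Values copied from the Qwen3-235B-A22B tables above.
    metrics = derive_common_metrics(
        duration_ms=2005953.3298,
        requests=2800,
        input_tokens=10232062,
        output_tokens_per_request=1500,
        avg_e2el_ms=437430.8974,
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

The derived values agree with the table to within rounding, for example about 1.3958 req/s, about 2093.77 output token/s, and a concurrency of about 610.59.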

Qwen3-VL-235B-A22B-Instruct (43m2s)

Accuracy test

dataset      version  metric    mode      vllm-api-stream-chat
---------  ---------  --------  ------  ----------------------
textvqa       293754  accuracy  gen                      84.14

Perf test

╒══════════════════════════╤═════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤═════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max            │ Median         │ P75            │ P90            │ P99            │  N  │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪═════╡
│ E2EL                     │ total   │ 205944.1405 ms │ 204678.8803 ms │ 207018.5243 ms │ 206032.8198 ms │ 206355.5276 ms │ 206625.7358 ms │ 206879.2058 ms │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TTFT                     │ total   │ 1312.4494 ms   │ 802.3429 ms    │ 2023.0655 ms   │ 1310.5566 ms   │ 1465.0071 ms   │ 1697.3565 ms   │ 1866.1158 ms   │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TPOT                     │ total   │ 136.5121 ms    │ 135.8727 ms    │ 136.9272 ms    │ 136.5804 ms    │ 136.7827 ms    │ 136.8897 ms    │ 136.9185 ms    │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ ITL                      │ total   │ 136.3248 ms    │ 0.0086 ms      │ 381.4863 ms    │ 136.5474 ms    │ 140.7574 ms    │ 148.6218 ms    │ 182.08 ms      │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ InputTokens              │ total   │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokens             │ total   │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokenThroughput    │ total   │ 7.2836 token/s │ 7.2457 token/s │ 7.3286 token/s │ 7.2804 token/s │ 7.2952 token/s │ 7.3113 token/s │ 7.3254 token/s │ 512 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧═════╛
╒═════════════════════════╤═════════╤══════════════════╕
│ Common Metric           │ Stage   │ Value            │
╞═════════════════════════╪═════════╪══════════════════╡
│ Benchmark Duration      │ total   │ 1652977.6847 ms  │
├─────────────────────────┼─────────┼──────────────────┤
│ Total Requests          │ total   │ 512              │
├─────────────────────────┼─────────┼──────────────────┤
│ Failed Requests         │ total   │ 0                │
├─────────────────────────┼─────────┼──────────────────┤
│ Success Requests        │ total   │ 512              │
├─────────────────────────┼─────────┼──────────────────┤
│ Concurrency             │ total   │ 63.79            │
├─────────────────────────┼─────────┼──────────────────┤
│ Max Concurrency         │ total   │ 64               │
├─────────────────────────┼─────────┼──────────────────┤
│ Request Throughput      │ total   │ 0.3097 req/s     │
├─────────────────────────┼─────────┼──────────────────┤
│ Total Input Tokens      │ total   │ 0                │
├─────────────────────────┼─────────┼──────────────────┤
│ Total generated tokens  │ total   │ 768000           │
├─────────────────────────┼─────────┼──────────────────┤
│ Input Token Throughput  │ total   │ 0.0 token/s      │
├─────────────────────────┼─────────┼──────────────────┤
│ Output Token Throughput │ total   │ 464.6161 token/s │
├─────────────────────────┼─────────┼──────────────────┤
│ Total Token Throughput  │ total   │ 464.6161 token/s │
╘═════════════════════════╧═════════╧══════════════════╛
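The latency rows can also be cross-checked against each other. With a fixed 1500 output tokens per request, average E2EL should be close to average TTFT plus 1499 × average TPOT, assuming TPOT is measured as (E2EL − TTFT) / (output tokens − 1), as in vLLM's serving benchmark; whether the tool used here defines it identically is an assumption. `expected_e2el_ms` is a hypothetical helper for illustration.

```python
def expected_e2el_ms(ttft_ms, tpot_ms, output_tokens):
    """Estimate end-to-end latency: first token at TTFT, then one TPOT per remaining token."""
    return ttft_ms + tpot_ms * (output_tokens - 1)


# (avg TTFT ms, avg TPOT ms, output tokens, reported avg E2EL ms) from the tables above.
# Averages can be plugged in directly because output tokens are identical for every request.
RUNS = {
    "Qwen3-VL-235B-A22B-Instruct": (1312.4494, 136.5121, 1500, 205944.1405),
    "Qwen3-235B-A22B": (347313.8722, 60.1181, 1500, 437430.8974),
}

if __name__ == "__main__":
    for model, (ttft, tpot, out_tokens, reported) in RUNS.items():
        estimate = expected_e2el_ms(ttft, tpot, out_tokens)
        print(f"{model}: estimated {estimate:.1f} ms, reported {reported:.1f} ms, "
              f"diff {abs(estimate - reported):.3f} ms")
```

Both estimates land within a fraction of a millisecond of the reported averages. The comparison also makes it visible that the Qwen3-235B-A22B E2EL is dominated by TTFT (about 347 s of about 437 s) rather than by decode time.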

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@45c1ca1

gemini-code-assist

Code Review

This pull request adds a nightly CI test for the Qwen3-235B-A22B model. The new YAML configuration file looks reasonable. However, the changes to the run.sh script introduce a critical issue by hardcoding a pull request number to fetch code. This is a very brittle approach that will likely break the CI in the future and should be removed before merging.

tests/e2e/nightly/multi_node/scripts/run.sh

github-actions · 2025-12-30T03:24:24Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- A PR should do only one thing; smaller PRs enable faster reviews.
- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
- Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

github-actions · 2025-12-30T11:22:52Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

MengqingCao · 2026-01-04T09:31:29Z

tests/e2e/nightly/multi_node/config/Qwen3-235B-A22B-pd.yaml

@@ -0,0 +1,111 @@
+test_name: "test Qwen3-235B-A22B pd online"


@Angazenn please take a look at this PR, which adds a test for Qwen3-235B.

Angazenn · 2026-01-04T09:43:59Z

please refer to three-node-a3-pd-disaggregation for launching server scripts, as this is the typical pd-disaggregation setting for Qwen3-235B-A22B

MrZ20 · 2026-01-06T03:45:01Z

> please refer to three-node-a3-pd-disaggregation for launching server scripts, as this is the typical pd-disaggregation setting for Qwen3-235B-A22B

I have revised it based on the content of the document.

github-actions · 2026-01-06T08:52:08Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.


gemini-code-assist bot reviewed Dec 30, 2025


tests/e2e/nightly/multi_node/scripts/run.sh (outdated, resolved)

github-actions bot added ci/build module:tests labels Dec 30, 2025

github-actions bot added the merge-conflicts label Dec 30, 2025

MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from ae74860 to 3334472 December 31, 2025 02:19

github-actions bot removed the merge-conflicts label Dec 31, 2025

MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 3334472 to 4296db8 January 4, 2026 07:25

MrZ20 changed the title from "[CI]Add nightly ci test for Qwen3-235B-A22B" to "[CI]Add Nightly PD Online Test for Qwen3-235B-A22B" Jan 4, 2026

MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 0a44979 to ab1d117 January 4, 2026 09:19

MengqingCao reviewed Jan 4, 2026


MrZ20 force-pushed the qwen3-235b-a22b-nightly branch 3 times, most recently from 8067153 to d25a973 January 6, 2026 03:10

MrZ20 closed this Jan 6, 2026

MrZ20 reopened this Jan 6, 2026

MrZ20 changed the title from "[CI]Add Nightly PD Online Test for Qwen3-235B-A22B" to "[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B" Jan 6, 2026

MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from eaa07ce to 388aaec January 6, 2026 07:01

github-actions bot added the merge-conflicts label Jan 6, 2026

MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 388aaec to 3a0e746 January 6, 2026 09:37

github-actions bot removed the merge-conflicts label Jan 6, 2026

MrZ20 mentioned this pull request Jan 6, 2026 in [Misc]: Refactor E2E test #4897 (open, 8 tasks)

MrZ20 added 4 commits January 9, 2026 14:42

- 180a428 add nightly test
- 042f75b wait
- e7708f8 modify pr_num
- ea959e1 start test

MrZ20 added 2 commits January 9, 2026 14:42

- a6e10fb modify acc baseline
- 58982b1 revert (commit body also lists: modify config, add nightly test entrance, modify, end)

Every commit is Signed-off-by: MrZ20 <[email protected]>.

MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 3a0e746 to 58982b1 January 9, 2026 06:42

wangxiyuan merged commit 09b3f9d into vllm-project:main Jan 9, 2026
16 checks passed

4 participants

		@@ -0,0 +1,111 @@
		test_name: "test Qwen3-235B-A22B pd online"

[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B #5502

[CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B #5502

Conversation

MrZ20 commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Test Configuration

Does this PR introduce any user-facing change?

How was this patch tested?

Qwen3-235B-A22B(52m13s)

Qwen3-VL-235B-A22B-Instruct(43m2s)

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

MengqingCao Jan 4, 2026

Choose a reason for hiding this comment

Uh oh!

Angazenn commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MrZ20 commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jan 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MrZ20 commented Dec 30, 2025 •

edited

Loading

Angazenn commented Jan 4, 2026 •

edited

Loading

MrZ20 commented Jan 6, 2026 •

edited

Loading