@MrZ20 MrZ20 commented Dec 30, 2025

What this PR does / why we need it?

This PR adds online Disaggregated Prefill/Decode performance and accuracy tests for the Qwen3-235B-A22B and Qwen3-VL-235B-A22B-Instruct models to the Nightly test suite.

These test configurations simulate deploying massive MoE and Vision-Language models in a dual-node (32 NPU) environment, using Mooncake to transfer the KV cache from the Prefill node to the Decode node.

Test Configuration

Qwen3-235B-A22B

  • Model: Qwen/Qwen3-235B-A22B
  • Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
  • Architecture: Disaggregated Prefill & Decode
    • Node 0 (Producer/Prefill): DP2 + TP8 + EP + FLASHCOMM1 + FUSED_MC2.
    • Node 1 (Consumer/Decode): DP4 + TP4 + EP + FLASHCOMM1 + FUSED_MC2 + FULL_DECODE_ONLY.
  • Benchmarks:
    • Performance: vllm-ascend/GSM8K-in3500-bs2800.
    • Accuracy: vllm-ascend/gsm8k-lite.

Qwen3-VL-235B-A22B-Instruct

  • Model: Qwen/Qwen3-VL-235B-A22B-Instruct
  • Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
  • Architecture: Disaggregated Prefill & Decode
    • Node 0 (Producer/Prefill): DP2 + TP8 + EP.
    • Node 1 (Consumer/Decode): DP4 + TP4 + EP + FULL_DECODE_ONLY.
  • Benchmarks:
    • Performance: vllm-ascend/textvqa-perf-1080p.
    • Accuracy: vllm-ascend/textvqa-lite.
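As a quick sanity check on the layouts listed above, the sketch below confirms that each node's DP and TP sizes multiply out to the 16 NPUs available per node. It assumes that expert parallelism (EP) reuses the same ranks rather than adding devices; the names `LAYOUTS` and `npus_required` are hypothetical and do not come from the PR.

```python
# Illustrative check only: the DP/TP values are copied from the PR description.
# The dictionary and function names are hypothetical.
NPUS_PER_NODE = 16

LAYOUTS = {
    "prefill (node 0)": {"dp": 2, "tp": 8},
    "decode (node 1)": {"dp": 4, "tp": 4},
}


def npus_required(dp: int, tp: int) -> int:
    """Device count when each data-parallel replica spans `tp` devices.

    Assumption: EP shares these ranks and does not add devices.
    """
    return dp * tp


for role, layout in LAYOUTS.items():
    used = npus_required(layout["dp"], layout["tp"])
    print(f"{role}: DP{layout['dp']} x TP{layout['tp']} = {used} NPUs")
    assert used == NPUS_PER_NODE, f"{role} does not fill the node"
```

Both roles fill exactly one node, which is consistent with the stated total of 32 NPUs across two nodes.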

Does this PR introduce any user-facing change?

How was this patch tested?

Nightly test action on CI:
https://github.com/vllm-project/vllm-ascend/actions/runs/20734804044/job/59529925424?pr=5442

Results are as follows:

Qwen3-235B-A22B (52m13s)

  1. Accuracy test
dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
gsm8k      7cd45e     accuracy  gen                      100.00
  2. Perf test
╒══════════════════════════╤═════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤═════════════════╤══════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max            │ Median         │ P75            │ P90            │ P99             │  N   │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪═════════════════╪══════╡
│ E2EL                     │ total   │ 437430.8974 ms │ 85116.1062 ms  │ 719816.0369 ms │ 467343.1327 ms │ 523147.7618 ms │ 532042.4372 ms │ 536807.6622 ms  │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ TTFT                     │ total   │ 347313.8722 ms │ 626.5273 ms    │ 627989.6803 ms │ 376224.5384 ms │ 433057.0308 ms │ 440635.9248 ms │ 445003.5799 ms  │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ TPOT                     │ total   │ 60.1181 ms     │ 56.364 ms      │ 61.4858 ms     │ 60.2265 ms     │ 60.8376 ms     │ 61.2495 ms     │ 61.3442 ms      │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ ITL                      │ total   │ 60.095 ms      │ 0.0084 ms      │ 586.0958 ms    │ 60.0517 ms     │ 65.9529 ms     │ 74.8002 ms     │ 108.4795 ms     │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ InputTokens              │ total   │ 3654.3079      │ 3108.0         │ 4280.0         │ 3629.0         │ 3728.0         │ 3842.1         │ 4079.0          │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ OutputTokens             │ total   │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0          │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼──────┤
│ OutputTokenThroughput    │ total   │ 4.1372 token/s │ 2.0839 token/s │ 17.623 token/s │ 3.2096 token/s │ 3.2831 token/s │ 6.1399 token/s │ 16.5253 token/s │ 2800 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧═════════════════╧══════╛
╒══════════════════════════╤═════════╤═══════════════════╕
│ Common Metric            │ Stage   │ Value             │
╞══════════════════════════╪═════════╪═══════════════════╡
│ Benchmark Duration       │ total   │ 2005953.3298 ms   │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Requests           │ total   │ 2800              │
├──────────────────────────┼─────────┼───────────────────┤
│ Failed Requests          │ total   │ 0                 │
├──────────────────────────┼─────────┼───────────────────┤
│ Success Requests         │ total   │ 2800              │
├──────────────────────────┼─────────┼───────────────────┤
│ Concurrency              │ total   │ 610.5857          │
├──────────────────────────┼─────────┼───────────────────┤
│ Max Concurrency          │ total   │ 700               │
├──────────────────────────┼─────────┼───────────────────┤
│ Request Throughput       │ total   │ 1.3958 req/s      │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Input Tokens       │ total   │ 10232062          │
├──────────────────────────┼─────────┼───────────────────┤
│ Prefill Token Throughput │ total   │ 10.5216 token/s   │
├──────────────────────────┼─────────┼───────────────────┤
│ Total generated tokens   │ total   │ 4200000           │
├──────────────────────────┼─────────┼───────────────────┤
│ Input Token Throughput   │ total   │ 5100.8475 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Output Token Throughput  │ total   │ 2093.7676 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Token Throughput   │ total   │ 7194.615 token/s  │
╘══════════════════════════╧═════════╧═══════════════════╛
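The aggregate rates in the Common Metric table can be cross-checked against the raw totals. The sketch below recomputes them from the reported duration, request count, and token counts. The formulas are inferred from the numbers themselves and are an assumption about how the benchmark tool derives these values, not something confirmed by its documentation; `derive_common_metrics` is a hypothetical helper.

```python
# Values copied from the Qwen3-235B-A22B tables above.
BENCHMARK_DURATION_MS = 2005953.3298
TOTAL_REQUESTS = 2800
TOTAL_INPUT_TOKENS = 10232062
TOTAL_GENERATED_TOKENS = 4200000
MEAN_E2EL_MS = 437430.8974

# Values reported by the benchmark, used only for comparison.
REPORTED = {
    "request_throughput": 1.3958,
    "input_token_throughput": 5100.8475,
    "output_token_throughput": 2093.7676,
    "total_token_throughput": 7194.615,
    "concurrency": 610.5857,
}


def derive_common_metrics(duration_ms, requests, input_tokens,
                          output_tokens, mean_e2el_ms):
    """Recompute aggregate rates (assumed formulas, inferred from the data)."""
    duration_s = duration_ms / 1000.0
    return {
        "request_throughput": requests / duration_s,
        "input_token_throughput": input_tokens / duration_s,
        "output_token_throughput": output_tokens / duration_s,
        "total_token_throughput": (input_tokens + output_tokens) / duration_s,
        # Average in-flight requests: summed request latency over wall-clock time.
        "concurrency": mean_e2el_ms * requests / duration_ms,
    }


derived = derive_common_metrics(
    BENCHMARK_DURATION_MS,
    TOTAL_REQUESTS,
    TOTAL_INPUT_TOKENS,
    TOTAL_GENERATED_TOKENS,
    MEAN_E2EL_MS,
)

for name, reported in REPORTED.items():
    print(f"{name}: derived={derived[name]:.4f} reported={reported}")
```

Under these assumed formulas the derived values agree with the reported ones to within rounding, which suggests the table is internally consistent.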

Qwen3-VL-235B-A22B-Instruct (43m2s)

  1. Accuracy test
dataset      version  metric    mode      vllm-api-stream-chat
---------  ---------  --------  ------  ----------------------
textvqa       293754  accuracy  gen                      84.14
  2. Perf test
╒══════════════════════════╤═════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤═════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max            │ Median         │ P75            │ P90            │ P99            │  N  │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪═════╡
│ E2EL                     │ total   │ 205944.1405 ms │ 204678.8803 ms │ 207018.5243 ms │ 206032.8198 ms │ 206355.5276 ms │ 206625.7358 ms │ 206879.2058 ms │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TTFT                     │ total   │ 1312.4494 ms   │ 802.3429 ms    │ 2023.0655 ms   │ 1310.5566 ms   │ 1465.0071 ms   │ 1697.3565 ms   │ 1866.1158 ms   │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TPOT                     │ total   │ 136.5121 ms    │ 135.8727 ms    │ 136.9272 ms    │ 136.5804 ms    │ 136.7827 ms    │ 136.8897 ms    │ 136.9185 ms    │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ ITL                      │ total   │ 136.3248 ms    │ 0.0086 ms      │ 381.4863 ms    │ 136.5474 ms    │ 140.7574 ms    │ 148.6218 ms    │ 182.08 ms      │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ InputTokens              │ total   │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 0.0            │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokens             │ total   │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokenThroughput    │ total   │ 7.2836 token/s │ 7.2457 token/s │ 7.3286 token/s │ 7.2804 token/s │ 7.2952 token/s │ 7.3113 token/s │ 7.3254 token/s │ 512 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧═════╛
╒═════════════════════════╤═════════╤══════════════════╕
│ Common Metric           │ Stage   │ Value            │
╞═════════════════════════╪═════════╪══════════════════╡
│ Benchmark Duration      │ total   │ 1652977.6847 ms  │
├─────────────────────────┼─────────┼──────────────────┤
│ Total Requests          │ total   │ 512              │
├─────────────────────────┼─────────┼──────────────────┤
│ Failed Requests         │ total   │ 0                │
├─────────────────────────┼─────────┼──────────────────┤
│ Success Requests        │ total   │ 512              │
├─────────────────────────┼─────────┼──────────────────┤
│ Concurrency             │ total   │ 63.79            │
├─────────────────────────┼─────────┼──────────────────┤
│ Max Concurrency         │ total   │ 64               │
├─────────────────────────┼─────────┼──────────────────┤
│ Request Throughput      │ total   │ 0.3097 req/s     │
├─────────────────────────┼─────────┼──────────────────┤
│ Total Input Tokens      │ total   │ 0                │
├─────────────────────────┼─────────┼──────────────────┤
│ Total generated tokens  │ total   │ 768000           │
├─────────────────────────┼─────────┼──────────────────┤
│ Input Token Throughput  │ total   │ 0.0 token/s      │
├─────────────────────────┼─────────┼──────────────────┤
│ Output Token Throughput │ total   │ 464.6161 token/s │
├─────────────────────────┼─────────┼──────────────────┤
│ Total Token Throughput  │ total   │ 464.6161 token/s │
╘═════════════════════════╧═════════╧══════════════════╛
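The latency rows in both perf tables can be related to each other. The sketch below assumes the common definition TPOT = (E2EL - TTFT) / (output tokens - 1), which is an assumption about the benchmark tool rather than something stated in this PR. Because every request produced exactly 1500 output tokens, the relation should also hold for the reported averages. The helper `estimate_e2el_ms` is hypothetical.

```python
# Average latencies (ms) copied from the two perf tables above.
OUTPUT_TOKENS = 1500

REPORTED_MS = {
    "Qwen3-235B-A22B": {
        "ttft": 347313.8722, "tpot": 60.1181, "e2el": 437430.8974,
    },
    "Qwen3-VL-235B-A22B-Instruct": {
        "ttft": 1312.4494, "tpot": 136.5121, "e2el": 205944.1405,
    },
}


def estimate_e2el_ms(ttft_ms: float, tpot_ms: float, output_tokens: int) -> float:
    """End-to-end latency as first-token time plus the remaining decode steps.

    Assumed relation: E2EL = TTFT + TPOT * (output_tokens - 1).
    """
    return ttft_ms + tpot_ms * (output_tokens - 1)


for model, m in REPORTED_MS.items():
    estimate = estimate_e2el_ms(m["ttft"], m["tpot"], OUTPUT_TOKENS)
    ttft_share = m["ttft"] / m["e2el"]
    print(f"{model}: estimated={estimate:.1f} ms, "
          f"reported={m['e2el']:.1f} ms, TTFT share={ttft_share:.1%}")
```

Under this assumed relation the estimates land within a fraction of a millisecond of the reported averages. The TTFT share also differs sharply between the two runs: roughly 79% of end-to-end latency for Qwen3-235B-A22B, against under 1% for Qwen3-VL-235B-A22B-Instruct.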


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a nightly CI test for the Qwen3-235B-A22B model. The new YAML configuration file looks reasonable. However, the changes to the run.sh script introduce a critical issue by hardcoding a pull request number to fetch code. This is a very brittle approach that will likely break the CI in the future and should be removed before merging.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from ae74860 to 3334472 Compare December 31, 2025 02:19
@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 3334472 to 4296db8 Compare January 4, 2026 07:25
@MrZ20 MrZ20 changed the title [CI]Add nightly ci test for Qwen3-235B-A22B [CI]Add Nightly PD Online Test for Qwen3-235B-A22B Jan 4, 2026
@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 0a44979 to ab1d117 Compare January 4, 2026 09:19
@@ -0,0 +1,111 @@
test_name: "test Qwen3-235B-A22B pd online"
Collaborator

@Angazenn please take a look at this PR, which adds a test for Qwen3-235B.

Angazenn commented Jan 4, 2026

please refer to three-node-a3-pd-disaggregation for launching server scripts, as this is the typical pd-disaggregation setting for Qwen3-235B-A22B

@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch 3 times, most recently from 8067153 to d25a973 Compare January 6, 2026 03:10
MrZ20 commented Jan 6, 2026

> please refer to three-node-a3-pd-disaggregation for launching server scripts, as this is the typical pd-disaggregation setting for Qwen3-235B-A22B

I have revised it based on the content of the document.

@MrZ20 MrZ20 closed this Jan 6, 2026
@MrZ20 MrZ20 reopened this Jan 6, 2026
@MrZ20 MrZ20 changed the title [CI]Add Nightly PD Online Test for Qwen3-235B-A22B [CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B Jan 6, 2026
@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from eaa07ce to 388aaec Compare January 6, 2026 07:01
github-actions bot commented Jan 6, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 388aaec to 3a0e746 Compare January 6, 2026 09:37
@MrZ20 MrZ20 mentioned this pull request Jan 6, 2026
8 tasks
MrZ20 added 4 commits January 9, 2026 14:42
Signed-off-by: MrZ20 <[email protected]>
Signed-off-by: MrZ20 <[email protected]>
Signed-off-by: MrZ20 <[email protected]>
Signed-off-by: MrZ20 <[email protected]>
MrZ20 added 2 commits January 9, 2026 14:42
Signed-off-by: MrZ20 <[email protected]>
Signed-off-by: MrZ20 <[email protected]>

modify config

Signed-off-by: MrZ20 <[email protected]>

add nightly test entrance

Signed-off-by: MrZ20 <[email protected]>

modify

Signed-off-by: MrZ20 <[email protected]>

end

Signed-off-by: MrZ20 <[email protected]>
@MrZ20 MrZ20 force-pushed the qwen3-235b-a22b-nightly branch from 3a0e746 to 58982b1 Compare January 9, 2026 06:42
@wangxiyuan wangxiyuan merged commit 09b3f9d into vllm-project:main Jan 9, 2026
16 checks passed