[https://nvbugs/6412108][fix] Restore original order — `all_reduce` the routed partial first, then add the… by trtllm-agent · Pull Request #15922 · NVIDIA/TensorRT-LLM

trtllm-agent · 2026-07-03T11:37:18Z

Summary

Root cause: A Sharding-IR refactor moved the expert_output + shared_expert_output addition ahead of the all_reduce. The shared_expert output is replicated (unsharded), so summing it across ranks scaled it by world_size (8×) and corrupted MMLU/GSM8K outputs.
Fix: Restore the original order (all_reduce the routed partial first, then add the replicated shared output) and update the comment. The nvbugs/6412108 waiver line is also removed.
Automated fix generated by repair-bot
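
The effect of the two orderings can be illustrated with a minimal pure-Python sketch. It models all_reduce as an element-wise sum over simulated ranks; the helper names (`all_reduce_sum`, `add_then_reduce`, `reduce_then_add`) and the tensor values are hypothetical and are not taken from modeling_qwen3_5_moe.py or any TensorRT-LLM API.

```python
# Simulated tensor-parallel ranks. all_reduce is modeled as an element-wise
# sum across ranks. All names and values are illustrative assumptions.

WORLD_SIZE = 8


def all_reduce_sum(per_rank):
    """Return the element-wise sum over ranks, replicated to every rank."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def add_then_reduce(routed, shared):
    """Buggy order: the replicated shared output is summed across ranks too."""
    summed = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(routed, shared)]
    return all_reduce_sum(summed)


def reduce_then_add(routed, shared):
    """Restored order: reduce only the sharded partial, then add shared once."""
    reduced = all_reduce_sum(routed)
    return [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(reduced, shared)]


# Each rank holds a partial contribution to the routed expert output (sharded).
routed_partial = [[1.0, 2.0] for _ in range(WORLD_SIZE)]
# The shared expert is not sharded: every rank already holds the full value.
shared_full = [[0.5, 0.25] for _ in range(WORLD_SIZE)]

buggy = add_then_reduce(routed_partial, shared_full)
fixed = reduce_then_add(routed_partial, shared_full)

# The buggy result overshoots by (WORLD_SIZE - 1) copies of the shared output.
excess = [b - f for b, f in zip(buggy[0], fixed[0])]
print("buggy:", buggy[0])
print("fixed:", fixed[0])
print("excess:", excess)
```

In this toy setup the add-then-reduce result contains the shared output world_size times, while reduce-then-add contains it exactly once, which matches the root cause described above.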

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6412108

Summary by CodeRabbit

Bug Fixes
- Improved the handling of mixture-of-experts outputs to make the final result more stable and consistent across distributed runs.
Tests
- Removed a test waiver, so one previously skipped accuracy check will now run normally.

…replicated shared expert

The shared expert (Qwen3_5MoeMLP) intentionally omits the layer_type hint on its torch_linear_simple ops and the qwen3.5_moe_400b.yaml shard_layers whitelist excludes it, so its output is already the full value on every rank. The previous single-merge-point ordering (add then all_reduce) scaled the replicated shared output by world_size and dropped MMLU from ~85% to ~0.07%.

Restore the original order: all_reduce the routed partial first, then add the replicated shared output.

Signed-off-by: trtllm-agent <296075020+trtllm-agent@users.noreply.github.com>

coderabbitai · 2026-07-03T11:40:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4c98e406-4168-4fda-82ff-43cf04fc61f1

📥 Commits

Reviewing files that changed from the base of the PR and between 3b23a9f and da126a4.

📒 Files selected for processing (2)

tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py
tests/integration/test_lists/waives.txt

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

📝 Walkthrough

The reduction order in Qwen3_5MoeSparseMoeBlock.forward is changed so the routed expert output is all-reduced before adding the shared expert output, rather than summing first then reducing. A test waiver entry for a Qwen3.5 MoE NVFP4 accuracy test is removed.

Changes

MoE reduction fix and waiver update

Layer / File(s)	Summary
Reorder MoE output all-reduce and shared expert addition `tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py`, `tests/integration/test_lists/waives.txt`	All-reduce is now applied to the routed expert output first, then the shared expert output is added; the previously waived NVFP4 accuracy test for TestQwen3_5_397B_MoE is unwaived.

Estimated code review effort: 2 (Simple) | ~10 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#15914: Both PRs modify tests/integration/test_lists/waives.txt around Qwen3_5_397B MoE NVFP4 skip/waiver entries.

Suggested reviewers: xinhe-nv, jiaganc, Superjomn

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the bug fix and matches the main code change.
Description check	✅ Passed	The description explains the root cause, fix, tests, and bug link, so it is mostly complete.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


trtllm-agent requested a review from a team as a code owner July 3, 2026 11:37

trtllm-agent requested a review from marinayanov July 3, 2026 11:37

trtllm-agent assigned suyoggupta Jul 3, 2026

github-actions Bot assigned trtllm-agent Jul 3, 2026

trtllm-agent wants to merge 1 commit into NVIDIA:main from tensorrt-cicd:repair-bot-bug6412108