[https://nvbugs/6412108][fix] Restore original order — all_reduce the routed partial first, then add the…#15922
Open
trtllm-agent wants to merge 1 commit into
…replicated shared expert

The shared expert (Qwen3_5MoeMLP) intentionally omits the layer_type hint on its torch_linear_simple ops, and the qwen3.5_moe_400b.yaml shard_layers whitelist excludes it, so its output is already the full value on every rank. The previous single-merge-point ordering (add, then all_reduce) scaled the replicated shared output by world_size and dropped MMLU from ~85% to ~0.07%.

Restore the original order: all_reduce the routed partial first, then add the replicated shared output.

Signed-off-by: trtllm-agent <296075020+trtllm-agent@users.noreply.github.com>
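The scaling effect described in the commit message can be reproduced without any distributed runtime. The following is a minimal sketch, not the code from this PR: it simulates ranks with plain Python lists, assumes a sum-reduction `all_reduce`, and uses hypothetical names (`all_reduce_sum`, `routed_partial`, `shared_full`) and made-up values chosen only for illustration.

```python
# Simulate tensor-parallel ranks with plain Python lists (no torch.distributed).
WORLD_SIZE = 8  # assumption: matches the 8x scaling mentioned in the PR summary


def all_reduce_sum(per_rank):
    """Sum-reduce: every rank ends up with the elementwise sum across ranks."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


# Hypothetical data: each rank holds a *partial* routed output ...
routed_partial = [[0.125 * (r + 1), 0.25] for r in range(WORLD_SIZE)]
# ... and the *full* (replicated) shared expert output.
shared_full = [1.0, -2.0]

# Buggy order: add on every rank, then all_reduce.
# The replicated shared output is summed WORLD_SIZE times.
summed_first = [
    [p + s for p, s in zip(routed_partial[r], shared_full)]
    for r in range(WORLD_SIZE)
]
buggy = all_reduce_sum(summed_first)[0]

# Restored order: all_reduce the routed partial, then add the shared output once.
routed_full = all_reduce_sum(routed_partial)[0]
fixed = [p + s for p, s in zip(routed_full, shared_full)]

print(buggy)  # [12.5, -14.0]  (shared term counted 8 times)
print(fixed)  # [5.5, 0.0]     (shared term counted once)
```

The two results differ by exactly `(WORLD_SIZE - 1) * shared_full`, which is the extra contribution of the replicated tensor under the add-then-reduce order.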
No actionable comments were generated in the recent review.
Walkthrough

The reduction order in Qwen3_5MoeSparseMoeBlock.forward is changed so the routed expert output is all-reduced before adding the shared expert output, rather than summing first and then reducing. A test waiver entry for a Qwen3.5 MoE NVFP4 accuracy test is removed.

Changes: MoE reduction fix and waiver update
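Why the order of the two operations matters can be written out directly. This is a sketch under stated assumptions: `all_reduce` is taken to be a sum reduction, and the symbols are introduced here for illustration only ($W$ for world_size, $y_r$ for the routed partial on rank $r$, $s$ for the replicated shared expert output).

```latex
\begin{aligned}
\text{add, then all\_reduce:}\quad & \sum_{r=1}^{W} (y_r + s) = \sum_{r=1}^{W} y_r + W\,s \\
\text{all\_reduce, then add:}\quad & \Bigl(\sum_{r=1}^{W} y_r\Bigr) + s
\end{aligned}
```

The first form counts $s$ once per rank, so with $W = 8$ the shared term is 8 times too large. The second form adds it exactly once.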
Summary
- The previous ordering computed `expert_output + shared_expert_output` and then `all_reduce`, scaling the replicated (unsharded) shared_expert output by world_size (8×) and corrupting MMLU/GSM8K outputs.
- Restore the original order: `all_reduce` the routed partial first, then add the replicated shared output, and update the comment; also removed the nvbugs/6412108 waiver line.

Test plan
Links
Summary by CodeRabbit
Bug Fixes
Tests