[https://nvbugs/6412133][fix] Only populate `self.all_weights[self.device_id]` at construction; lazily… by trtllm-agent · Pull Request #15921 · NVIDIA/TensorRT-LLM

trtllm-agent · 2026-07-03T11:23:16Z

Summary

Root cause: _quantize_and_replicate_weights eagerly replicates NVFP4 quantized weights to every visible CUDA device (0..3), so cuda:2/3 retain stale replicas across parametrized test runs on the 4-GPU CI runner, occasionally SIGABRT-ing subsequent test setup.
Fix: Only populate self.all_weights[self.device_id] at construction; lazily materialize per-device replicas on the first get_weight_ipc_handles_serialized(device_ids=[...]) call and cache them so partial-update paths aren't slower. Also removed the waives.txt entry.
Automated fix generated by repair-bot

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6412133

Summary by CodeRabbit

Bug Fixes
- Improved multi-GPU weight handling so quantized weights are copied only when needed, helping avoid missing-device issues during model updates.
- Fixed on-demand GPU weight availability when serializing inter-process handles, which should make multi-GPU test runs more reliable.
Chores
- Removed an outdated waived test entry.

…Handles RefNVFP4ModelWithIPCHandles._quantize_and_replicate_weights eagerly replicated all NVFP4 quantized tensors to every visible CUDA device (range(torch.cuda.device_count())). On a 4-GPU CI runner, cuda:2 and cuda:3 held a full copy of the 30B-A3B weights even though the TP=2 test only exposes cuda:0/1 via get_weight_ipc_handles_serialized([0, 1]). Those unused replicas persisted across parametrize IDs and occasionally triggered SIGABRT ("Test terminated unexpectedly") during subsequent test setup. Only populate self.device_id at construction; materialize additional replicas lazily on the first request in get_weight_ipc_handles_serialized and cache them so per-filter partial updates aren't slower. Also remove the corresponding waives.txt entry. Signed-off-by: trtllm-agent <296075020+trtllm-agent@users.noreply.github.com>

coderabbitai · 2026-07-03T11:27:43Z

📝 Walkthrough

Walkthrough

Modifies RefNVFP4ModelWithIPCHandles in the multi-GPU NVFP4 test module to store quantized weights only on the owning device instead of eagerly replicating to all GPUs, and adds on-demand cross-GPU copying when IPC handles are requested. Removes a previously skipped test waiver entry.

Changes

NVFP4 weight replication fix

Layer / File(s)	Summary
Defer and lazily replicate quantized weights `tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py`	Quantized weights are stored only on `self.device_id` during quantization; `get_weight_ipc_handles_serialized` now copies tensors to the requested device on demand when not already present before serializing IPC handles.
Remove obsolete test waiver `tests/integration/test_lists/waives.txt`	Removes the SKIP waiver entry for `test_llm_update_weights_nvfp4[fp8-Qwen3/Qwen3-30B-A3B]` previously tied to a tracked bug.

Estimated code review effort: 2 (Simple) | ~10 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Test as TestCode
  participant RefModel as RefNVFP4ModelWithIPCHandles
  participant GPUOwner as device_id GPU
  participant GPUTarget as Requested GPU

  Test->>RefModel: _quantize_and_replicate_weights()
  RefModel->>GPUOwner: store model_weights on self.device_id only

  Test->>RefModel: get_weight_ipc_handles_serialized(device)
  alt device not in self.all_weights
    RefModel->>GPUOwner: read tensors
    RefModel->>GPUTarget: copy tensors to cuda:{device}
  end
  RefModel-->>Test: serialized IPC handles

Possibly related PRs

NVIDIA/TensorRT-LLM#15898: Both PRs modify tests/integration/test_lists/waives.txt by changing waived/skipped integration test entries.
NVIDIA/TensorRT-LLM#15914: Directly overlaps by addressing the same NVFP4 multi-GPU test waiver tied to NVBUG 6412133.

Suggested reviewers: yuxianq, dongxuy04, pcastonguay

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title matches the bug fix and lazy weight population change and includes the NVBugs reference.
Description check	✅ Passed	The description explains the root cause, fix, test plan, and bug link, though it uses different headings than the template.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py (1)
423-435: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Pass explicit device_ids here — tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py:720-723 still falls back to torch.cuda.device_count(), so this path can materialize weights on every visible GPU and keep the full-replication behavior alive.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py`
around lines 423 - 435, The fallback in get_weight_ipc_handles_serialized still
uses torch.cuda.device_count(), which can replicate weights across every visible
GPU. Update this path to require and pass through the explicit device_ids from
the caller, and use that list instead of deriving all CUDA devices when building
all_weights so the serialization stays limited to the intended devices.

🧹 Nitpick comments (2)

tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py (2)
285-330: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Stale docstring: no longer "replicates across devices".

The docstring still says the method quantizes weights "and replicate[s] across devices," but per Line 330 only self.device_id is populated now; replication is deferred to get_weight_ipc_handles_serialized. Worth updating to avoid confusing future readers about where replication actually happens.
📝 Suggested docstring tweak
     def _quantize_and_replicate_weights(self):
-        """Quantize linear weights to NVFP4 and replicate across devices.
+        """Quantize linear weights to NVFP4 for the owning device.
+
+        Cross-device replicas are created lazily in
+        ``get_weight_ipc_handles_serialized`` when requested.
 
         Projections in the same fusion group (qkv, gate_up) share a unified
         weight_scale_2 computed from their joint amax, matching the behavior
         of verl's TRTLLMNVFP4QuantizerHelper.
         """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py`
around lines 285 - 330, Update the _quantize_and_replicate_weights docstring to
match the current behavior: it no longer replicates weights across devices in
this method, since only self.all_weights[self.device_id] is populated and
replication is deferred to get_weight_ipc_handles_serialized. Adjust the wording
around the method purpose and the “replicate across devices” claim so readers
are not misled.
219-230: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Consider adding a regression assertion for the lazy-replication behavior.

As per path instructions ("Act as a QA engineer reviewing test changes and coverage… suggest concrete list file names and whether coverage is sufficient"): this change fixes an intermittent SIGABRT tied to stale replicas persisting on unused devices, but there's no visible assertion that self.all_weights stays scoped to only the owning device until requested (e.g. assert list(ref_model.all_weights) == [device_id] right after __init__, and that a new key appears only after get_weight_ipc_handles_serialized([other_device])). Without it, coverage for this specific fix is insufficient — a future refactor could silently reintroduce eager replication without any test failing.

Also applies to: 423-447
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py`
around lines 219 - 230, Add a regression assertion in the multi-GPU test to
verify lazy replication stays scoped to the owning device after initialization.
In the __init__-related setup for the ref model, assert that self.all_weights
contains only the current device key immediately after construction, then call
get_weight_ipc_handles_serialized(...) for another device and assert the new key
appears only at that point. Use the existing ref_model/all_weights and
get_weight_ipc_handles_serialized symbols in the test so this behavior is
covered and a future eager-replication refactor will fail.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py`:
- Around line 423-435: The fallback in get_weight_ipc_handles_serialized still
uses torch.cuda.device_count(), which can replicate weights across every visible
GPU. Update this path to require and pass through the explicit device_ids from
the caller, and use that list instead of deriving all CUDA devices when building
all_weights so the serialization stays limited to the intended devices.

---

Nitpick comments:
In
`@tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py`:
- Around line 285-330: Update the _quantize_and_replicate_weights docstring to
match the current behavior: it no longer replicates weights across devices in
this method, since only self.all_weights[self.device_id] is populated and
replication is deferred to get_weight_ipc_handles_serialized. Adjust the wording
around the method purpose and the “replicate across devices” claim so readers
are not misled.
- Around line 219-230: Add a regression assertion in the multi-GPU test to
verify lazy replication stays scoped to the owning device after initialization.
In the __init__-related setup for the ref model, assert that self.all_weights
contains only the current device key immediately after construction, then call
get_weight_ipc_handles_serialized(...) for another device and assert the new key
appears only at that point. Use the existing ref_model/all_weights and
get_weight_ipc_handles_serialized symbols in the test so this behavior is
covered and a future eager-replication refactor will fail.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d746a5c-4a95-4a89-bfbe-b892ba23831d

📥 Commits

Reviewing files that changed from the base of the PR and between 3b23a9f and 91f177f.

📒 Files selected for processing (2)

tests/integration/test_lists/waives.txt
tests/unittest/_torch/ray_orchestrator/multi_gpu/test_llm_update_weights_multi_gpu.py

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

trtllm-agent assigned shuyixiong Jul 3, 2026

github-actions Bot assigned trtllm-agent Jul 3, 2026

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[https://nvbugs/6412133][fix] Only populate `self.all_weights[self.device_id]` at construction; lazily…#15921

[https://nvbugs/6412133][fix] Only populate `self.all_weights[self.device_id]` at construction; lazily…#15921
trtllm-agent wants to merge 1 commit into
NVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6412133

trtllm-agent commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

trtllm-agent commented Jul 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Links

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 3, 2026

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

trtllm-agent commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading