[None][perf] Fuse Qwen3.5 GDN input projections by mingyangHao · Pull Request #15884 · NVIDIA/TensorRT-LLM

mingyangHao · 2026-07-02T10:10:33Z

Summary by CodeRabbit

New Features
- Added support for combined QKVZBA attention projections in Qwen3.5/Qwen3 Next paths when attention-DP is enabled.
- Improved handling of non-contiguous tensor layouts across several optimized Mamba operations.
Bug Fixes
- Fixed projection weight ordering so attention components are combined in the expected consumer order.
- Made layer normalization accept repeated weight and bias layouts for grouped inputs.
- Updated optimized kernels to work correctly with strided input views and padded storage.

Tokens	Dual-stream baseline	Combined projection	Latency reduction	Speedup
1	49.152 us	45.024 us	8.40%	1.09x
8	49.152 us	45.056 us	8.33%	1.09x
32	49.152 us	40.960 us	16.67%	1.20x
64	51.200 us	43.008 us	16.00%	1.19x
256	63.520 us	49.184 us	22.57%	1.29x
1024	174.112 us	155.616 us	10.62%	1.12x
8192	1173.504 us	1058.784 us	9.78%	1.11x

Perf for TP4:

Tokens/rank	Dual-stream baseline	Combined projection	Latency reduction	Speedup
1	22.560 us	16.416 us	27.2%	1.37x
8	24.608 us	18.464 us	25.0%	1.33x
32	24.576 us	18.176 us	26.0%	1.35x
64	24.576 us	18.464 us	24.9%	1.33x
256	28.768 us	20.480 us	28.8%	1.40x
1024	53.248 us	43.008 us	19.2%	1.24x
8192	288.864 us	256.000 us	11.4%	1.13x
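
The Latency reduction and Speedup columns in both tables are consistent with the usual definitions: reduction is the saved time as a fraction of the baseline, and speedup is baseline divided by the combined latency. A minimal sketch that recomputes one TP4 row (the helper name `latency_metrics` is illustrative and not part of the PR):

```python
def latency_metrics(baseline_us: float, combined_us: float) -> tuple[float, float]:
    """Return (latency reduction in percent, speedup factor)."""
    reduction_pct = (baseline_us - combined_us) / baseline_us * 100.0
    speedup = baseline_us / combined_us
    return reduction_pct, speedup


# TP4 row for 256 tokens/rank, values copied from the table above.
reduction, speedup = latency_metrics(28.768, 20.480)
print(f"{reduction:.1f}% {speedup:.2f}x")  # prints "28.8% 1.40x"
```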

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>

mingyangHao · 2026-07-02T10:11:00Z

/bot run --disable-fail-fast

coderabbitai · 2026-07-02T10:19:08Z

📝 Walkthrough

Walkthrough

This PR adds a combined in_proj_qkvzba projection path for Qwen3-Next/Qwen3.5 GDN mixer layers under attention-DP, implemented via a new weight-mapper combination step, updated exclude-module naming, and a branching forward path in the GatedDeltaNet module. Additionally, several Triton kernels are updated to use explicit tensor strides instead of contiguous-layout assumptions, with corresponding new unit tests.

Changes

Combined GDN Projection and Strided Kernel Support

Layer / File(s)	Summary
Weight mapper: combine GDN QKVZ/BA into single tensor `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py`, `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`	Adds `_combine_gdn_input_projections` to validate, reshape, split, and concatenate checkpoint QKVZ/BA tensors into `in_proj_qkvzba`, wired into `preprocess_weights` under attention-DP; updates related docstring.
Quantization exclude-modules naming `tensorrt_llm/_torch/models/modeling_qwen3_5.py`	Rewrites exclude-module regex mapping to `in_proj_qkvzba*` when attention-DP is enabled, otherwise retains split naming.
GatedDeltaNet: combined projection module and forward path `tensorrt_llm/_torch/modules/mamba/gdn_mixer.py`	Adds `use_combined_qkvzba_projection` flag, constructs single combined linear layer, branches `forward()` to slice combined projection output directly, and adjusts RMSNorm reshape logic.
Triton kernels: explicit stride-based indexing `tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py`, `tensorrt_llm/_torch/modules/mamba/gdn_mixer.py`, `tensorrt_llm/_torch/modules/mamba/layernorm_gated.py`	Updates kernels and wrappers to compute offsets from explicit tensor strides rather than assumed contiguity; adds `REPEAT_WEIGHT` flag for repeated weight/bias layouts.
Tests: weight mapper combination and strided kernel behavior `tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py`, `tests/unittest/_torch/modules/mamba/test_gdn_kernel_optimizations.py`	Adds tests for combined-projection ordering/validation and strided-tensor kernel correctness.
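
To illustrate why the kernels switch from assumed contiguity to explicit strides, here is a small pure-Python sketch, not the Triton code from the PR. It assumes a hypothetical padded buffer in which each logical row of 4 values occupies 6 storage slots; `element_offset` and all sizes are made up for the example.

```python
def element_offset(index, strides):
    """Flat storage offset of a multi-dimensional index, given per-dimension strides."""
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical padded storage: 3 logical rows of 4 values, each row padded to 6 slots.
rows, cols, row_pitch = 3, 4, 6
storage = [-1] * (rows * row_pitch)
for r in range(rows):
    for c in range(cols):
        storage[r * row_pitch + c] = 10 * r + c

index = (2, 1)  # logical element whose value is 10 * 2 + 1 = 21
strided = storage[element_offset(index, (row_pitch, 1))]  # uses the real strides
assumed = storage[element_offset(index, (cols, 1))]       # assumes a contiguous layout
print(strided, assumed)  # prints "21 13"
```

With the contiguity assumption the read lands on an element of the previous row, which is the kind of silent error a strided view or padded storage would otherwise cause.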

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Checkpoint as HF Checkpoint
  participant Mapper as Qwen3NextHfWeightMapper
  participant Config as ModelConfig
  participant GDN as Qwen3NextGatedDeltaNet

  Checkpoint->>Mapper: preprocess_weights(weights)
  Mapper->>Config: check enable_attention_dp
  alt attention-DP enabled
    Mapper->>Mapper: _combine_gdn_input_projections
    Mapper-->>Checkpoint: weights with in_proj_qkvzba
  else attention-DP disabled
    Mapper-->>Checkpoint: weights with separate in_proj_qkvz/in_proj_ba
  end
  Checkpoint->>GDN: load state_dict
  GDN->>GDN: init use_combined_qkvzba_projection
  GDN->>GDN: forward(hidden_states)
  alt combined projection
    GDN->>GDN: slice projected_states_qkvzba into mixed_qkv, z, b, a
  else split projection
    GDN->>GDN: compute qkvz + ba, fuse/split into mixed_qkv, z, b, a
  end
  GDN-->>Checkpoint: attention output
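
The combined branch in the diagram slices one projection output into `mixed_qkv`, `z`, `b`, and `a`. The sketch below shows that slicing on a plain Python list, assuming the output columns follow the q, k, v, z, b, a consumer order that the weight mapper produces. The function name `split_qkvzba` and the sizes are hypothetical; the real module works on tensors and may differ in detail.

```python
def split_qkvzba(projected, key_dim, value_dim, num_v_heads):
    """Slice one combined projection row into (mixed_qkv, z, b, a).

    Assumed column order: q (key_dim), k (key_dim), v (value_dim),
    z (value_dim), b (num_v_heads), a (num_v_heads).
    """
    qkv_end = 2 * key_dim + value_dim
    z_end = qkv_end + value_dim
    b_end = z_end + num_v_heads
    expected = b_end + num_v_heads
    if len(projected) != expected:
        raise ValueError(f"expected {expected} columns, got {len(projected)}")
    return (projected[:qkv_end], projected[qkv_end:z_end],
            projected[z_end:b_end], projected[b_end:])


# Hypothetical sizes: key_dim=4, value_dim=8, num_v_heads=2 -> 28 columns in total.
mixed_qkv, z, b, a = split_qkvzba(list(range(28)), key_dim=4, value_dim=8, num_v_heads=2)
print(len(mixed_qkv), len(z), b, a)  # prints "16 8 [24, 25] [26, 27]"
```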

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description leaves the required Description and Test Coverage sections blank, so it does not meet the template.	Add a short problem/solution summary and list the tests covering the new combined-projection and strided-kernel paths.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title is concise, specific, and matches the main change: fusing Qwen3.5 GDN input projections for attention DP.


coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py (1)
46-55: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Cast dst_offsets to tl.int64 too. The store index has the same overflow risk as src_offsets; with large num_prefill_tokens * conv_dim, conv_offsets[:, None] * num_prefill_tokens can wrap and write to the wrong address.
Proposed fix
-    dst_offsets = conv_offsets[:, None] * num_prefill_tokens + seq_offsets[None, :]
+    dst_offsets = conv_offsets[:, None].to(tl.int64) * num_prefill_tokens + seq_offsets[None, :]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py` around lines 46 -
55, The dst_offsets calculation in fuse_elementwise_ops has the same overflow
risk as src_offsets because conv_offsets[:, None] * num_prefill_tokens can
exceed INT32_MAX and corrupt the store address. Update the dst_offsets
expression in the same kernel path to cast the operands to tl.int64 before the
multiply/add, mirroring the existing src_offsets fix, and keep the tl.store call
using the widened offsets.
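
The overflow risk described in this comment can be reproduced with plain integer arithmetic. The sketch below emulates 32-bit two's-complement wraparound in Python. The sizes are hypothetical and chosen only so that the product exceeds INT32_MAX; it does not run the Triton kernel.

```python
def wrap_int32(value: int) -> int:
    """Interpret an integer as a two's-complement 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# Hypothetical sizes: 8191 * 300_000 is larger than INT32_MAX (2**31 - 1).
conv_offset = 8191
num_prefill_tokens = 300_000
seq_offset = 5

# What a 64-bit offset computation yields.
exact = conv_offset * num_prefill_tokens + seq_offset
# What the same expression yields if every intermediate is kept in int32.
wrapped = wrap_int32(wrap_int32(conv_offset * num_prefill_tokens) + seq_offset)
print(exact, wrapped)
```

The wrapped value is negative, so a store that used it as an offset would target an address outside the intended destination, which is why the comment suggests widening to `tl.int64` before the multiply.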

🧹 Nitpick comments (1)

tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py (1)
35-73: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Test coverage is good for the happy path but misses key error branches.

_combine_gdn_input_projections (per the upstream snippet) also raises ValueError when only one of qkvz/ba is present ("Expected both QKVZ and BA tensors...") and when trailing shapes mismatch between qkvz/ba row tensors. Neither path is covered here.

Suggest adding:

test_combine_gdn_input_projections_missing_projection_raises — only in_proj_qkvz.* present, expect ValueError matching "Expected both QKVZ and BA".

test_combine_gdn_input_projections_rejects_trailing_shape_mismatch — row-tensors with mismatched shape[1:], expect ValueError matching "trailing shapes do not match".

Coverage for the two implemented cases (consumer-order combination, scalar-metadata rejection) is sufficient and correctly verified against the actual reshape/slice math.

As per path instructions, "Act as a QA engineer reviewing test changes and coverage for TensorRT-LLM" and "suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py`
around lines 35 - 73, Add the missing error-path coverage for
_combine_gdn_input_projections in test_qwen3_next_weight_mapper.py: create a
test that passes only in_proj_qkvz.* entries and asserts a ValueError matching
the “Expected both QKVZ and BA” message, and another that uses row-tensor inputs
with mismatched trailing shapes and asserts the “trailing shapes do not match”
error. Keep the existing consumer-order and scalar-metadata tests as-is, since
they already verify the happy path and non-row metadata rejection.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py`:
- Around line 47-51: The combine logic in qwen3_next_weight_mapper should not
require paired qkvz/ba tensors until FP8 BA scale metadata has been handled.
Update the grouping/combine flow in the superclass method used by
Qwen3_5MoeHfWeightMapper so orphan in_proj_ba.weight_scale_inv entries are
dequantized or dropped alongside the BA projection, and only then enforce the
“Expected both QKVZ and BA” check for the relevant symbols in the combine path
and _dequantize_linear_attn_fp8_qkvz.
- Around line 55-90: The combined projection packing in qwen3_next_weight_mapper
should fail fast if an existing in_proj_qkvzba.<suffix> key is already present,
instead of silently overwriting it. Add the same duplicate-key check used by the
split packing path before assigning into combined_weights in the combine logic,
and raise an error when both the combined key and the split in_proj_qkvz /
in_proj_ba inputs would map to the same output.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 589c1b79-8970-4b4a-8d07-a2c4afc35ea8

📥 Commits

Reviewing files that changed from the base of the PR and between 0d433e0 and 3510d42.

📒 Files selected for processing (8)

tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py
tensorrt_llm/_torch/models/modeling_qwen3_5.py
tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py
tensorrt_llm/_torch/modules/mamba/gdn_mixer.py
tensorrt_llm/_torch/modules/mamba/layernorm_gated.py
tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py
tests/unittest/_torch/modules/mamba/test_gdn_kernel_optimizations.py

coderabbitai · 2026-07-02T10:19:11Z

+        for (prefix, suffix), tensors in grouped.items():
+            if tensors.keys() != {"qkvz", "ba"}:
+                raise ValueError(
+                    f"Expected both QKVZ and BA tensors for {prefix}.{suffix}, "
+                    f"got {sorted(tensors)}")


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Handle Qwen3.5 FP8 BA scales before requiring paired suffixes.

With attention-DP enabled, Qwen3_5MoeHfWeightMapper calls this superclass combine after _dequantize_linear_attn_fp8_qkvz() removes only in_proj_qkvz.weight_scale_inv; in_proj_ba.weight_scale_inv can remain alone and hit this Expected both QKVZ and BA error. Dequantize/drop BA scale metadata for the combined non-quantized projection, or skip orphan metadata only after ensuring the corresponding BA weight has been dequantized too.

Also applies to: 195-196
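
The failure mode can be sketched without the real mapper. The snippet below groups weight names by `(prefix, suffix)` and applies the same pairing check as the quoted diff. The regular expression, the helper names, and the example key names are assumptions for illustration and are not taken from the PR.

```python
import re

# Hypothetical key pattern: "<prefix>.in_proj_<qkvz|ba>.<suffix>".
PATTERN = re.compile(r"^(?P<prefix>.+)\.in_proj_(?P<kind>qkvz|ba)\.(?P<suffix>.+)$")


def group_gdn_keys(names):
    """Map (prefix, suffix) to the set of projection kinds present for it."""
    grouped = {}
    for name in names:
        match = PATTERN.match(name)
        if match:
            key = (match["prefix"], match["suffix"])
            grouped.setdefault(key, set()).add(match["kind"])
    return grouped


def check_paired(grouped):
    """Raise if any suffix has only one of the two projections."""
    for (prefix, suffix), kinds in grouped.items():
        if kinds != {"qkvz", "ba"}:
            raise ValueError(
                f"Expected both QKVZ and BA tensors for {prefix}.{suffix}, "
                f"got {sorted(kinds)}")


# The QKVZ scale is assumed to have been consumed by an earlier dequantize step,
# which leaves the BA scale as an orphan.
names = [
    "layers.0.linear_attn.in_proj_qkvz.weight",
    "layers.0.linear_attn.in_proj_ba.weight",
    "layers.0.linear_attn.in_proj_ba.weight_scale_inv",
]
try:
    check_paired(group_gdn_keys(names))
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Under these assumptions the `weight` suffix passes and the `weight_scale_inv` suffix fails, because only the BA entry remains for it.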

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py` around lines 47 - 51, The combine logic in qwen3_next_weight_mapper should not require paired qkvz/ba tensors until FP8 BA scale metadata has been handled. Update the grouping/combine flow in the superclass method used by Qwen3_5MoeHfWeightMapper so orphan in_proj_ba.weight_scale_inv entries are dequantized or dropped alongside the BA projection, and only then enforce the “Expected both QKVZ and BA” check for the relevant symbols in the combine path and _dequantize_linear_attn_fp8_qkvz.

coderabbitai · 2026-07-02T10:19:11Z

+            combined_name = f"{prefix}.in_proj_qkvzba.{suffix}"
+
+            # Scalar/per-tensor metadata is shared by the two projections. It
+            # cannot be row-reordered, so retain one copy after validating it.
+            if (qkvz.ndim == 0 or ba.ndim == 0 or qkvz.shape[0] != expected_qkvz
+                    or ba.shape[0] != expected_ba):
+                if qkvz.shape != ba.shape or not torch.equal(qkvz, ba):
+                    raise ValueError(
+                        f"Cannot combine non-row GDN projection metadata "
+                        f"{prefix}.{suffix}: QKVZ shape={tuple(qkvz.shape)}, "
+                        f"BA shape={tuple(ba.shape)}")
+                combined_weights[combined_name] = qkvz
+                continue
+
+            if qkvz.shape[1:] != ba.shape[1:]:
+                raise ValueError(
+                    f"GDN projection trailing shapes do not match for "
+                    f"{prefix}.{suffix}: {tuple(qkvz.shape)} vs {tuple(ba.shape)}"
+                )
+
+            trailing_shape = qkvz.shape[1:]
+            qkvz = qkvz.reshape(num_k_heads, qkvz_group_dim, *trailing_shape)
+            ba = ba.reshape(num_k_heads, ba_group_dim, *trailing_shape)
+
+            q_end = head_k_dim
+            k_end = q_end + head_k_dim
+            v_end = k_end + heads_ratio * head_v_dim
+            z_end = v_end + heads_ratio * head_v_dim
+            q = qkvz[:, :q_end].reshape(-1, *trailing_shape)
+            k = qkvz[:, q_end:k_end].reshape(-1, *trailing_shape)
+            v = qkvz[:, k_end:v_end].reshape(-1, *trailing_shape)
+            z = qkvz[:, v_end:z_end].reshape(-1, *trailing_shape)
+            b = ba[:, :heads_ratio].reshape(-1, *trailing_shape)
+            a = ba[:, heads_ratio:].reshape(-1, *trailing_shape)
+            combined_weights[combined_name] = torch.cat((q, k, v, z, b, a),
+                                                        dim=0).contiguous()
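
The reshape-and-slice logic above converts rows grouped per key head into the q, k, v, z, b, a consumer order. The following pure-Python sketch mirrors that row reordering using string labels instead of tensors, so the result can be read directly. The function name `combine_rows` and the tiny head sizes are hypothetical.

```python
def combine_rows(qkvz_rows, ba_rows, num_k_heads, head_k_dim, head_v_dim, heads_ratio):
    """Reorder per-key-head row groups into q, k, v, z, b, a order."""
    qkvz_group = 2 * head_k_dim + 2 * heads_ratio * head_v_dim
    ba_group = 2 * heads_ratio
    if (len(qkvz_rows) != num_k_heads * qkvz_group
            or len(ba_rows) != num_k_heads * ba_group):
        raise ValueError("row counts do not match the head layout")

    q_end = head_k_dim
    k_end = q_end + head_k_dim
    v_end = k_end + heads_ratio * head_v_dim
    q, k, v, z, b, a = [], [], [], [], [], []
    for head in range(num_k_heads):
        group = qkvz_rows[head * qkvz_group:(head + 1) * qkvz_group]
        q.extend(group[:q_end])
        k.extend(group[q_end:k_end])
        v.extend(group[k_end:v_end])
        z.extend(group[v_end:])
        ba_rows_for_head = ba_rows[head * ba_group:(head + 1) * ba_group]
        b.extend(ba_rows_for_head[:heads_ratio])
        a.extend(ba_rows_for_head[heads_ratio:])
    return q + k + v + z + b + a


# Two key heads, head_k_dim = head_v_dim = 1, two value heads per key head.
qkvz_rows = ["q0", "k0", "v0a", "v0b", "z0a", "z0b",
             "q1", "k1", "v1a", "v1b", "z1a", "z1b"]
ba_rows = ["b0a", "b0b", "a0a", "a0b", "b1a", "b1b", "a1a", "a1b"]
combined = combine_rows(qkvz_rows, ba_rows, num_k_heads=2,
                        head_k_dim=1, head_v_dim=1, heads_ratio=2)
print(combined[:4])  # prints "['q0', 'q1', 'k0', 'k1']"
```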


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Reject existing combined projection keys before overwriting them.

If a checkpoint already contains in_proj_qkvzba.<suffix> plus split in_proj_qkvz/in_proj_ba, line 89 silently overwrites the original combined tensor. Mirror the duplicate-key guard used by the split packer and fail fast.

Proposed guard

             qkvz = tensors["qkvz"]
             ba = tensors["ba"]
             combined_name = f"{prefix}.in_proj_qkvzba.{suffix}"
+            if combined_name in combined_weights:
+                raise ValueError(f"Combined projection {combined_name} already exists")
             # Scalar/per-tensor metadata is shared by the two projections. It

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

            qkvz = tensors["qkvz"]
            ba = tensors["ba"]
            combined_name = f"{prefix}.in_proj_qkvzba.{suffix}"
            if combined_name in combined_weights:
                raise ValueError(f"Combined projection {combined_name} already exists")

            # Scalar/per-tensor metadata is shared by the two projections. It
            # cannot be row-reordered, so retain one copy after validating it.
            if (qkvz.ndim == 0 or ba.ndim == 0 or qkvz.shape[0] != expected_qkvz
                    or ba.shape[0] != expected_ba):
                if qkvz.shape != ba.shape or not torch.equal(qkvz, ba):
                    raise ValueError(
                        f"Cannot combine non-row GDN projection metadata "
                        f"{prefix}.{suffix}: QKVZ shape={tuple(qkvz.shape)}, "
                        f"BA shape={tuple(ba.shape)}")
                combined_weights[combined_name] = qkvz
                continue

            if qkvz.shape[1:] != ba.shape[1:]:
                raise ValueError(
                    f"GDN projection trailing shapes do not match for "
                    f"{prefix}.{suffix}: {tuple(qkvz.shape)} vs {tuple(ba.shape)}"
                )

            trailing_shape = qkvz.shape[1:]
            qkvz = qkvz.reshape(num_k_heads, qkvz_group_dim, *trailing_shape)
            ba = ba.reshape(num_k_heads, ba_group_dim, *trailing_shape)

            q_end = head_k_dim
            k_end = q_end + head_k_dim
            v_end = k_end + heads_ratio * head_v_dim
            z_end = v_end + heads_ratio * head_v_dim
            q = qkvz[:, :q_end].reshape(-1, *trailing_shape)
            k = qkvz[:, q_end:k_end].reshape(-1, *trailing_shape)
            v = qkvz[:, k_end:v_end].reshape(-1, *trailing_shape)
            z = qkvz[:, v_end:z_end].reshape(-1, *trailing_shape)
            b = ba[:, :heads_ratio].reshape(-1, *trailing_shape)
            a = ba[:, heads_ratio:].reshape(-1, *trailing_shape)
            combined_weights[combined_name] = torch.cat((q, k, v, z, b, a),
                                                        dim=0).contiguous()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py` around lines 55 - 90, The combined projection packing in qwen3_next_weight_mapper should fail fast if an existing in_proj_qkvzba.<suffix> key is already present, instead of silently overwriting it. Add the same duplicate-key check used by the split packing path before assigning into combined_weights in the combine logic, and raise an error when both the combined key and the split in_proj_qkvz / in_proj_ba inputs would map to the same output.

mingyangHao · 2026-07-03T01:05:20Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-03T01:11:17Z

PR_Github #57284 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

tensorrt-cicd · 2026-07-03T03:14:58Z

PR_Github #57284 [ run ] completed with state FAILURE. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46045 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

mingyangHao · 2026-07-03T03:16:04Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-03T03:21:38Z

PR_Github #57328 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

tensorrt-cicd · 2026-07-03T06:10:11Z

PR_Github #57328 [ run ] completed with state FAILURE. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46086 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

mingyangHao · 2026-07-03T06:42:35Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-03T06:49:02Z

PR_Github #57389 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

tensorrt-cicd · 2026-07-03T10:24:52Z

PR_Github #57389 [ run ] completed with state SUCCESS. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46137 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

mingyangHao · 2026-07-03T10:48:05Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-03T10:54:23Z

PR_Github #57457 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

tensorrt-cicd · 2026-07-03T12:34:38Z

PR_Github #57457 [ run ] completed with state SUCCESS. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46195 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>

mingyangHao · 2026-07-04T04:59:06Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-04T05:05:21Z

PR_Github #57538 [ run ] triggered by Bot. Commit: a14cb11 Link to invocation

tensorrt-cicd · 2026-07-04T08:52:03Z

PR_Github #57538 [ run ] completed with state SUCCESS. Commit: a14cb11
/LLM/main/L0_MergeRequest_PR pipeline #46268 completed with status: 'SUCCESS'

CI Report

Link to invocation

[None][perf] Fuse Qwen3.5 GDN input projections for attention DP

3510d42

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>

mingyangHao requested review from a team as code owners July 2, 2026 10:10

mingyangHao requested review from Wanli-Jiang, aswinvisva and moraxu July 2, 2026 10:10

github-actions Bot assigned mingyangHao Jul 2, 2026

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

mingyangHao removed request for Wanli-Jiang, aswinvisva and moraxu July 3, 2026 01:06

[None][perf] Enable fused Qwen3.5 GDN projection for TP

a14cb11

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>

mingyangHao requested a review from a team as a code owner July 4, 2026 03:28

mingyangHao requested a review from achartier July 4, 2026 03:28

mingyangHao changed the title ~~[None][perf] Fuse Qwen3.5 GDN input projections for attention DP~~ [None][perf] Fuse Qwen3.5 GDN input projections Jul 4, 2026
