[None][perf] Fuse Qwen3.5 GDN input projections #15884

Open
mingyangHao wants to merge 2 commits into NVIDIA:main from mingyangHao:user/mingyangh/qwen35-gdn-fused-projection

@mingyangHao mingyangHao commented Jul 2, 2026

Collaborator

Summary by CodeRabbit

  • New Features

    • Added support for combined QKVZBA attention projections in Qwen3.5/Qwen3 Next paths when attention-DP is enabled.
    • Improved handling of non-contiguous tensor layouts across several optimized Mamba operations.
  • Bug Fixes

    • Fixed projection weight ordering so attention components are combined in the expected consumer order.
    • Made layer normalization accept repeated weight and bias layouts for grouped inputs.
    • Updated optimized kernels to work correctly with strided input views and padded storage.

| Tokens | Dual-stream baseline (us) | Combined projection (us) | Latency reduction | Speedup |
|---|---|---|---|---|
| 1 | 49.152 | 45.024 | 8.40% | 1.09x |
| 8 | 49.152 | 45.056 | 8.33% | 1.09x |
| 32 | 49.152 | 40.960 | 16.67% | 1.20x |
| 64 | 51.200 | 43.008 | 16.00% | 1.19x |
| 256 | 63.520 | 49.184 | 22.57% | 1.29x |
| 1024 | 174.112 | 155.616 | 10.62% | 1.12x |
| 8192 | 1173.504 | 1058.784 | 9.78% | 1.11x |

Perf for TP4:

| Tokens/rank | Dual-stream baseline (us) | Combined projection (us) | Latency reduction | Speedup |
|---|---|---|---|---|
| 1 | 22.560 | 16.416 | 27.2% | 1.37x |
| 8 | 24.608 | 18.464 | 25.0% | 1.33x |
| 32 | 24.576 | 18.176 | 26.0% | 1.35x |
| 64 | 24.576 | 18.464 | 24.9% | 1.33x |
| 256 | 28.768 | 20.480 | 28.8% | 1.40x |
| 1024 | 53.248 | 43.008 | 19.2% | 1.24x |
| 8192 | 288.864 | 256.000 | 11.4% | 1.13x |
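
The two derived columns can be recomputed from the latency columns. A minimal sketch, assuming latency reduction is `(baseline - combined) / baseline` and speedup is `baseline / combined`; these formulas are inferred from the numbers rather than stated in the PR, and `summarize` is a hypothetical helper name:

```python
def summarize(baseline_us: float, combined_us: float) -> tuple[float, float]:
    """Return (latency reduction in percent, speedup factor)."""
    reduction_pct = (baseline_us - combined_us) / baseline_us * 100.0
    speedup = baseline_us / combined_us
    return reduction_pct, speedup


# A few rows copied from the TP4 table: (tokens/rank, baseline us, combined us).
tp4_rows = [(1, 22.560, 16.416), (256, 28.768, 20.480), (8192, 288.864, 256.000)]

for tokens, base, comb in tp4_rows:
    red, spd = summarize(base, comb)
    print(f"{tokens:>5} tokens/rank: {red:.1f}% lower latency, {spd:.2f}x speedup")
```

Rounded to the table's precision, the recomputed values agree with the reported rows, which suggests the columns use these definitions.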

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>
@mingyangHao

Collaborator Author

/bot run --disable-fail-fast

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Contributor

Walkthrough

This PR adds a combined in_proj_qkvzba projection path for Qwen3-Next/Qwen3.5 GDN mixer layers under attention-DP, implemented via a new weight-mapper combination step, updated exclude-module naming, and a branching forward path in the GatedDeltaNet module. Additionally, several Triton kernels are updated to use explicit tensor strides instead of contiguous-layout assumptions, with corresponding new unit tests.

Changes

Combined GDN Projection and Strided Kernel Support

| Layer | File(s) | Summary |
|---|---|---|
| Weight mapper: combine GDN QKVZ/BA into single tensor | `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py`, `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` | Adds `_combine_gdn_input_projections` to validate, reshape, split, and concatenate checkpoint QKVZ/BA tensors into `in_proj_qkvzba`, wired into `preprocess_weights` under attention-DP; updates related docstring. |
| Quantization exclude-modules naming | `tensorrt_llm/_torch/models/modeling_qwen3_5.py` | Rewrites exclude-module regex mapping to `in_proj_qkvzba*` when attention-DP is enabled, otherwise retains split naming. |
| GatedDeltaNet: combined projection module and forward path | `tensorrt_llm/_torch/modules/mamba/gdn_mixer.py` | Adds `use_combined_qkvzba_projection` flag, constructs single combined linear layer, branches `forward()` to slice combined projection output directly, and adjusts RMSNorm reshape logic. |
| Triton kernels: explicit stride-based indexing | `tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py`, `tensorrt_llm/_torch/modules/mamba/gdn_mixer.py`, `tensorrt_llm/_torch/modules/mamba/layernorm_gated.py` | Updates kernels and wrappers to compute offsets from explicit tensor strides rather than assumed contiguity; adds `REPEAT_WEIGHT` flag for repeated weight/bias layouts. |
| Tests: weight mapper combination and strided kernel behavior | `tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py`, `tests/unittest/_torch/modules/mamba/test_gdn_kernel_optimizations.py` | Adds tests for combined-projection ordering/validation and strided-tensor kernel correctness. |
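
To illustrate the stride-based indexing change: a kernel that assumes a contiguous layout computes `row * num_cols + col`, which reads the wrong elements when its input is a column slice of wider (padded) storage. The following is a minimal pure-Python sketch of that difference, not the Triton code from this PR; `flat_offset` and all sizes are hypothetical.

```python
def flat_offset(row: int, col: int, stride_row: int, stride_col: int, base: int = 0) -> int:
    """Offset of element (row, col) in flat storage, using explicit strides."""
    return base + row * stride_row + col * stride_col


# Flat storage holding 3 rows, each padded to 8 elements.
rows, padded_cols = 3, 8
storage = list(range(rows * padded_cols))

# Logical view: columns 2..5 of every row (4 columns), like storage_2d[:, 2:6].
view_cols, base = 4, 2
stride_row, stride_col = padded_cols, 1

# Correct: honour the view's strides and base offset.
strided = [[storage[flat_offset(r, c, stride_row, stride_col, base)]
            for c in range(view_cols)] for r in range(rows)]

# Incorrect for this view: assume rows are packed back to back.
contiguous_guess = [[storage[r * view_cols + c]
                     for c in range(view_cols)] for r in range(rows)]

print(strided[1])           # elements that actually belong to row 1 of the view
print(contiguous_guess[1])  # elements a contiguity assumption would read instead
```

Passing strides explicitly lets the same kernel serve both contiguous tensors and sliced views without an extra `.contiguous()` copy.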

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Checkpoint as HF Checkpoint
  participant Mapper as Qwen3NextHfWeightMapper
  participant Config as ModelConfig
  participant GDN as Qwen3NextGatedDeltaNet

  Checkpoint->>Mapper: preprocess_weights(weights)
  Mapper->>Config: check enable_attention_dp
  alt attention-DP enabled
    Mapper->>Mapper: _combine_gdn_input_projections
    Mapper-->>Checkpoint: weights with in_proj_qkvzba
  else attention-DP disabled
    Mapper-->>Checkpoint: weights with separate in_proj_qkvz/in_proj_ba
  end
  Checkpoint->>GDN: load state_dict
  GDN->>GDN: init use_combined_qkvzba_projection
  GDN->>GDN: forward(hidden_states)
  alt combined projection
    GDN->>GDN: slice projected_states_qkvzba into mixed_qkv, z, b, a
  else split projection
    GDN->>GDN: compute qkvz + ba, fuse/split into mixed_qkv, z, b, a
  end
  GDN-->>Checkpoint: attention output
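
The "slice projected_states_qkvzba" step in the diagram can be sketched as plain slicing of one token's projection output. This assumes the output columns follow the `(q, k, v, z, b, a)` order used by the weight mapper's concatenation; the actual `forward()` code is not shown in this conversation, and `split_qkvzba` plus its parameter names are hypothetical.

```python
def split_qkvzba(row, num_k_heads, num_v_heads, head_k_dim, head_v_dim):
    """Split one token's combined projection output into (mixed_qkv, z, b, a)."""
    key_dim = num_k_heads * head_k_dim      # width of q, and of k
    value_dim = num_v_heads * head_v_dim    # width of v, and of z
    qkv_end = 2 * key_dim + value_dim       # q | k | v stay together as mixed_qkv
    z_end = qkv_end + value_dim
    b_end = z_end + num_v_heads             # one b value per V head (assumed)
    a_end = b_end + num_v_heads             # one a value per V head (assumed)
    if len(row) != a_end:
        raise ValueError(f"expected {a_end} features, got {len(row)}")
    return row[:qkv_end], row[qkv_end:z_end], row[z_end:b_end], row[b_end:a_end]


# Toy sizes: 2 K heads, 4 V heads, head_k_dim=3, head_v_dim=2 -> 36 features.
mixed_qkv, z, b, a = split_qkvzba(list(range(36)), 2, 4, 3, 2)
print(len(mixed_qkv), len(z), len(b), len(a))
```

Because the weights are already stored in consumer order, the forward path needs only slices and no per-head regrouping of the output.
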
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description leaves the required Description and Test Coverage sections blank, so it does not meet the template. | Add a short problem/solution summary and list the tests covering the new combined-projection and strided-kernel paths. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title is concise, specific, and matches the main change: fusing Qwen3.5 GDN input projections for attention DP. |

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py (1)

46-55: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Cast dst_offsets to tl.int64 too. The store index has the same overflow risk as src_offsets; with large num_prefill_tokens * conv_dim, conv_offsets[:, None] * num_prefill_tokens can wrap and write to the wrong address.

Proposed fix
-    dst_offsets = conv_offsets[:, None] * num_prefill_tokens + seq_offsets[None, :]
+    dst_offsets = conv_offsets[:, None].to(tl.int64) * num_prefill_tokens + seq_offsets[None, :]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py` around lines 46 -
55, The dst_offsets calculation in fuse_elementwise_ops has the same overflow
risk as src_offsets because conv_offsets[:, None] * num_prefill_tokens can
exceed INT32_MAX and corrupt the store address. Update the dst_offsets
expression in the same kernel path to cast the operands to tl.int64 before the
multiply/add, mirroring the existing src_offsets fix, and keep the tl.store call
using the widened offsets.
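
The overflow risk can be checked with simple arithmetic. Python integers do not overflow, so this sketch emulates 32-bit two's-complement wraparound by hand; the index and token count are hypothetical values chosen only so that the product exceeds INT32_MAX.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


conv_offset = 8191            # hypothetical channel index
num_prefill_tokens = 300_000  # hypothetical prefill length

exact = conv_offset * num_prefill_tokens  # what 64-bit arithmetic yields
wrapped = wrap_int32(exact)               # what 32-bit arithmetic would yield

print(exact > INT32_MAX)  # the true offset does not fit in int32
print(wrapped)            # a negative, i.e. wrong, store offset
```

Widening one operand before the multiply, as the proposed `.to(tl.int64)` cast does, keeps the whole expression in 64-bit arithmetic.
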
🧹 Nitpick comments (1)
tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py (1)

35-73: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Test coverage is good for the happy path but misses key error branches.

_combine_gdn_input_projections (per the upstream snippet) also raises ValueError when only one of qkvz/ba is present ("Expected both QKVZ and BA tensors...") and when trailing shapes mismatch between qkvz/ba row tensors. Neither path is covered here.

Suggest adding:

  • test_combine_gdn_input_projections_missing_projection_raises — only in_proj_qkvz.* present, expect ValueError matching "Expected both QKVZ and BA".
  • test_combine_gdn_input_projections_rejects_trailing_shape_mismatch — row-tensors with mismatched shape[1:], expect ValueError matching "trailing shapes do not match".

Coverage for the two implemented cases (consumer-order combination, scalar-metadata rejection) is sufficient and correctly verified against the actual reshape/slice math.

As per path instructions, "Act as a QA engineer reviewing test changes and coverage for TensorRT-LLM" and "suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py`
around lines 35 - 73, Add the missing error-path coverage for
_combine_gdn_input_projections in test_qwen3_next_weight_mapper.py: create a
test that passes only in_proj_qkvz.* entries and asserts a ValueError matching
the “Expected both QKVZ and BA” message, and another that uses row-tensor inputs
with mismatched trailing shapes and asserts the “trailing shapes do not match”
error. Keep the existing consumer-order and scalar-metadata tests as-is, since
they already verify the happy path and non-row metadata rejection.
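
A sketch of what the two suggested tests could look like. Since the signature of `_combine_gdn_input_projections` is not shown here, this uses a hypothetical stand-in, `check_pair`, that reproduces only the two error branches quoted in the review and works on shape tuples instead of tensors. In the real test file, `pytest.raises(ValueError, match=...)` would replace the small `raises_value_error` helper.

```python
def check_pair(name, tensors):
    """Stand-in for the two validation branches (operates on shape tuples only)."""
    if tensors.keys() != {"qkvz", "ba"}:
        raise ValueError(
            f"Expected both QKVZ and BA tensors for {name}, got {sorted(tensors)}")
    qkvz_shape, ba_shape = tensors["qkvz"], tensors["ba"]
    if qkvz_shape[1:] != ba_shape[1:]:
        raise ValueError(
            f"GDN projection trailing shapes do not match for {name}: "
            f"{qkvz_shape} vs {ba_shape}")


def raises_value_error(fn, *args, match):
    """Return True if fn(*args) raises ValueError whose message contains match."""
    try:
        fn(*args)
    except ValueError as exc:
        return match in str(exc)
    return False


def test_missing_projection_raises():
    assert raises_value_error(
        check_pair, "layer0.weight", {"qkvz": (48, 16)},
        match="Expected both QKVZ and BA")


def test_rejects_trailing_shape_mismatch():
    assert raises_value_error(
        check_pair, "layer0.weight", {"qkvz": (48, 16), "ba": (8, 32)},
        match="trailing shapes do not match")
```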

Source: Path instructions


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 589c1b79-8970-4b4a-8d07-a2c4afc35ea8

📥 Commits

Reviewing files that changed from the base of the PR and between 0d433e0 and 3510d42.

📒 Files selected for processing (8)
  • tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
  • tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_qwen3_5.py
  • tensorrt_llm/_torch/modules/mamba/fuse_elementwise_ops.py
  • tensorrt_llm/_torch/modules/mamba/gdn_mixer.py
  • tensorrt_llm/_torch/modules/mamba/layernorm_gated.py
  • tests/unittest/_torch/models/checkpoints/hf/test_qwen3_next_weight_mapper.py
  • tests/unittest/_torch/modules/mamba/test_gdn_kernel_optimizations.py

Comment on lines +47 to +51
        for (prefix, suffix), tensors in grouped.items():
            if tensors.keys() != {"qkvz", "ba"}:
                raise ValueError(
                    f"Expected both QKVZ and BA tensors for {prefix}.{suffix}, "
                    f"got {sorted(tensors)}")

Contributor

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Handle Qwen3.5 FP8 BA scales before requiring paired suffixes.

With attention-DP enabled, Qwen3_5MoeHfWeightMapper calls this superclass combine after _dequantize_linear_attn_fp8_qkvz() removes only in_proj_qkvz.weight_scale_inv; in_proj_ba.weight_scale_inv can remain alone and hit this Expected both QKVZ and BA error. Dequantize/drop BA scale metadata for the combined non-quantized projection, or skip orphan metadata only after ensuring the corresponding BA weight has been dequantized too.

Also applies to: 195-196
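
The failure mode can be reproduced with key names alone. A minimal sketch, assuming weights are grouped by `(prefix, suffix)` as in the quoted loop; the regex, the `model.layers.0.linear_attn` prefix, and both helper names are hypothetical and may not match the real checkpoint layout.

```python
import re
from collections import defaultdict

# Hypothetical key pattern: <prefix>.in_proj_<qkvz|ba>.<suffix>
_PROJ = re.compile(r"^(?P<prefix>.+)\.in_proj_(?P<kind>qkvz|ba)\.(?P<suffix>.+)$")


def group_projections(names):
    """Map (prefix, suffix) to the set of projection kinds present for it."""
    grouped = defaultdict(set)
    for name in names:
        m = _PROJ.match(name)
        if m:
            grouped[(m["prefix"], m["suffix"])].add(m["kind"])
    return grouped


def find_orphans(names):
    """Return groups that would fail the 'Expected both QKVZ and BA' check."""
    return {key: kinds for key, kinds in group_projections(names).items()
            if kinds != {"qkvz", "ba"}}


names = [
    "model.layers.0.linear_attn.in_proj_qkvz.weight",
    "model.layers.0.linear_attn.in_proj_ba.weight",
    # in_proj_qkvz.weight_scale_inv was already consumed by dequantization,
    # so the BA scale is left without a partner.
    "model.layers.0.linear_attn.in_proj_ba.weight_scale_inv",
]
orphans = find_orphans(names)
print(orphans)
```

A pre-pass like `find_orphans` would show which metadata keys still need to be dequantized or dropped before the pairing check runs.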

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py` around
lines 47 - 51, The combine logic in qwen3_next_weight_mapper should not require
paired qkvz/ba tensors until FP8 BA scale metadata has been handled. Update the
grouping/combine flow in the superclass method used by Qwen3_5MoeHfWeightMapper
so orphan in_proj_ba.weight_scale_inv entries are dequantized or dropped
alongside the BA projection, and only then enforce the “Expected both QKVZ and
BA” check for the relevant symbols in the combine path and
_dequantize_linear_attn_fp8_qkvz.

Comment on lines +55 to +90
            combined_name = f"{prefix}.in_proj_qkvzba.{suffix}"

            # Scalar/per-tensor metadata is shared by the two projections. It
            # cannot be row-reordered, so retain one copy after validating it.
            if (qkvz.ndim == 0 or ba.ndim == 0 or qkvz.shape[0] != expected_qkvz
                    or ba.shape[0] != expected_ba):
                if qkvz.shape != ba.shape or not torch.equal(qkvz, ba):
                    raise ValueError(
                        f"Cannot combine non-row GDN projection metadata "
                        f"{prefix}.{suffix}: QKVZ shape={tuple(qkvz.shape)}, "
                        f"BA shape={tuple(ba.shape)}")
                combined_weights[combined_name] = qkvz
                continue

            if qkvz.shape[1:] != ba.shape[1:]:
                raise ValueError(
                    f"GDN projection trailing shapes do not match for "
                    f"{prefix}.{suffix}: {tuple(qkvz.shape)} vs {tuple(ba.shape)}"
                )

            trailing_shape = qkvz.shape[1:]
            qkvz = qkvz.reshape(num_k_heads, qkvz_group_dim, *trailing_shape)
            ba = ba.reshape(num_k_heads, ba_group_dim, *trailing_shape)

            q_end = head_k_dim
            k_end = q_end + head_k_dim
            v_end = k_end + heads_ratio * head_v_dim
            z_end = v_end + heads_ratio * head_v_dim
            q = qkvz[:, :q_end].reshape(-1, *trailing_shape)
            k = qkvz[:, q_end:k_end].reshape(-1, *trailing_shape)
            v = qkvz[:, k_end:v_end].reshape(-1, *trailing_shape)
            z = qkvz[:, v_end:z_end].reshape(-1, *trailing_shape)
            b = ba[:, :heads_ratio].reshape(-1, *trailing_shape)
            a = ba[:, heads_ratio:].reshape(-1, *trailing_shape)
            combined_weights[combined_name] = torch.cat((q, k, v, z, b, a),
                                                        dim=0).contiguous()
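
The row reordering in this excerpt can be traced with labelled rows instead of tensors. A minimal pure-Python sketch, assuming the checkpoint stores rows grouped per K head as `[q, k, v, z]` and `[b, a]`, and that `qkvz_group_dim` equals `z_end` (it is not defined in the excerpt); `combine_rows` and the row labels are hypothetical.

```python
def combine_rows(qkvz_rows, ba_rows, num_k_heads, head_k_dim, head_v_dim, heads_ratio):
    """Reorder per-K-head grouped rows into consumer order: q, k, v, z, b, a."""
    q_end = head_k_dim
    k_end = q_end + head_k_dim
    v_end = k_end + heads_ratio * head_v_dim
    z_end = v_end + heads_ratio * head_v_dim
    qkvz_group, ba_group = z_end, 2 * heads_ratio
    if (len(qkvz_rows) != num_k_heads * qkvz_group
            or len(ba_rows) != num_k_heads * ba_group):
        raise ValueError("unexpected row count for the given head configuration")

    # Equivalent of reshape(num_k_heads, group_dim, ...): one chunk per K head.
    qkvz_groups = [qkvz_rows[h * qkvz_group:(h + 1) * qkvz_group]
                   for h in range(num_k_heads)]
    ba_groups = [ba_rows[h * ba_group:(h + 1) * ba_group]
                 for h in range(num_k_heads)]

    def gather(groups, start, end):
        # Equivalent of groups[:, start:end].reshape(-1, ...): head-major order.
        return [row for group in groups for row in group[start:end]]

    return (gather(qkvz_groups, 0, q_end) + gather(qkvz_groups, q_end, k_end)
            + gather(qkvz_groups, k_end, v_end) + gather(qkvz_groups, v_end, z_end)
            + gather(ba_groups, 0, heads_ratio) + gather(ba_groups, heads_ratio, ba_group))


# Toy case: 2 K heads, head_k_dim=1, head_v_dim=1, 2 V heads per K head.
qkvz_rows = ["q0", "k0", "v0a", "v0b", "z0a", "z0b",
             "q1", "k1", "v1a", "v1b", "z1a", "z1b"]
ba_rows = ["b0a", "b0b", "a0a", "a0b", "b1a", "b1b", "a1a", "a1b"]
combined = combine_rows(qkvz_rows, ba_rows, 2, 1, 1, 2)
print(combined)
```

Each component ends up as one contiguous block of rows, which is what allows the forward path to slice the projection output directly.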

Contributor

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Reject existing combined projection keys before overwriting them.

If a checkpoint already contains in_proj_qkvzba.<suffix> plus split in_proj_qkvz/in_proj_ba, line 89 silently overwrites the original combined tensor. Mirror the duplicate-key guard used by the split packer and fail fast.

Proposed guard
             qkvz = tensors["qkvz"]
             ba = tensors["ba"]
             combined_name = f"{prefix}.in_proj_qkvzba.{suffix}"
+            if combined_name in combined_weights:
+                raise ValueError(f"Combined projection {combined_name} already exists")
 
             # Scalar/per-tensor metadata is shared by the two projections. It
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_next_weight_mapper.py` around
lines 55 - 90, The combined projection packing in qwen3_next_weight_mapper
should fail fast if an existing in_proj_qkvzba.<suffix> key is already present,
instead of silently overwriting it. Add the same duplicate-key check used by the
split packing path before assigning into combined_weights in the combine logic,
and raise an error when both the combined key and the split in_proj_qkvz /
in_proj_ba inputs would map to the same output.

@mingyangHao

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57284 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57284 [ run ] completed with state FAILURE. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46045 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mingyangHao

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57328 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57328 [ run ] completed with state FAILURE. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46086 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mingyangHao

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57389 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57389 [ run ] completed with state SUCCESS. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46137 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@mingyangHao

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57457 [ run ] triggered by Bot. Commit: 3510d42 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57457 [ run ] completed with state SUCCESS. Commit: 3510d42
/LLM/main/L0_MergeRequest_PR pipeline #46195 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>
@mingyangHao mingyangHao requested a review from a team as a code owner July 4, 2026 03:28
@mingyangHao mingyangHao requested a review from achartier July 4, 2026 03:28
@mingyangHao mingyangHao changed the title from "[None][perf] Fuse Qwen3.5 GDN input projections for attention DP" to "[None][perf] Fuse Qwen3.5 GDN input projections" Jul 4, 2026
@mingyangHao

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57538 [ run ] triggered by Bot. Commit: a14cb11 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57538 [ run ] completed with state SUCCESS. Commit: a14cb11
/LLM/main/L0_MergeRequest_PR pipeline #46268 completed with status: 'SUCCESS'

CI Report

Link to invocation
