[None][perf] Move greedy stop checks to host by mingyangHao · Pull Request #15920 · NVIDIA/TensorRT-LLM

mingyangHao · 2026-07-03T10:35:53Z

Summary by CodeRabbit

New Features
- Added a host-side stop-criteria path for fast greedy sampling in eligible single-token, single-beam requests.
- Improved request handling so sampled tokens can be finalized directly on the host in supported cases.
Bug Fixes
- Reduced unnecessary device-side finish-reason processing when host stop criteria can be used.
- Adjusted token placement logic to work correctly with the new fast path.
Tests
- Added coverage for host stop-criteria behavior and fallback cases where it should remain disabled.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- If PR introduces API changes, an appropriate PR label is added, either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>

mingyangHao · 2026-07-03T10:36:20Z

/bot run --disable-fail-fast

coderabbitai · 2026-07-03T10:40:52Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cb10b8d8-51c2-4a46-875a-3b2cf31f4f43

📥 Commits

Reviewing files that changed from the base of the PR and between 3b23a9f and cdbd8a3.

📒 Files selected for processing (2)

tensorrt_llm/_torch/pyexecutor/sampler.py
tests/unittest/_torch/sampler/test_torch_sampler.py

📝 Walkthrough

This PR adds an optional host-side stop-criteria evaluation path for single-token, single-beam fast-greedy sampling in TorchSampler. A new use_host_stop_criteria flag is computed, propagated through SampleStateTorch, _process_requests, sample_async, and update_requests, and covered by new unit tests.
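
To make the eligibility conditions concrete, here is a minimal sketch of the kind of check the walkthrough describes. It is not the code from `sampler.py`: `ToyRequest`, its fields, and `can_use_host_stop_criteria` are hypothetical names chosen for illustration, and the real `_can_use_host_stop_criteria` helper may inspect different attributes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a request; not the TensorRT-LLM request type."""
    beam_width: int = 1
    num_draft_tokens: int = 0
    stop_words: list = field(default_factory=list)


def can_use_host_stop_criteria(requests, fast_greedy, tokens_per_step=1):
    """Return True only when every condition named in the walkthrough holds.

    Assumed conditions: the fast-greedy path is active, one token is sampled
    per step, and no request uses beams, stop words, or draft tokens.
    """
    if not fast_greedy or tokens_per_step != 1:
        return False
    return all(
        r.beam_width == 1 and r.num_draft_tokens == 0 and not r.stop_words
        for r in requests
    )


if __name__ == "__main__":
    plain = [ToyRequest(), ToyRequest()]
    with_stop_words = [ToyRequest(), ToyRequest(stop_words=[[13, 10]])]
    print(can_use_host_stop_criteria(plain, fast_greedy=True))
    print(can_use_host_stop_criteria(with_stop_words, fast_greedy=True))
```

The sketch treats the check as all-or-nothing for the batch: a single request with stop words or draft tokens sends the whole batch back to the device path. Whether the PR decides per batch or per request is not stated in this page.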

Changes

Host stop criteria for fast-greedy sampling

| Layer / File(s) | Summary |
| --- | --- |
| State contract and eligibility check `tensorrt_llm/_torch/pyexecutor/sampler.py` | Adds `use_host_stop_criteria` field to `SampleStateTorch` and a new `_can_use_host_stop_criteria` helper that checks fast-greedy eligibility, single-token/single-beam constraints, and absence of stop words/draft requests. |
| `_process_requests` computation and propagation `tensorrt_llm/_torch/pyexecutor/sampler.py` | Computes `use_fast_greedy_path`/`use_host_stop_criteria`, skips building `seq_lens_host` when enabled, derives scatter destination indices from `seq_slots_cuda`, adds assertions for the non-fast-greedy path, and extends the return tuple at all return points. |
| `sample_async` and `update_requests` wiring `tensorrt_llm/_torch/pyexecutor/sampler.py` | `sample_async` captures and forwards the new flag, skips device finish-reasons setup when enabled; `update_requests` branches to add sampled tokens and call `_handle_stop_criteria` on host instead of device-based draft-token handling. |
| Unit tests `tests/unittest/_torch/sampler/test_torch_sampler.py` | Adds parametrized tests for host stop-criteria fast path (state flags, skipped device finish-reasons, expected `finish_by` calls) and fallback cases confirming `_can_use_host_stop_criteria` returns `False` for stop-words/draft requests. |
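
One possible reading of the `_process_requests` row is that, with exactly one token and one beam per request, the write position for each sampled token depends only on the request's sequence slot, so per-request sequence lengths are not needed. The sketch below illustrates that idea in plain Python. The buffer layout (`[step][slot][beam]`, flattened) and both function names are assumptions made for this example; the actual tensor layout and indexing in `sampler.py` are not shown on this page.

```python
def scatter_indices_from_slots(seq_slots, max_beams):
    """Flat destination index for step 0, beam 0 of each request's slot.

    Assumes a toy buffer flattened from [step][slot][beam]. With a single
    step and a single beam, only the slot contributes to the offset.
    """
    return [slot * max_beams for slot in seq_slots]


def scatter(buffer, indices, tokens):
    """Write each token at its destination index and return the buffer."""
    for index, token in zip(indices, tokens):
        buffer[index] = token
    return buffer


if __name__ == "__main__":
    max_slots, max_beams = 4, 1
    buffer = [-1] * (max_slots * max_beams)
    seq_slots = [2, 0]        # slot assigned to each request in the batch
    new_tokens = [11, 22]     # one greedy token per request
    indices = scatter_indices_from_slots(seq_slots, max_beams)
    print(scatter(buffer, indices, new_tokens))  # prints [22, -1, 11, -1]
```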

Estimated code review effort: 4 (Complex) | ~45 minutes

Sequence Diagram(s)

sequenceDiagram
participant SampleAsync as sample_async
participant ProcessRequests as _process_requests
participant SampleState as SampleStateTorch
participant UpdateRequests as update_requests
participant StopCriteria as _handle_stop_criteria

SampleAsync->>ProcessRequests: invoke sampling
ProcessRequests->>ProcessRequests: compute use_host_stop_criteria
ProcessRequests-->>SampleAsync: return tokens, use_host_stop_criteria
SampleAsync->>SampleState: set use_host_stop_criteria flag
SampleAsync->>UpdateRequests: pass sample state
UpdateRequests->>UpdateRequests: check state.use_host_stop_criteria
alt host stop criteria enabled
UpdateRequests->>UpdateRequests: add sampled token to request
UpdateRequests->>StopCriteria: evaluate stop criteria on host
else disabled
UpdateRequests->>UpdateRequests: run device finish-reasons + draft-token processing
end
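
The `alt` branch of the diagram can be sketched as a small host-only loop. Everything here is a toy model under stated assumptions: `ToyRequest`, `ToySampleState`, the finish-reason strings, and the two stop conditions (end-of-sequence token, then length budget) are hypothetical and only mirror the shape of the flow, not the real `update_requests` or `_handle_stop_criteria`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyRequest:
    """Hypothetical request with just enough state for a host-side check."""
    end_id: int
    max_new_tokens: int
    tokens: list = field(default_factory=list)
    finish_reason: Optional[str] = None


@dataclass
class ToySampleState:
    """Hypothetical sample state: one sampled token per request plus the flag."""
    new_tokens: list
    use_host_stop_criteria: bool


def handle_stop_criteria(request, new_token):
    """Host-side check: end-of-sequence token first, then the length budget."""
    if new_token == request.end_id:
        request.finish_reason = "end_id"
    elif len(request.tokens) >= request.max_new_tokens:
        request.finish_reason = "length"
    return request.finish_reason is not None


def update_requests(state, requests):
    """Append each sampled token and evaluate stop criteria on the host."""
    if not state.use_host_stop_criteria:
        # The device finish-reasons path is out of scope for this sketch.
        raise NotImplementedError("device path not modeled")
    for request, token in zip(requests, state.new_tokens):
        request.tokens.append(token)
        handle_stop_criteria(request, token)


if __name__ == "__main__":
    requests = [
        ToyRequest(end_id=2, max_new_tokens=3),
        ToyRequest(end_id=2, max_new_tokens=1),
        ToyRequest(end_id=2, max_new_tokens=3),
    ]
    update_requests(ToySampleState([2, 7, 5], True), requests)
    print([r.finish_reason for r in requests])  # prints ['end_id', 'length', None]
```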

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description only contains the template and comments, with no filled-in issue description or test coverage details. | Add a real Description section explaining the problem and solution, plus Test Coverage listing the tests that cover the new host stop-criteria path. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise, specific, and accurately summarizes the main change of moving greedy stop checks to the host. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands.

tensorrt-cicd · 2026-07-03T10:42:37Z

PR_Github #57453 [ run ] triggered by Bot. Commit: cdbd8a3 Link to invocation

tensorrt-cicd · 2026-07-03T15:51:25Z

PR_Github #57453 [ run ] completed with state SUCCESS. Commit: cdbd8a3
/LLM/main/L0_MergeRequest_PR pipeline #46190 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR.
- If you cannot view the failures, ask the CI triggerer to share details.
- Once fixed, request an NVIDIA team member to trigger CI again.

CI Agent Failure Analysis

Link to invocation

[None][perf] Move greedy stop checks to host

cdbd8a3

Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>

mingyangHao requested a review from a team as a code owner July 3, 2026 10:35

mingyangHao requested a review from achartier July 3, 2026 10:35

github-actions Bot assigned mingyangHao Jul 3, 2026

[None][perf] Move greedy stop checks to host #15920
mingyangHao wants to merge 1 commit into NVIDIA:main from mingyangHao:user/mingyangh/torch-sampler-host-stop


2 participants
