[None][perf] Move greedy stop checks to host#15920
Signed-off-by: Mingyang Hao <200044211+mingyangHao@users.noreply.github.com>
/bot run --disable-fail-fast
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough
This PR adds an optional host-side stop-criteria evaluation path for single-token, single-beam fast-greedy sampling in TorchSampler. A new
Changes
Host stop criteria for fast-greedy sampling
Estimated code review effort: 4 (Complex) | ~45 minutes
Sequence Diagram(s)
sequenceDiagram
participant SampleAsync as sample_async
participant ProcessRequests as _process_requests
participant SampleState as SampleStateTorch
participant UpdateRequests as update_requests
participant StopCriteria as _handle_stop_criteria
SampleAsync->>ProcessRequests: invoke sampling
ProcessRequests->>ProcessRequests: compute use_host_stop_criteria
ProcessRequests-->>SampleAsync: return tokens, use_host_stop_criteria
SampleAsync->>SampleState: set use_host_stop_criteria flag
SampleAsync->>UpdateRequests: pass sample state
UpdateRequests->>UpdateRequests: check state.use_host_stop_criteria
alt host stop criteria enabled
UpdateRequests->>UpdateRequests: add sampled token to request
UpdateRequests->>StopCriteria: evaluate stop criteria on host
else disabled
UpdateRequests->>UpdateRequests: run device finish-reasons + draft-token processing
end
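The host branch of the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in this PR: `Request`, `SampleState`, `FinishReason`, and `can_use_host_stop_criteria` are hypothetical stand-ins, the order of the stop checks is assumed, and `tokens` is assumed to hold generated tokens only. Only the names `update_requests`, `use_host_stop_criteria`, and the idea of a stop-criteria handler come from the walkthrough above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FinishReason(Enum):
    END_ID = "end_id"
    STOP_WORDS = "stop_words"
    LENGTH = "length"


@dataclass
class Request:
    # Hypothetical stand-in for a generation request, not the TRT-LLM class.
    end_id: int
    max_new_tokens: int
    stop_sequences: list[list[int]] = field(default_factory=list)
    tokens: list[int] = field(default_factory=list)  # generated tokens only (assumption)
    finish_reason: Optional[FinishReason] = None


@dataclass
class SampleState:
    # Hypothetical stand-in for SampleStateTorch.
    new_tokens: list[int]  # one sampled token per request, already on the host
    use_host_stop_criteria: bool = False


def can_use_host_stop_criteria(tokens_per_step: int, beam_width: int,
                               all_greedy: bool) -> bool:
    # Assumed eligibility rule: greedy sampling, one token per step, one beam.
    return all_greedy and tokens_per_step == 1 and beam_width == 1


def handle_stop_criteria(request: Request, new_token: int) -> Optional[FinishReason]:
    # Expects new_token to be appended to request.tokens already.
    if new_token == request.end_id:
        return FinishReason.END_ID
    for stop in request.stop_sequences:
        if stop and request.tokens[-len(stop):] == stop:
            return FinishReason.STOP_WORDS
    if len(request.tokens) >= request.max_new_tokens:
        return FinishReason.LENGTH
    return None


def update_requests(state: SampleState, requests: list[Request]) -> None:
    if not state.use_host_stop_criteria:
        # The device finish-reasons and draft-token path is outside this sketch.
        raise NotImplementedError("device path not modeled here")
    for request, token in zip(requests, state.new_tokens):
        request.tokens.append(token)  # add sampled token to the request
        request.finish_reason = handle_stop_criteria(request, token)
```

In this sketch the flag is decided once at sampling time and carried on the sample state, so `update_requests` only has to branch on it rather than recompute eligibility.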
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
PR_Github #57453 [ run ] triggered by Bot. Commit:
PR_Github #57453 [ run ] completed with state
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.