
Support output_logits='generation' and output_last_hidden_state in PyTorch backend#4534

Draft
Copilot wants to merge 4 commits into main from copilot/fix-logits-last-hidden-state-issue

Conversation

Contributor

Copilot AI commented Apr 17, 2026

Motivation

The PyTorch engine silently discarded output_logits='generation' and output_last_hidden_state requests, returning None for both fields regardless of the generation config. Previously, only output_logits='all' with max_new_tokens=0 was supported. This PR enables per-step logit and hidden-state collection during generation.

Modification

  • messages.py: Add out_logits_mode / out_last_hidden_states_mode to SamplingParam; update from_gen_config to pass 'generation' mode through (keep 'all' restricted to max_new_tokens=0). Add HistoryHiddenStates class (same int16 bit-reinterpretation storage as HistoryLogits) plus all_hidden_states, return_hidden_states, hidden_states_generation_mode, hidden_states, and append_hidden_states on SchedulerSequence.
  • engine.py: Add last_hidden_state field to InferOutput.
  • model_agent/agent.py: Add hidden_states to BatchedOutputs; update _async_model_forward to extract last-position hidden states ('generation') or full-sequence hidden states ('all'); thread return_hidden_states / hidden_states_all_mode through _async_step and _step_postprocess_with_output.
  • inputs_maker.py: Compute return_hidden_states and hidden_states_all_mode flags per batch.
  • engine_loop.py: In _make_infer_outputs, accumulate last-position logits/hidden-states at each step for 'generation' mode and emit on finish; handle 'all' mode split by sequence length. Include last_hidden_state in _send_resp.
  • engine_instance.py: Read last_hidden_state from response data and forward to EngineOutput.
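The "int16 bit-reinterpretation storage" mentioned for HistoryHiddenStates can be illustrated with a small stand-alone sketch. This uses NumPy as a stand-in for the analogous torch view trick, and the variable names here are illustrative, not taken from the PR: the idea is that float16 bits are reinterpreted as int16 for storage and recovered losslessly by viewing back.

```python
import numpy as np

# Illustrative sketch (NumPy stand-in for torch's tensor.view(torch.int16)):
# store fp16 activations as int16 by reinterpreting the raw bits, then
# recover the exact fp16 values later by viewing back. No rounding occurs
# because only the dtype label changes, never the underlying bytes.
hidden = np.random.randn(4, 8).astype(np.float16)

stored = hidden.view(np.int16)       # same buffer, integer dtype
restored = stored.view(np.float16)   # exact bit-for-bit round trip

assert stored.dtype == np.int16
assert np.array_equal(hidden, restored)
```

Storing activations under an integer dtype sidesteps any float-specific handling in generic container code while keeping the round trip exact.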

Use cases (Optional)

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen3-VL-7B-Instruct')
gen_config = GenerationConfig(
    temperature=0.0,
    top_k=1,
    output_logits='generation',
    output_last_hidden_state='generation',
    max_new_tokens=128,
)
responses = pipe(['Hi, introduce yourself', 'Shanghai is'], gen_config=gen_config)
hidden_states = [r.last_hidden_state for r in responses]  # [num_steps, hidden_dim]
logits = [r.logits for r in responses]                    # [num_steps, vocab_size]
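The 'generation'-mode accumulation described above (keep only the last position at each decode step, stack on finish) can be sketched as follows. The function name and the plain-array representation are hypothetical simplifications, not the actual engine_loop code:

```python
import numpy as np

def accumulate_generation_outputs(step_outputs):
    """Hypothetical sketch of 'generation'-mode collection.

    step_outputs: one array per decode step with shape
    [positions_in_step, dim] (prefill may cover many positions,
    decode steps typically one). Only the last position of each
    step is kept, yielding a [num_steps, dim] result on finish.
    """
    collected = [step[-1] for step in step_outputs]  # last position only
    return np.stack(collected)                       # [num_steps, dim]
```

This mirrors the per-step behavior the PR describes: each response ends up with logits of shape [num_steps, vocab_size] and hidden states of shape [num_steps, hidden_dim].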

Checklist

  1. Pre-commit or other linting tools are used to fix potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Copilot AI and others added 3 commits April 17, 2026 07:46
Copilot AI changed the title from "[WIP] Fix logits and last_hidden_state returns for qwen3VL" to "Support output_logits='generation' and output_last_hidden_state in PyTorch backend" on Apr 17, 2026
Copilot AI requested a review from CUHKSZzxy April 17, 2026 07:53


Development

Successfully merging this pull request may close these issues.

[Bug] Why can't qwen3VL return logits and last_hidden_state?
