Conversation

@yewentao256 yewentao256 commented Jan 23, 2026

Purpose

  1. Add a num_output_tokens method to avoid len(self.output_token_ids), which materializes a list slice in SlowIncrementalDetokenizer (current code below, followed by a sketch of the new accessor):
class SlowIncrementalDetokenizer(BaseIncrementalDetokenizer):
    @property
    def output_token_ids(self) -> list[int]:
        # Slicing copies the suffix into a new list, so even a plain
        # len(self.output_token_ids) call pays for the copy.
        return (
            self.token_ids
            if not self.prompt_len
            else self.token_ids[self.prompt_len :]
        )
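
A minimal sketch of the new accessor, assuming only the method name given in this PR description (the body is an illustration, not necessarily the PR's exact code):

class SlowIncrementalDetokenizer(BaseIncrementalDetokenizer):
    def num_output_tokens(self) -> int:
        # Compute the count arithmetically; no intermediate list is built.
        return len(self.token_ids) - self.prompt_len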
  2. Accumulate decoded pieces in a list and join them once, instead of repeated string concatenation (see the sketch below).
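
A minimal sketch of the accumulation pattern, with hypothetical names (join_pieces, decoded_pieces) standing in for the PR's actual code:

def join_pieces(decoded_pieces: list[str]) -> str:
    # Appending to a list is O(1) amortized; repeated `text += piece`
    # would copy the accumulated string on every step (O(n^2) overall).
    pieces: list[str] = []
    for piece in decoded_pieces:
        pieces.append(piece)
    return "".join(pieces)  # single O(n) concatenation at the end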

Test

Covered by the existing unit tests:

  tests/tokenizers_/test_detokenize.py
  tests/detokenizer/test_min_tokens.py
  tests/detokenizer/test_stop_string_while_stop_model_terminates.py
  tests/v1/engine/test_fast_incdec_prefix_err.py
  tests/entrypoints/openai/test_serving_tokens.py

CC: @WoosukKwon @njhill

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 23, 2026
@mergify mergify bot added the v1 label Jan 23, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces performance optimizations to the detokenizer logic. The main changes include adding a num_output_tokens method to avoid creating intermediate list slices when only the length is needed, and accumulating string pieces in a list before performing a single join operation to prevent inefficient repeated string concatenations. These changes directly address the performance goals outlined in the PR description and are well-implemented. The TODO comment regarding inefficiency in BaseIncrementalDetokenizer.update is correctly resolved by the new string accumulation logic. The use of num_output_tokens is consistently applied where appropriate, replacing len(self.output_token_ids) to avoid unnecessary slicing. Overall, the changes improve efficiency without introducing new issues.
