feat(serve): add --generation-config CLI for server sampling defaults by lvhan028 · Pull Request #4708 · InternLM/lmdeploy

lvhan028 · 2026-06-25T08:58:49Z

No description provided.

Align api_server with vLLM by loading HuggingFace generation_config.json as default sampling params, with optional override and lmdeploy fallback. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

Adds a server-side “generation config” mechanism to centralize sampling defaults (and an optional server-wide max_new_tokens cap) for the serving stack, wiring it through OpenAI/Responses/Anthropic request handling and exposing it via new CLI flags.

Changes:

Introduces lmdeploy.serve.core.generation_config helpers to load HF generation_config.json, merge request/server defaults, and build GenerationConfig.
Updates OpenAI/Responses/Anthropic serving code to use merged sampling defaults and adjusts protocol model defaults to None so request fields are only applied when explicitly provided.
Adds CLI flags --generation-config and --override-generation-config plus unit tests for the merge/resolution logic.
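The precedence described above (explicit request value, then server default, then lmdeploy fallback) can be sketched as follows. This is only an illustration: `merge_sampling` and `FALLBACK_DEFAULTS` are hypothetical names, the fallback values are placeholders, and the actual helpers in `lmdeploy.serve.core.generation_config` may be structured differently.

```python
from typing import Any, Dict, Optional

# Hypothetical fallback values; the real lmdeploy defaults may differ.
FALLBACK_DEFAULTS: Dict[str, Any] = {"temperature": 1.0, "top_p": 1.0, "top_k": 0}


def merge_sampling(request: Dict[str, Optional[Any]],
                   server_defaults: Dict[str, Any],
                   fallback: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge sampling params; later sources win only when their value is not None."""
    merged = dict(FALLBACK_DEFAULTS if fallback is None else fallback)
    # Server defaults (e.g. from generation_config.json) replace the fallback.
    merged.update({k: v for k, v in server_defaults.items() if v is not None})
    # Request fields default to None, so only explicitly set ones are applied.
    merged.update({k: v for k, v in request.items() if v is not None})
    return merged
```

Because request fields default to `None`, a client that omits `temperature` inherits the server value, while a client that sends it explicitly always wins.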

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| tests/test_lmdeploy/serve/test_generation_config.py | Adds unit tests for sampling merge, max token resolution, and server-default resolution. |
| lmdeploy/serve/openai/serving_completion.py | Validates sampling values after merging request/server/fallback defaults. |
| lmdeploy/serve/openai/serving_chat_completion.py | Same as above for chat-completions validation. |
| lmdeploy/serve/openai/responses/serving.py | Passes server defaults/cap into Responses `to_generation_config`. |
| lmdeploy/serve/openai/responses/request.py | Rebuilds Responses `GenerationConfig` via shared generation-config helpers. |
| lmdeploy/serve/openai/responses/protocol.py | Sets certain sampling fields to default `None` to enable server defaults. |
| lmdeploy/serve/openai/protocol.py | Sets multiple sampling-related request defaults to `None` to enable server defaults. |
| lmdeploy/serve/openai/api_server.py | Centralizes `GenerationConfig` construction and wires server sampling defaults/cap into request handling. |
| lmdeploy/serve/core/generation_config.py | New core module implementing config loading, merging, and `GenerationConfig` building. |
| lmdeploy/serve/anthropic/protocol.py | Sets temperature default to `None` to enable server defaults. |
| lmdeploy/serve/anthropic/endpoints/messages.py | Passes server defaults/cap into Anthropic `to_generation_config`. |
| lmdeploy/serve/anthropic/adapter.py | Rebuilds Anthropic `GenerationConfig` via shared generation-config helpers. |
| lmdeploy/cli/utils.py | Adds CLI args for generation config source and overrides. |
| lmdeploy/cli/serve.py | Wires new CLI args through to server launch entrypoints. |


+    override = override or {}
+    src = generation_config


+    override_max_new_tokens = sampling.pop('max_new_tokens', None)
+    if override_max_new_tokens is not None:
+        override_max_new_tokens = int(override_max_new_tokens)
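For context, the hunks above appear to belong to a helper that resolves server defaults from a `generation_config.json` plus an optional override. A minimal sketch of that idea follows, under stated assumptions: the function name `load_server_defaults`, the `SAMPLING_KEYS` subset, and the choice to take the cap only from the override are all hypothetical and not confirmed by the PR.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Tuple

# Assumed subset of generation_config.json keys treated as sampling params.
SAMPLING_KEYS = ("temperature", "top_p", "top_k", "repetition_penalty")


def load_server_defaults(model_dir: str,
                         override: Optional[Dict[str, Any]] = None
                         ) -> Tuple[Dict[str, Any], Optional[int]]:
    """Return (sampling defaults, optional max_new_tokens cap)."""
    override = override or {}
    path = Path(model_dir) / "generation_config.json"
    # A missing file simply yields no server defaults.
    raw = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    sampling = {k: raw[k] for k in SAMPLING_KEYS if raw.get(k) is not None}
    # Override entries replace values loaded from the file.
    sampling.update({k: v for k, v in override.items() if v is not None})
    # Split the cap out of the sampling dict, coercing it to int as in the hunk.
    override_max_new_tokens = sampling.pop("max_new_tokens", None)
    if override_max_new_tokens is not None:
        override_max_new_tokens = int(override_max_new_tokens)
    return sampling, override_max_new_tokens
```

Coercing with `int(...)` matters when the override arrives as a string from the command line; a non-numeric value would raise `ValueError` at startup rather than at request time.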
+


+    request_value = max_completion_tokens if max_completion_tokens is not None else max_tokens
+    if request_value is None:
+        return server_cap
+    if server_cap is not None:
+        return min(request_value, server_cap)
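The hunk above can be read as a small resolution function. The sketch below wraps it in a self-contained form; the name `resolve_max_new_tokens` and the final `return request_value` branch are assumptions, since the hunk is truncated before the function ends.

```python
from typing import Optional


def resolve_max_new_tokens(max_tokens: Optional[int],
                           max_completion_tokens: Optional[int],
                           server_cap: Optional[int]) -> Optional[int]:
    """Pick the effective token limit for one request."""
    # max_completion_tokens takes priority over max_tokens when both are set.
    request_value = max_completion_tokens if max_completion_tokens is not None else max_tokens
    if request_value is None:
        # No request limit: fall back to the server-wide cap (may be None).
        return server_cap
    if server_cap is not None:
        # The request may lower the limit but never exceed the cap.
        return min(request_value, server_cap)
    # Assumed final branch (not shown in the hunk): no cap, use the request value.
    return request_value
```

Under these assumptions, a request asking for 1000 tokens against a cap of 256 resolves to 256, while a request with no limit and no cap resolves to `None`, leaving the decision to downstream defaults.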


feat(serve): add --generation-config CLI for server sampling defaults

67b7621


Copilot AI review requested due to automatic review settings June 25, 2026 08:58

lvhan028 added the improvement label Jun 25, 2026

Copilot started reviewing on behalf of lvhan028 June 25, 2026 08:59 View session

Copilot AI reviewed Jun 25, 2026

lvhan028 wants to merge 1 commit into InternLM:main from lvhan028:feat/generation-config-cli
