
Include context_length in /v1/models response (#1183)#1184

Open
seikixtc wants to merge 1 commit into ml-explore:main from seikixtc:1183-context-length

Conversation

@seikixtc

Closes #1183.

What

Adds a context_length field to every entry returned by the /v1/models endpoint (and the /v1/models/{repo_id} single-model variant), reporting the maximum context length the model declares in its config.json.

Example response after this change:

{
  "object": "list",
  "data": [
    {
      "id": "mlx-community/Qwen2.5-7B-Instruct-4bit",
      "object": "model",
      "created": 1745000000,
      "context_length": 32768
    }
  ]
}

Why

OpenAI-compatible clients increasingly need to know a model's usable context window before constructing a request — for chunking long documents, sizing prompt caches, choosing between models, and rendering token-budget UIs. Today the endpoint exposes only id, object, and created, so clients have to hard-code limits or guess.

How

  • Adds _get_context_length(config_path) to read the first recognized max-context field from config.json
  • Supports max_position_embeddings, n_positions, max_sequence_length, and seq_length
  • Adds _find_repo_config_path(repo) so Hugging Face cache scans use the top-level snapshot config.json rather than any nested file with the same basename
  • Returns None instead of raising if the config is missing, malformed, or has no valid positive integer context field
  • Includes context_length in both cached-model listings and local --model listings

Tests

  • New unit tests for _get_context_length cover recognized fields, priority ordering, missing fields, malformed JSON, missing files, non-positive values, string values, bool values, and None input
  • New unit tests for _find_repo_config_path cover top-level-vs-nested config.json selection in Hugging Face cache snapshots
  • test_handle_models now asserts every model entry includes context_length, with value either None or a positive integer
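The `_get_context_length` cases listed above could be exercised with tests along these lines (a sketch; the actual tests in `tests/test_server.py` may be organized differently). A minimal local stand-in for the helper is included here only to keep the snippet self-contained — the real tests would import it from `mlx_lm.server`:

```python
import json
import os
import tempfile
import unittest


def _get_context_length(config_path):
    # Local stand-in for the helper under test (the real one lives
    # in mlx_lm/server.py).
    keys = ("max_position_embeddings", "n_positions",
            "max_sequence_length", "seq_length")
    if config_path is None:
        return None
    try:
        with open(config_path) as f:
            config = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    for key in keys:
        v = config.get(key)
        if isinstance(v, int) and not isinstance(v, bool) and v > 0:
            return v
    return None


class TestGetContextLength(unittest.TestCase):
    def _write_config(self, payload: str) -> str:
        d = tempfile.mkdtemp()
        path = os.path.join(d, "config.json")
        with open(path, "w") as f:
            f.write(payload)
        return path

    def test_recognized_field(self):
        path = self._write_config(json.dumps({"max_position_embeddings": 32768}))
        self.assertEqual(_get_context_length(path), 32768)

    def test_priority_ordering(self):
        # max_position_embeddings outranks n_positions regardless of JSON order.
        path = self._write_config(
            json.dumps({"n_positions": 2048, "max_position_embeddings": 32768})
        )
        self.assertEqual(_get_context_length(path), 32768)

    def test_malformed_json(self):
        self.assertIsNone(_get_context_length(self._write_config("{not json")))

    def test_non_positive_value(self):
        path = self._write_config(json.dumps({"seq_length": 0}))
        self.assertIsNone(_get_context_length(path))

    def test_bool_value_rejected(self):
        path = self._write_config(json.dumps({"seq_length": True}))
        self.assertIsNone(_get_context_length(path))

    def test_none_input(self):
        self.assertIsNone(_get_context_length(None))
```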

Verification

  • python3 -m unittest discover -s tests -p 'test_server.py' -v
  • pre-commit run --files mlx_lm/server.py tests/test_server.py
  • git diff --check

Backward compatibility

Purely additive. Existing clients that only read id / object / created continue to work unchanged.


