[None][perf] serve: opt-in msgspec msgpack transport for disagg orchestrator->worker request body by Tabrizian · Pull Request #15910 · NVIDIA/TensorRT-LLM

Tabrizian · 2026-07-03T07:48:54Z

What

Opt-in msgspec msgpack transport for the disaggregated orchestrator→worker request body, as an alternative to the orjson request-body parse (#15690).

Instead of speeding up the JSON parse (orjson), this removes JSON (de)serialization from the disagg serving path entirely: the orchestrator encodes the forwarded request dict as msgpack via msgspec.msgpack, and the worker decodes it.

How

- openai_client.py (orchestrator): when enabled, body = msgspec.msgpack.Encoder().encode(request.model_dump(mode='json', exclude_unset=True)), sent via POST with Content-Type: application/msgpack (otherwise the existing model_dump_json + application/json path is used).
- openai_server.py (worker): an APIRoute/Request override that decodes application/msgpack bodies with msgspec and falls back to stdlib json for anything else (content-type gated).
- requirements.txt: add msgspec.
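
The two sides described above can be sketched as a pair of plain functions. This is a minimal illustration, not the PR's code: the function names and constants are hypothetical, a plain dict stands in for request.model_dump(mode='json', exclude_unset=True), and json.dumps stands in for the existing model_dump_json path. msgspec is imported lazily so that the JSON path never touches it.

```python
import json
from typing import Any

MSGPACK_CONTENT_TYPE = "application/msgpack"
JSON_CONTENT_TYPE = "application/json"


def encode_request_body(payload: dict[str, Any], use_msgpack: bool) -> tuple[bytes, str]:
    """Orchestrator side (hypothetical helper): return (body, content_type)."""
    if use_msgpack:
        import msgspec  # lazy import: only needed when the msgpack path is on

        return msgspec.msgpack.Encoder().encode(payload), MSGPACK_CONTENT_TYPE
    # Stand-in for the existing model_dump_json + application/json path.
    return json.dumps(payload).encode("utf-8"), JSON_CONTENT_TYPE


def decode_request_body(body: bytes, content_type: str) -> Any:
    """Worker side (hypothetical helper): decode by content type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == MSGPACK_CONTENT_TYPE:
        import msgspec

        return msgspec.msgpack.decode(body)
    # Anything else falls back to stdlib json.
    return json.loads(body)
```

A round trip through either path should return the original dict, which is what makes the two transports interchangeable for an A/B comparison.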

Enabling

TRTLLM_SERVE_ENABLE_MSGSPEC=1 on both the orchestrator and the worker. Off by default → the JSON path is byte-for-byte unchanged (no route override, no import).
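
A minimal sketch of the opt-in check, assuming that only the literal value "1" enables the feature. How the PR actually parses the variable is not shown here, so the helper name and the parsing rule are assumptions.

```python
import os
from typing import Mapping, Optional

ENV_FLAG = "TRTLLM_SERVE_ENABLE_MSGSPEC"


def msgspec_transport_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True only when the flag is set to "1"."""
    env = os.environ if environ is None else environ
    # Assumption: unset, empty, or any value other than "1" leaves the JSON path on.
    return env.get(ENV_FLAG, "0") == "1"
```

Because the flag must be set on both processes, a check like this would run once on the orchestrator (to pick the encoder) and once on the worker (to decide whether to install the decoding route).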

Why

For DeepSeek-V4 agentic loads, JSON (de)serialization of the ~40k-token request body on the single serving event loop is a dispatch bottleneck. #15690 addressed it with a faster JSON parser (orjson); this PR is the msgpack alternative, so the two can be A/B tested. The change affects only the orchestrator↔worker leg; the client API is unchanged.

…>worker request body

Alternative to the orjson request-body parse (NVIDIA#15690): instead of speeding up the JSON parse, encode the orchestrator->worker forwarded body as msgpack via msgspec and decode it on the worker, removing JSON (de)serialization from the disagg serving path.

Opt-in via TRTLLM_SERVE_ENABLE_MSGSPEC=1 (set on both orchestrator and worker); off by default the JSON path is byte-for-byte unchanged.

- openai_client.py (orchestrator): when enabled, encode the request dict with msgspec.msgpack and POST Content-Type: application/msgpack.
- openai_server.py (worker): decode application/msgpack bodies with msgspec, falling back to stdlib json otherwise (content-type gated).
- requirements.txt: add msgspec.

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

…e json)

FastAPI only sends the body through Request.json() when the Content-Type subtype is json/+json; application/msgpack bypassed it, so pydantic received the raw msgpack bytes ('bytes' object has no attribute 'get'). Keep Content-Type application/json and flag msgpack with X-TRTLLM-Msgpack: 1; the worker decodes with msgspec when that header is set.

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
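
The header-flagged variant from this second commit can be sketched as follows. The header name and value come from the commit message; the function names, the plain-dict header handling, and the lazy import are illustrative assumptions rather than the PR's implementation.

```python
import json
from typing import Any, Mapping

MSGPACK_FLAG_HEADER = "X-TRTLLM-Msgpack"


def build_forward_headers(use_msgpack: bool) -> dict[str, str]:
    """Orchestrator side (hypothetical helper): headers for the forwarded POST."""
    # Content-Type stays application/json so the body still goes through Request.json().
    headers = {"Content-Type": "application/json"}
    if use_msgpack:
        headers[MSGPACK_FLAG_HEADER] = "1"
    return headers


def decode_forwarded_body(body: bytes, headers: Mapping[str, str]) -> Any:
    """Worker side (hypothetical helper): pick the decoder from the flag header."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get(MSGPACK_FLAG_HEADER.lower()) == "1":
        import msgspec  # lazy import: only needed when the flag is present

        return msgspec.msgpack.decode(body)
    return json.loads(body)
```

Requests without the flag header decode exactly as before, so ordinary JSON clients hitting the worker are unaffected.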

Tabrizian requested review from a team as code owners July 3, 2026 07:48

Tabrizian requested review from hchings and removed request for a team July 3, 2026 07:48

github-actions Bot assigned Tabrizian Jul 3, 2026

lishicheng1996-nv mentioned this pull request Jul 3, 2026

[None][perf] serve: parse request bodies with orjson to unblock the serving loop #15690

Closed

1 task

Tabrizian wants to merge 2 commits into NVIDIA:feat/deepseek_v4 from Tabrizian:perf/disagg-msgspec
