[None][perf] serve: opt-in msgspec msgpack transport for disagg orchestrator->worker request body #15910
Open
Tabrizian wants to merge 2 commits into
…>worker request body

Alternative to the orjson request-body parse (NVIDIA#15690): instead of speeding up the JSON parse, encode the orchestrator->worker forwarded body as msgpack via msgspec and decode it on the worker, removing JSON (de)serialization from the disagg serving path.

Opt-in via TRTLLM_SERVE_ENABLE_MSGSPEC=1 (set on both orchestrator and worker); off by default the JSON path is byte-for-byte unchanged.

- openai_client.py (orchestrator): when enabled, encode the request dict with msgspec.msgpack and POST Content-Type: application/msgpack.
- openai_server.py (worker): decode application/msgpack bodies with msgspec, falling back to stdlib json otherwise (content-type gated).
- requirements.txt: add msgspec.

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
…e json)
FastAPI only sends the body through Request.json() when the Content-Type subtype
is json/+json; application/msgpack bypassed it, so pydantic received the raw
msgpack bytes ('bytes' object has no attribute 'get'). Keep Content-Type
application/json and flag msgpack with X-TRTLLM-Msgpack: 1; the worker decodes
with msgspec when that header is set.
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
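
For reviewers, the header-gated decode described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in openai_server.py: the function name decode_request_body and the plain headers mapping are assumptions made for the example, and the actual change hooks into FastAPI's request handling rather than a standalone function.

```python
import json
from typing import Any, Mapping

# Header name taken from the commit message, lowercased for lookup.
MSGPACK_FLAG_HEADER = "x-trtllm-msgpack"


def decode_request_body(raw: bytes, headers: Mapping[str, str]) -> Any:
    """Hypothetical helper: decode msgpack when flagged, JSON otherwise."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    if normalized.get(MSGPACK_FLAG_HEADER) == "1":
        # Lazy import so a JSON-only deployment never loads msgspec.
        import msgspec

        return msgspec.msgpack.decode(raw)
    # Default path: the body is ordinary JSON.
    return json.loads(raw)
```

Because Content-Type stays application/json in this variant, the flag header is the only signal that the bytes are msgpack, so both sides have to agree on it.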
What
Opt-in msgspec msgpack transport for the disaggregated orchestrator→worker request body, as an alternative to the orjson request-body parse (#15690).

Instead of speeding up the JSON parse (orjson), this removes JSON (de)serialization from the disagg serving path entirely: the orchestrator encodes the forwarded request dict as msgpack via msgspec.msgpack, and the worker decodes it.

How
- openai_client.py (orchestrator): when enabled, body = msgspec.msgpack.Encoder().encode(request.model_dump(mode='json', exclude_unset=True)) and POST Content-Type: application/msgpack (else the existing model_dump_json + application/json path).
- openai_server.py (worker): an APIRoute/Request that decodes application/msgpack bodies with msgspec, falling back to stdlib json for anything else (content-type gated).
- requirements.txt: add msgspec.

Enabling
TRTLLM_SERVE_ENABLE_MSGSPEC=1 on both the orchestrator and the worker. Off by default → the JSON path is byte-for-byte unchanged (no route override, no import).

Why
For DeepSeek-V4 agentic loads the ~40k-token request body's JSON (de)serialization on the single serving event loop is a dispatch bottleneck. #15690 addressed it with a faster JSON parser (orjson); this is the msgpack alternative, to A/B the two. Purely the orchestrator↔worker leg; the client API is unchanged.
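
The orchestrator side of the opt-in can be sketched along these lines. This is an illustrative sketch under stated assumptions, not the code in openai_client.py: the helper names msgspec_enabled and encode_forwarded_body are hypothetical, a plain dict stands in for request.model_dump(mode='json', exclude_unset=True), json.dumps stands in for model_dump_json, and the sketch follows the follow-up commit's X-TRTLLM-Msgpack flag rather than an application/msgpack Content-Type.

```python
from __future__ import annotations

import json
import os
from typing import Mapping

MSGPACK_FLAG_HEADER = "X-TRTLLM-Msgpack"  # flag header from the follow-up commit


def msgspec_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: the transport is on only when the variable is exactly '1'."""
    return environ.get("TRTLLM_SERVE_ENABLE_MSGSPEC") == "1"


def encode_forwarded_body(payload: dict, enabled: bool) -> tuple[bytes, dict]:
    """Hypothetical helper: return (body, headers) for the orchestrator->worker POST."""
    headers = {"Content-Type": "application/json"}
    if enabled:
        # Lazy import keeps the default JSON path free of the msgspec dependency.
        import msgspec

        headers[MSGPACK_FLAG_HEADER] = "1"
        return msgspec.msgpack.encode(payload), headers
    # Default path: plain JSON bytes with no extra header.
    return json.dumps(payload).encode("utf-8"), headers


# Usage: the flag is read once, then applied per request.
body, headers = encode_forwarded_body({"model": "m", "max_tokens": 8}, msgspec_enabled({}))
```

Passing the flag as an argument instead of reading the environment inside the encoder keeps the two paths easy to A/B in a test or benchmark.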