Labels: feature request (New feature or request)
Description
🚀 The feature, motivation and pitch
This issue proposes adding a lightweight P/D disaggregation deployment mode for vLLM on Ray.
Motivation
- Provide a Ray-native deployment option for vLLM that cleanly separates Prefill (P) nodes and Decode (D) nodes.
- Improve flexibility in scheduling and resource utilization on heterogeneous Ray clusters (e.g., different GPU types, CPU-only nodes).
- Keep the solution lightweight and easy to operate, without introducing heavy additional components or complex orchestration.
High-level idea
Introduce a P/D-disaggregated deployment mode built on top of Ray, with specialized P and D nodes for compute-bound vs latency-bound workloads.
Design the deployment so that:
- It is easy to configure and launch on an existing Ray cluster.
- It reuses as much of the existing vLLM infrastructure as possible.
- It remains compatible with the new router and the proxy implementation in the vLLM repository.
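To make the intended request flow concrete, here is a minimal plain-Python sketch of the P/D handoff. All names (`PrefillWorker`, `DecodeWorker`, `Router`, `KVHandle`) are hypothetical illustrations, not existing vLLM or Ray APIs; in the proposed Ray-native mode, the two worker classes would be Ray actors scheduled onto P and D nodes respectively, and the KV handle would describe a staged KV-cache transfer rather than a local object.

```python
from dataclasses import dataclass

@dataclass
class KVHandle:
    """Hypothetical descriptor for a KV cache produced by prefill."""
    request_id: str
    num_prompt_tokens: int  # tokens whose KV entries were filled during prefill

class PrefillWorker:
    """Compute-bound P role: runs the full prompt once to fill the KV cache."""
    def prefill(self, request_id: str, prompt_tokens: list) -> KVHandle:
        # A real implementation would run the model forward pass over the
        # prompt and stage the KV cache for transfer to a decode node.
        return KVHandle(request_id, len(prompt_tokens))

class DecodeWorker:
    """Latency-bound D role: continues generation from a prefilled KV cache."""
    def decode(self, handle: KVHandle, max_new_tokens: int) -> list:
        out = []
        for step in range(max_new_tokens):
            # Placeholder for one autoregressive step reusing the prefilled KV.
            out.append(f"tok{handle.num_prompt_tokens + step}")
        return out

class Router:
    """Dispatches each request: prefill on a P node, then decode on a D node."""
    def __init__(self, prefill_pool: list, decode_pool: list):
        self.prefill_pool = prefill_pool
        self.decode_pool = decode_pool
        self._rr = 0  # round-robin counter; a real router could load-balance

    def handle(self, request_id: str, prompt_tokens: list, max_new_tokens: int = 4) -> list:
        p = self.prefill_pool[self._rr % len(self.prefill_pool)]
        d = self.decode_pool[self._rr % len(self.decode_pool)]
        self._rr += 1
        handle = p.prefill(request_id, prompt_tokens)
        return d.decode(handle, max_new_tokens)

router = Router([PrefillWorker()], [DecodeWorker(), DecodeWorker()])
print(router.handle("req-1", ["a", "b", "c"]))  # → ['tok3', 'tok4', 'tok5', 'tok6']
```

Because the two roles only communicate through the KV handle, P and D pools can be sized and placed independently, which is what enables scheduling them onto different node types in a heterogeneous Ray cluster.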
Why this is useful
- Lightweight deployment: minimal extra services and configuration; suitable for users who want elastic, distributed vLLM on Ray but don’t want to adopt a heavy multi-component stack.
- Better resource utilization: P and D roles can be scheduled independently, matching different node types and improving cluster efficiency.
- Incremental adoption: users who already run vLLM on Ray can adopt this mode with relatively small changes.
If this direction aligns with the project’s roadmap, I’d be happy to open a PR implementing this deployment mode and iterate based on your feedback.
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.