[Feature]: Add P/D disaggregation deployment on Ray #29651

@JackyMa1997

🚀 The feature, motivation and pitch

This issue proposes adding a lightweight P/D disaggregation deployment mode for vLLM on Ray.

Motivation

  • Provide a Ray-native deployment option for vLLM that cleanly separates Prefill (P) nodes and Decode (D) nodes.
  • Improve flexibility in scheduling and resource utilization on heterogeneous Ray clusters (e.g., different GPU types, CPU-only nodes).
  • Keep the solution lightweight and easy to operate, without introducing heavy additional components or complex orchestration.

High-level idea

Introduce a P/D-disaggregated deployment mode built on top of Ray, with P nodes specialized for compute-bound prefill work and D nodes specialized for latency-bound decode work.

Design the deployment so that:

  • It is easy to configure and launch on an existing Ray cluster.
  • It reuses as much of the existing vLLM infrastructure as possible.
  • It remains compatible with the new router and the proxy implementation in the vLLM repository.
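
To make the intended request flow concrete, here is a minimal sketch in plain Python of how a proxy might hand a request from a P worker to a D worker. It is only an illustration under assumptions: `KVHandle`, `PrefillWorker`, `DecodeWorker`, and `PDProxy` are hypothetical names, not existing vLLM or Ray APIs, and the model calls are replaced by stand-ins. In an actual Ray deployment the two worker classes would presumably be Ray actors scheduled onto different node types; that part is omitted so the sketch runs without a cluster.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List


@dataclass
class KVHandle:
    """Hypothetical reference to the KV cache produced by a prefill worker."""
    request_id: str
    prefill_worker: str
    num_prompt_tokens: int


class PrefillWorker:
    """Hypothetical P-role worker: processes the prompt once."""

    def __init__(self, name: str) -> None:
        self.name = name

    def prefill(self, request_id: str, prompt: str) -> KVHandle:
        # Stand-in for the prompt pass: whitespace split instead of a tokenizer.
        num_tokens = len(prompt.split())
        return KVHandle(request_id, self.name, num_tokens)


class DecodeWorker:
    """Hypothetical D-role worker: generates tokens from a prefilled request."""

    def __init__(self, name: str) -> None:
        self.name = name

    def decode(self, handle: KVHandle, max_tokens: int) -> Dict[str, object]:
        # Stand-in for generation: report what would be produced and by whom.
        return {
            "request_id": handle.request_id,
            "prefill_worker": handle.prefill_worker,
            "decode_worker": self.name,
            "prompt_tokens": handle.num_prompt_tokens,
            "generated_tokens": max_tokens,
        }


class PDProxy:
    """Hypothetical proxy that picks P and D workers independently (round-robin)."""

    def __init__(self, prefill_workers: List[PrefillWorker],
                 decode_workers: List[DecodeWorker]) -> None:
        if not prefill_workers or not decode_workers:
            raise ValueError("need at least one prefill and one decode worker")
        self._prefill = cycle(prefill_workers)
        self._decode = cycle(decode_workers)

    def generate(self, request_id: str, prompt: str,
                 max_tokens: int) -> Dict[str, object]:
        # Step 1: prefill on a P worker; step 2: decode on a D worker.
        handle = next(self._prefill).prefill(request_id, prompt)
        return next(self._decode).decode(handle, max_tokens)
```

Because the two pools are cycled separately, the number of P and D workers can differ, which is the property that lets each role be sized and placed on its own. A real implementation would also need to transfer the KV cache between workers, which this sketch does not model.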

Why this is useful

  • Lightweight deployment: minimal extra services and configuration; suitable for users who want elastic, distributed vLLM on Ray but don’t want to adopt a heavy multi-component stack.
  • Better resource utilization: P and D roles can be scheduled independently, matching different node types and improving cluster efficiency.
  • Incremental adoption: users who already run vLLM on Ray can adopt this mode with relatively small changes.

If this direction aligns with the project’s roadmap, I’d be happy to open a PR implementing this deployment mode and iterate based on your feedback.
