[PD Disaggregation] Support PD deployment of DeepSeekv3. #5251
base: develop
Conversation
Thanks for your contribution!

K11OntheBoat does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
    local_data_parallel_size=self.cfg.parallel_config.data_parallel_size,
        )
    )
    ctx = multiprocessing.get_context("spawn")
If the default fork start method is used, the child process errors out in paddle.device.cuda.get_device_properties() during MLA attention initialization.
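The point of the change can be sketched with a minimal, standalone illustration (not the actual engine code) of why the spawn start method matters for children that initialize CUDA:

```python
import multiprocessing


def worker(rank):
    # In the real engine, MLA attention initialization would call
    # paddle.device.cuda.get_device_properties() around here; with the
    # default "fork" start method the child inherits the parent's CUDA
    # state and that call fails.
    return rank * 2


if __name__ == "__main__":
    # "spawn" starts a fresh interpreter, so each child initializes CUDA cleanly.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(worker, [0, 1]))  # [0, 2]
```

The trade-off is that spawn re-imports the module in each child, so anything the workers need must be picklable or re-created after start.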
Codecov Report

❌ Patch coverage is …. Additional details and impacted files (develop vs. #5251):

| Metric | develop | #5251 |
|---|---|---|
| Coverage | ? | 60.51% |
| Files | ? | 320 |
| Lines | ? | 39076 |
| Branches | ? | 5880 |
| Hits | ? | 23646 |
| Misses | ? | 13558 |
| Partials | ? | 1872 |

Flags with carried forward coverage won't be shown.
Pull request overview

This PR adds PD (Prefill-Decoding) disaggregated deployment support for the DeepSeekv3 model, adapting to the characteristics of models built on the MLA (Multi-head Latent Attention) architecture. Specifically, it:
- Adapts DeepSeekV3 weight loading and FP8 dynamic quantization
- Implements the KV cache transfer mechanism for MLA-structured models (including a mode that transfers only the key cache)
- Fixes the weight initialization ordering issue in the MoE layer
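The key-cache-only mode in the second bullet can be sketched as follows; the helper name below is hypothetical, not the PR's actual API. MLA compresses K and V into a single latent cache, so a prefill node only has a "key" buffer to ship, while dense MHA/GQA models ship both:

```python
def caches_to_transfer(key_cache, value_cache=None):
    """Buffers a prefill node must send to a decode node (hypothetical helper)."""
    buffers = [key_cache]
    # Dense MHA/GQA models keep a separate value cache; MLA models do not,
    # so the transfer layer must tolerate value_cache being absent.
    if value_cache is not None:
        buffers.append(value_cache)
    return buffers


print(len(caches_to_transfer(b"k")))        # 1: MLA, key-only
print(len(caches_to_transfer(b"k", b"v")))  # 2: standard attention
```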
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/config.py | Adds the moe_num_experts config mapping (contains duplicated code) |
| fastdeploy/model_executor/models/deepseek_v3.py | Adds an empty_input_forward method for expert-layer warm-up; adds blank lines to improve formatting |
| fastdeploy/model_executor/layers/moe/moe.py | Reorders parameter initialization and range checking in weight_loader |
| fastdeploy/engine/engine.py | Uses the spawn context to create subprocesses so they work in a CUDA environment |
| fastdeploy/cache_manager/transfer_factory/rdma_cache_transfer.py | Raises an exception on RDMA library load failure instead of only logging it |
| fastdeploy/cache_manager/prefix_cache_manager.py | Improves value_cache_shape handling to support MLA models that may have no value cache |
| fastdeploy/cache_manager/cache_messager.py | Reorders cache initialization to first check whether a value cache exists |
| fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_rdma.cpp | Implements the has_value_cache_ flag to support the key-only cache mode of MLA models |
| fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_connection.cpp | Updates the MR exchange functions to support an optional value cache |
| fastdeploy/cache_manager/transfer_factory/kvcache_transfer/include/*.h | Adds the has_value_cache parameter to function signatures and class members |
| fastdeploy/model_executor/layers/attention/mla_attention_backend.py | Passes the kv_signal_data parameter through |
| fastdeploy/model_executor/layers/backends/metax/attention/mla_attn_metax_backend.py | Passes the kv_signal_data parameter through |
| custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu | Adds an optional kv_signal_data parameter to support signal passing in PD-disaggregated deployment |
| custom_ops/gpu_ops/cpp_extensions.cc | Updates function signatures to include the kv_signal_data parameter |
| custom_ops/gpu_ops/mla_attn/mla_hopper.cuh | Code formatting improvements; adds support for group_size=128 |
| examples/splitwise/stop.sh | Comments out the forced termination of redis-server |
| fastdeploy/worker/gpu_model_runner.py | Removes redundant blank lines |
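To illustrate the has_value_cache idea that threads through the kvcache_transfer changes above, here is a hedged sketch in Python (the real code is C++ RDMA memory registration; the class and its `_register` stand-in are invented for illustration):

```python
class KVCacheTransferSketch:
    """Toy model of registering RDMA memory regions (MRs) with an optional value cache."""

    def __init__(self, key_ptrs, value_ptrs=None):
        # MLA models pass value_ptrs=None; a flag like this would be exchanged
        # with the peer so both sides agree on how many MRs to expect.
        self.has_value_cache = value_ptrs is not None
        self.key_mrs = [self._register(p) for p in key_ptrs]
        self.value_mrs = (
            [self._register(p) for p in value_ptrs] if self.has_value_cache else []
        )

    @staticmethod
    def _register(ptr):
        # Stand-in for ibv_reg_mr(); just tags the pointer.
        return ("mr", ptr)


mla = KVCacheTransferSketch(key_ptrs=[0x1000, 0x2000])
print(mla.has_value_cache, len(mla.value_mrs))  # False 0
```

The design choice here is to make the absence of a value cache an explicit, negotiated property of the connection rather than inferring it from empty buffers, which keeps both endpoints' MR counts in agreement.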
Motivation
Support PD (Prefill-Decoding) disaggregated deployment of DeepSeekv3.
Modifications
Usage or Command
Refer to the PD disaggregated deployment documentation.
Accuracy Tests
For now, verification was done with a 5-layer model: a 1P1D setup runs end to end without errors.
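A common way to run such a reduced-layer smoke test (assumed here, not taken from this PR) is to truncate the layer count in an HF-style config before launching the prefill and decode nodes:

```python
import json

# Hypothetical config trimming; key names mirror a typical config.json.
config = {"num_hidden_layers": 61, "hidden_size": 7168}
config["num_hidden_layers"] = 5  # 5-layer slice for the 1P1D smoke run
print(json.dumps(config))
```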
Checklist
- Tag the PR title with one of: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a PR targeting the `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.