
Conversation

@K11OntheBoat
Collaborator

Motivation

Support PD (prefill/decode) disaggregated deployment for DeepSeekV3.

Modifications

  1. Adapt weight loading for DeepSeekV3.
  2. Fix a bug in DeepSeekV3's dynamic FP8 weight quantization (see the sketch below).
  3. Adapt KV cache transfer for MLA-based models.
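
For context on item 2, a minimal sketch of what per-tensor dynamic FP8 (e4m3) quantization involves, assuming Paddle; FP8_E4M3_MAX and dynamic_quant_fp8 are illustrative names, not FastDeploy APIs:

    import paddle

    FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3 value

    def dynamic_quant_fp8(w: paddle.Tensor):
        # "Dynamic" means the scale is derived from the tensor itself at
        # load/run time rather than read from the checkpoint.
        amax = paddle.clip(paddle.max(paddle.abs(w)), min=1e-12)
        scale = amax / FP8_E4M3_MAX
        w_q = paddle.clip(w / scale, min=-FP8_E4M3_MAX, max=FP8_E4M3_MAX)
        return w_q, scale  # w_q would be cast to float8_e4m3 on supported GPUs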

Usage or Command

See the PD disaggregated deployment documentation.

Accuracy Tests

So far this has been validated with a 5-layer model: a 1P1D deployment (one prefill instance, one decode instance) runs end to end without errors.

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code; run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has also been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Nov 26, 2025

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


K11OntheBoat does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you already signed the CLA but the status is still pending? Let us recheck it.

local_data_parallel_size=self.cfg.parallel_config.data_parallel_size,
)
)
ctx = multiprocessing.get_context("spawn")
Collaborator Author

With the default fork start method, the child process fails in paddle.device.cuda.get_device_properties() during MLA attention initialization.
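
For illustration, a minimal reproduction-style sketch of the start-method pitfall (assuming Paddle is installed and a GPU is visible; this is not FastDeploy code):

    import multiprocessing

    def worker():
        import paddle
        # Under "fork", the child inherits the parent's already-initialized
        # CUDA context, and this call can fail; "spawn" starts a clean
        # interpreter, so CUDA is initialized fresh in the child.
        print(paddle.device.cuda.get_device_properties())

    if __name__ == "__main__":
        ctx = multiprocessing.get_context("spawn")
        p = ctx.Process(target=worker)
        p.start()
        p.join()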

@codecov-commenter

codecov-commenter commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 41.17647% with 20 lines in your changes missing coverage. Please review.
⚠️ Please upload a report for BASE (develop@a12eaf9); the BASE report is missing.

Files with missing lines Patch % Lines
fastdeploy/cache_manager/prefix_cache_manager.py 25.00% 7 Missing and 2 partials ⚠️
fastdeploy/cache_manager/cache_messager.py 57.14% 2 Missing and 4 partials ⚠️
fastdeploy/model_executor/models/deepseek_v3.py 25.00% 3 Missing ⚠️
...he_manager/transfer_factory/rdma_cache_transfer.py 0.00% 1 Missing ⚠️
fastdeploy/engine/engine.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5251   +/-   ##
==========================================
  Coverage           ?   60.51%           
==========================================
  Files              ?      320           
  Lines              ?    39076           
  Branches           ?     5880           
==========================================
  Hits               ?    23646           
  Misses             ?    13558           
  Partials           ?     1872           
Flag Coverage Δ
GPU 60.51% <41.17%> (?)

Flags with carried forward coverage won't be shown.

@juncaipeng juncaipeng requested a review from Copilot November 27, 2025 02:26
Copilot finished reviewing on behalf of juncaipeng November 27, 2025 02:29
Contributor

Copilot AI left a comment

Pull request overview

This PR adds PD (prefill/decode) disaggregated deployment support for DeepSeekV3, adapting to the characteristics of models built on the MLA (Multi-head Latent Attention) architecture.

  • Adapted DeepSeekV3 weight loading and dynamic FP8 quantization
  • Implemented KV cache transfer for MLA models, including a key-cache-only transfer mode (see the sketch below)
  • Fixed the weight initialization order in the MoE layer
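
A minimal sketch of the key-cache-only decision: MLA compresses keys and values into a single latent cache, so only one cache tensor exists per layer. The function and names below (including has_value_cache, which mirrors the has_value_cache_ flag in the file table) are illustrative, not the PR's actual API:

    def tensors_to_transfer(key_cache, value_cache=None):
        has_value_cache = value_cache is not None
        tensors = [key_cache]
        if has_value_cache:  # standard MHA/GQA models transfer both caches
            tensors.append(value_cache)
        # MLA models register and send only the (latent) key cache
        return tensors, has_value_cache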

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.

File Description
fastdeploy/config.py Add a moe_num_experts config mapping (contains duplicated code)
fastdeploy/model_executor/models/deepseek_v3.py Add an empty_input_forward method for expert-layer warm-up; add blank lines to improve formatting
fastdeploy/model_executor/layers/moe/moe.py Reorder parameter initialization and range checking in weight_loader (see the sketch after this table)
fastdeploy/engine/engine.py Create worker processes with the spawn multiprocessing context to support CUDA environments
fastdeploy/cache_manager/transfer_factory/rdma_cache_transfer.py Raise an exception when the RDMA library fails to load, instead of only logging it
fastdeploy/cache_manager/prefix_cache_manager.py Improve value_cache_shape handling to support MLA models that may have no value cache
fastdeploy/cache_manager/cache_messager.py Adjust cache initialization order; check first whether a value cache exists
fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_rdma.cpp Implement a has_value_cache_ flag to support the key-only cache mode of MLA models
fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_connection.cpp Update the MR (memory region) exchange functions to support an optional value cache
fastdeploy/cache_manager/transfer_factory/kvcache_transfer/include/*.h Add a has_value_cache parameter to function signatures and class members
fastdeploy/model_executor/layers/attention/mla_attention_backend.py Pass through the kv_signal_data parameter
fastdeploy/model_executor/layers/backends/metax/attention/mla_attn_metax_backend.py Pass through the kv_signal_data parameter
custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu Add an optional kv_signal_data parameter for PD-disaggregation signaling
custom_ops/gpu_ops/cpp_extensions.cc Update function signatures to include the kv_signal_data parameter
custom_ops/gpu_ops/mla_attn/mla_hopper.cuh Code formatting improvements; add support for group_size=128
examples/splitwise/stop.sh Comment out the forced kill of redis-server
fastdeploy/worker/gpu_model_runner.py Remove extra blank lines
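
The weight_loader change in moe.py follows a check-before-initialize pattern; a hypothetical sketch of that ordering (names and structure are illustrative, not FastDeploy's):

    import numpy as np

    def load_expert_weight(expert_weights, expert_id, num_experts, shard):
        # Range-check the expert index before touching any parameter
        # storage, so a malformed checkpoint fails fast instead of leaving
        # half-initialized experts behind.
        if not 0 <= expert_id < num_experts:
            raise ValueError(f"expert_id {expert_id} outside [0, {num_experts})")
        if expert_weights[expert_id] is None:  # lazy init after validation
            expert_weights[expert_id] = np.array(shard, copy=True)
        else:
            expert_weights[expert_id][...] = shard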
