[PD Disaggregation] Support PD deployment of DeepSeekv3. #5251
base: develop
Conversation
Thanks for your contribution!

K11OntheBoat does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
    local_data_parallel_size=self.cfg.parallel_config.data_parallel_size,
        )
    )
    ctx = multiprocessing.get_context("spawn")
If the default fork start method is used, the child process errors out in paddle.device.cuda.get_device_properties() during MLA attention initialization.
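The point of the change can be sketched with a minimal, standalone illustration (not the actual engine code) of why the spawn start method matters for children that initialize CUDA:

```python
import multiprocessing


def worker(rank):
    # In the real engine, MLA attention initialization would call
    # paddle.device.cuda.get_device_properties() around here; with the
    # default "fork" start method the child inherits the parent's CUDA
    # state and that call fails.
    return rank * 2


if __name__ == "__main__":
    # "spawn" starts a fresh interpreter, so each child initializes CUDA cleanly.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(worker, [0, 1]))  # [0, 2]
```

The trade-off is that spawn re-imports the module in each child, so anything the workers need must be picklable or re-created after start.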
Codecov Report

❌ Patch coverage is …. Additional details and impacted files (develop vs. #5251):

| Metric | develop | #5251 |
|---|---|---|
| Coverage | ? | 60.51% |
| Files | ? | 320 |
| Lines | ? | 39076 |
| Branches | ? | 5880 |
| Hits | ? | 23646 |
| Misses | ? | 13558 |
| Partials | ? | 1872 |

Flags with carried forward coverage won't be shown.
Pull request overview

This PR adds PD (Prefill-Decoding) disaggregated deployment support for the DeepSeekv3 model, adapting to the characteristics of models built on the MLA (Multi-head Latent Attention) architecture. Specifically, it:
- Adapts DeepSeekV3 weight loading and FP8 dynamic quantization
- Implements the KV cache transfer mechanism for MLA-structured models (including a mode that transfers only the key cache)
- Fixes the weight initialization ordering issue in the MoE layer
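The key-cache-only mode in the second bullet can be sketched as follows; the helper name below is hypothetical, not the PR's actual API. MLA compresses K and V into a single latent cache, so a prefill node only has a "key" buffer to ship, while dense MHA/GQA models ship both:

```python
def caches_to_transfer(key_cache, value_cache=None):
    """Buffers a prefill node must send to a decode node (hypothetical helper)."""
    buffers = [key_cache]
    # Dense MHA/GQA models keep a separate value cache; MLA models do not,
    # so the transfer layer must tolerate value_cache being absent.
    if value_cache is not None:
        buffers.append(value_cache)
    return buffers


print(len(caches_to_transfer(b"k")))        # 1: MLA, key-only
print(len(caches_to_transfer(b"k", b"v")))  # 2: standard attention
```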
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/config.py | Adds the moe_num_experts config mapping (contains duplicated code) |
| fastdeploy/model_executor/models/deepseek_v3.py | Adds an empty_input_forward method for expert-layer warm-up; adds blank lines to improve formatting |
| fastdeploy/model_executor/layers/moe/moe.py | Reorders parameter initialization and range checking in weight_loader |
| fastdeploy/engine/engine.py | Uses the spawn context to create subprocesses so they work in a CUDA environment |
| fastdeploy/cache_manager/transfer_factory/rdma_cache_transfer.py | Raises an exception on RDMA library load failure instead of only logging it |
| fastdeploy/cache_manager/prefix_cache_manager.py | Improves value_cache_shape handling to support MLA models that may have no value cache |
| fastdeploy/cache_manager/cache_messager.py | Reorders cache initialization to first check whether a value cache exists |
| fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_rdma.cpp | Implements the has_value_cache_ flag to support the key-only cache mode of MLA models |
| fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_connection.cpp | Updates the MR exchange functions to support an optional value cache |
| fastdeploy/cache_manager/transfer_factory/kvcache_transfer/include/*.h | Adds the has_value_cache parameter to function signatures and class members |
| fastdeploy/model_executor/layers/attention/mla_attention_backend.py | Passes the kv_signal_data parameter through |
| fastdeploy/model_executor/layers/backends/metax/attention/mla_attn_metax_backend.py | Passes the kv_signal_data parameter through |
| custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu | Adds an optional kv_signal_data parameter to support signal passing in PD-disaggregated deployment |
| custom_ops/gpu_ops/cpp_extensions.cc | Updates function signatures to include the kv_signal_data parameter |
| custom_ops/gpu_ops/mla_attn/mla_hopper.cuh | Code formatting improvements; adds support for group_size=128 |
| examples/splitwise/stop.sh | Comments out the forced termination of redis-server |
| fastdeploy/worker/gpu_model_runner.py | Removes redundant blank lines |
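To illustrate the has_value_cache idea that threads through the kvcache_transfer changes above, here is a hedged sketch in Python (the real code is C++ RDMA memory registration; the class and its `_register` stand-in are invented for illustration):

```python
class KVCacheTransferSketch:
    """Toy model of registering RDMA memory regions (MRs) with an optional value cache."""

    def __init__(self, key_ptrs, value_ptrs=None):
        # MLA models pass value_ptrs=None; a flag like this would be exchanged
        # with the peer so both sides agree on how many MRs to expect.
        self.has_value_cache = value_ptrs is not None
        self.key_mrs = [self._register(p) for p in key_ptrs]
        self.value_mrs = (
            [self._register(p) for p in value_ptrs] if self.has_value_cache else []
        )

    @staticmethod
    def _register(ptr):
        # Stand-in for ibv_reg_mr(); just tags the pointer.
        return ("mr", ptr)


mla = KVCacheTransferSketch(key_ptrs=[0x1000, 0x2000])
print(mla.has_value_cache, len(mla.value_mrs))  # False 0
```

The design choice here is to make the absence of a value cache an explicit, negotiated property of the connection rather than inferring it from empty buffers, which keeps both endpoints' MR counts in agreement.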
Motivation
Support PD (Prefill-Decoding) disaggregated deployment of DeepSeekv3.
Modifications
Usage or Command
Refer to the PD disaggregated deployment documentation.
Accuracy Tests
For now, verification was done with a 5-layer model: a 1P1D setup runs end to end without errors.
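A common way to run such a reduced-layer smoke test (assumed here, not taken from this PR) is to truncate the layer count in an HF-style config before launching the prefill and decode nodes:

```python
import json

# Hypothetical config trimming; key names mirror a typical config.json.
config = {"num_hidden_layers": 61, "hidden_size": 7168}
config["num_hidden_layers"] = 5  # 5-layer slice for the 1P1D smoke run
print(json.dumps(config))
```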
Checklist
- Tag the PR title with one of: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a PR targeting the `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.