[BugFix] fix mm cudagraph #5266
base: develop
Conversation
Thanks for your contribution!
Pull request overview
This PR fixes the cuda error 700 that occurred when a multi-modal model had both prefix caching and CUDAGraph enabled. It removes the code that previously force-disabled prefix caching, allowing the two features to work together.
Key Changes
- Removed the logic in fastdeploy/config.py that force-disabled prefix caching when multi-modal and CUDAGraph were enabled at the same time
      else:
          # It will hang when real batch_size < tp_size
          self.graph_opt_config.filter_capture_size(tp_size=self.parallel_config.tensor_parallel_size)
-     if self.model_config.enable_mm and self.graph_opt_config.use_cudagraph:
-         self.cache_config.enable_prefix_caching = False
-         logger.info("Multi-modal models do not support prefix caching when using CUDAGraph!")
Copilot AI · Nov 27, 2025
Consider adding more detail to the PR description:
- Root-cause analysis: why did the cuda error 700 occur before, and which change fixes it?
- Fix rationale: why does removing this restriction resolve the problem, and are there changes elsewhere that work together with this fix?
- Test verification: how was the fix validated, and was it tested under specific configurations?
A more complete description helps future maintainers understand the background and rationale for this change.
Copilot AI · Nov 27, 2025
This bugfix removes a restriction that was preventing the combination of multi-modal models, prefix caching, and CUDAGraph. While there are existing tests for multi-modal with CUDAGraph (e.g., test_paddleocr_vl_serving.py, test_EB_VL_Lite_sot_serving.py), there doesn't appear to be a test that specifically validates this three-way combination (enable_mm + use_cudagraph + enable_prefix_caching) to prevent regression of the cuda error 700 issue.
Consider adding a test case that explicitly enables all three features together to ensure this bugfix works as expected and to prevent future regressions.
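A minimal sketch of such a regression test, assuming pytest; build_fd_config is a hypothetical helper standing in for however the suite actually constructs an FDConfig, and config.cache_config.enable_prefix_caching mirrors the attribute path seen in the diff above:

```python
# Hypothetical regression test for the enable_mm + use_cudagraph +
# enable_prefix_caching combination. build_fd_config is a stand-in
# helper, not a real FastDeploy API; wire it to the test suite's
# actual FDConfig construction path.
def build_fd_config(enable_mm: bool, use_cudagraph: bool, enable_prefix_caching: bool):
    raise NotImplementedError("replace with the test suite's FDConfig setup")

def test_mm_cudagraph_keeps_prefix_caching():
    config = build_fd_config(
        enable_mm=True,
        use_cudagraph=True,
        enable_prefix_caching=True,
    )
    # Before this PR, config post-processing force-disabled prefix caching
    # whenever enable_mm and use_cudagraph were both set.
    assert config.cache_config.enable_prefix_caching is True
```

An end-to-end serving test in the style of test_paddleocr_vl_serving.py, issuing a multi-modal request with all three features enabled, would additionally guard against a regression of the cuda error 700 itself.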
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

@@ Coverage Diff @@
##        develop    #5266   +/- ##
===================================
  Coverage        ?   61.01%
===================================
  Files           ?      317
  Lines           ?    38799
  Branches        ?     5846
===================================
  Hits            ?    23673
  Misses          ?    13263
  Partials        ?     1863
Motivation
Fix the cuda error 700 that occurs when multi-modal prefix caching and CUDAGraph are enabled at the same time.
Modifications
Removed the logic in fastdeploy/config.py that force-disabled prefix caching when a multi-modal model was run with CUDAGraph enabled.
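To make the behavioral change concrete, here is a self-contained sketch; the SimpleNamespace objects stand in for FastDeploy's ModelConfig / GraphOptimizationConfig / CacheConfig, and the restrict_mm_cudagraph flag is purely illustrative:

```python
from types import SimpleNamespace

def postprocess(model_config, graph_opt_config, cache_config, restrict_mm_cudagraph):
    """Mimics the relevant slice of FDConfig post-processing (sketch only)."""
    if restrict_mm_cudagraph and model_config.enable_mm and graph_opt_config.use_cudagraph:
        # Pre-PR behavior: the user's prefix-caching choice was silently overridden.
        cache_config.enable_prefix_caching = False

model = SimpleNamespace(enable_mm=True)
graph = SimpleNamespace(use_cudagraph=True)

cache = SimpleNamespace(enable_prefix_caching=True)
postprocess(model, graph, cache, restrict_mm_cudagraph=True)
print(cache.enable_prefix_caching)  # False: before this PR, the setting was forced off

cache = SimpleNamespace(enable_prefix_caching=True)
postprocess(model, graph, cache, restrict_mm_cudagraph=False)
print(cache.enable_prefix_caching)  # True: after this PR, the user's setting is kept
```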
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag in the PR title. Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR is submitted to the release branch, make sure it has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.