Description
Motivation.
This RFC tracks the community work on improving the official documentation. Currently, three important sections need updating:
- Tutorials
- User doc
- Developer doc
All work items are listed below. Everyone is welcome to take a task.
Proposed Change.
Tutorials
Problem:
- The MoE model guide is missing, for example for Qwen3 MoE.
- There are no detailed parallelism examples.
- QwQ is not very popular.
- V1 Engine should be used by default
Proposed Change:
After the improvement, the content will be clearer (required: 313T+64GB):
- Single NPU (Qwen3-8B), aclgraph mode + eager mode @leo-pony
- Single NPU (Qwen2.5-VL-7B), eager mode @shen-shanshan
- Single NPU (Qwen2.5-audio), eager mode @shen-shanshan
- Single NPU (Qwen3 8B embedding), eager mode @wangxiyuan
- Multi NPU, 2 cards (Qwen3 MOE-30B), aclgraph mode + TP2 @leo-pony
- Multi NPU, 4 cards (Qwen3 32B), aclgraph mode + TP2 + DP2 + W8A8 (optional) @22dimensions
- Multi Node, 2 nodes (DeepSeek V3 0528 W8A8), TP8 + DP2, Graph mode @Potabk
- Multi Node, 4 nodes (DeepSeek R1), TP8 + DP4, Graph mode @MengqingCao
- Multi Node, 8 nodes (DeepSeek V3 0528), 1P1D, Graph mode @wangxiyuan
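The TP/DP layouts named in the list above imply a device count: with no pipeline parallelism, the number of NPUs is the tensor-parallel size times the data-parallel size. The following is a minimal sketch for sanity-checking a tutorial's layout against its card or node count. The helper names `npus_required` and `npus_per_node` are hypothetical, not part of vLLM or vLLM Ascend, and the sketch assumes the devices are spread evenly across nodes.

```python
def npus_required(tp: int, dp: int = 1) -> int:
    """Total NPUs for a TP x DP layout (assumes no pipeline parallelism)."""
    if tp < 1 or dp < 1:
        raise ValueError("tp and dp must be positive integers")
    return tp * dp


def npus_per_node(tp: int, dp: int, nodes: int) -> int:
    """NPUs each node must provide, assuming an even split across nodes."""
    total = npus_required(tp, dp)
    if nodes < 1 or total % nodes != 0:
        raise ValueError(f"{total} NPUs cannot be split evenly over {nodes} node(s)")
    return total // nodes


# Layouts taken from the tutorial list: (tp, dp, nodes)
layouts = {
    "Qwen3 32B, 4 cards": (2, 2, 1),
    "DeepSeek V3 0528 W8A8, 2 nodes": (8, 2, 2),
    "DeepSeek R1, 4 nodes": (8, 4, 4),
}

for name, (tp, dp, nodes) in layouts.items():
    print(f"{name}: {npus_required(tp, dp)} NPUs total, "
          f"{npus_per_node(tp, dp, nodes)} per node")
```

A check like this could be run when writing each tutorial, so that the stated card or node count matches the parallel configuration it documents.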
User Guide
The user guide should cover how to use vLLM Ascend.
Problem:
- Much of the usage documentation is missing.
Proposed Change:
- Feature Support Matrix (Need refresh)
- index
- Graph mode Guide (Need refresh) @wangxiyuan
- Quantization Guide @22dimensions (Add user guide for quantization #1206)
  - what
  - use case
  - how
- Disaggregated prefill Guide @zhangxinyuehfad
- EP Guide (for example, how to use MC2, dual batch, EPLB, ...) @MengqingCao
- Sleep Mode Guide @Potabk ([Doc] Add sleep mode doc #1295)
- Guided Decoding Guide @shen-shanshan
- Spec Decode Guide @mengwei805
- LoRA Guide @paulyu12
- vLLM supported features (function calling, ...) @wangxiyuan
- Model Support Matrix (Need refresh) @zhangxinyuehfad
- Environment Vars (Need refresh) @shen-shanshan
  I think we should change how these are presented: write them out by hand, as is done for Additional Config, to make them clearer.
- Additional Config (Need refresh) @wangxiyuan
- Release note
Developer Guide
Problem:
- There is no feature or code guide for developers at all.
Proposed Change:
- How to contribute (Need refresh) @Yikun
- Version Policy (Need refresh) @Yikun
- CI system @Yikun
- Test Guide @MengqingCao
- Design Documents
- Patch @wangxiyuan
- Code Architecture @MengqingCao
- Ops and Custom ops @Yikun
  - pta ops
  - custom ops
  - fused moe
- Modeling @shen-shanshan
  - what model is added/patched, and why
  - how to add a new model ([docs] Update guidance on how to implement and register new models #1126)
- Attention @wangxiyuan
  - default attention
  - mla
- Communicator @leo-pony
- Disaggregated Prefill @ganyi1996ppo
- Graph mode @zzzzwwjj
- Quantization @22dimensions
- Evaluation (Need refresh) @zhangxinyuehfad
- release guide @wangxiyuan