[Feature] support reward model #5216
base: develop
Conversation
Thanks for your contribution!
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             develop    #5216   +/-  ##
==========================================
  Coverage           ?    60.99%
==========================================
  Files              ?       317
  Lines              ?     38831
  Branches           ?      5856
==========================================
  Hits               ?     23685
  Misses             ?     13278
  Partials           ?      1868
```

Flags with carried forward coverage won't be shown.
Please add usage documentation for deploying pooling models and complete the parameter descriptions (including launch parameters and request parameters).
Pull request overview
This PR adds support for reward models, extending FastDeploy to run reward-model scoring. The feature lets a model evaluate the quality of user-assistant conversation pairs.

Key changes:

- Add "reward" as a new task type and convert option in the configuration system
- Implement pooling logic for reward models, supporting the LAST pooling type (see the sketch after this list)
- Add a new /v1/reward API endpoint and corresponding tests
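As a rough illustration only (this is not the PR's code; every name here is made up for the sketch), LAST pooling for a reward model takes the hidden state of each sequence's final token and feeds it through a scalar reward head:

```python
import paddle

def last_pool_reward(hidden_states, seq_lens, reward_head):
    """Sketch: LAST pooling followed by a reward head.

    hidden_states: [total_tokens, hidden_size], sequences packed back to back.
    seq_lens: Python list of per-sequence token counts.
    reward_head: e.g. paddle.nn.Linear(hidden_size, 1).
    """
    # Index of the last token of each sequence in the packed layout.
    offsets = paddle.cumsum(paddle.to_tensor(seq_lens, dtype="int64")) - 1
    last_hidden = paddle.gather(hidden_states, offsets)  # [n_seq, hidden_size]
    return reward_head(last_hidden)  # [n_seq, 1] reward scores
```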
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/pooling/test_Qwen3-Embedding_serving.py | Change the behavior when the model path is not found from skip to raising an error |
| tests/pooling/test_Ernie4_5_reward_serving.py | Add a full test suite for reward model serving, including baseline comparison |
| fastdeploy/worker/gpu_model_runner.py | Modify pooling logic to support reward models; add max_tokens handling for pooling models |
| fastdeploy/model_executor/pre_and_post_process.py | Add null checks to handle partial pooler outputs |
| fastdeploy/model_executor/models/model_base.py | Add the REWARD category to pooling model support |
| fastdeploy/model_executor/models/ernie_vl_rm.py | Implement the reward model, add multi-task pooling support, switch to LAST pooling |
| fastdeploy/model_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py | Add float32 normalization and weight mapping for the reward model |
| fastdeploy/model_executor/models/adapters.py | Add a reward pooler in the adapter |
| fastdeploy/model_executor/model_loader/default_loader_v1.py | Map the reward convert type to the embedding model |
| fastdeploy/model_executor/layers/pooler.py | Add reward pooling support; update the pooler factory method and StepPooler |
| fastdeploy/model_executor/layers/pool/metadata.py | Change the device parameter type from string to a Place object |
| fastdeploy/entrypoints/openai/serving_engine.py | Remove add_generation_prompt from chat template kwargs |
| fastdeploy/entrypoints/openai/protocol.py | Remove the add_generation_prompt field from EmbeddingChatRequest |
| fastdeploy/engine/request.py | Automatically disable thinking mode for pooling requests |
| fastdeploy/engine/pooling_params.py | Default normalize to False for reward tasks (sketched after this table) |
| fastdeploy/engine/args_utils.py | Disable prefix caching for the pooling runner |
| fastdeploy/config.py | Add the reward task type definition and configuration mapping |
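To make the pooling_params.py row concrete, here is a hedged sketch of "default normalize to False for reward tasks"; the class shape and field names are assumptions, not the repo's actual definition:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PoolingParams:
    """Illustrative stand-in for the engine's pooling parameters."""
    task: str = "embed"
    normalize: Optional[bool] = None  # None means "use the task default"

    def resolved_normalize(self) -> bool:
        # Embedding vectors are typically L2-normalized; reward scores are
        # raw scalars, so reward tasks default to normalize=False.
        if self.normalize is None:
            return self.task != "reward"
        return self.normalize
```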
```python
for seq_len, prompt_len in zip(seq_lens_cpu, pooling_metadata.prompt_lens):
    output = raw_pooler_output[0].data if int(seq_len) == int(prompt_len) else None
```

Copilot AI — Nov 27, 2025

This code has a critical bug. The original code iterated over raw_pooler_output together with the other variables in the zip, extracting .data from each raw_output element. The modified code accesses raw_pooler_output[0].data on every iteration, so all pooler outputs incorrectly reference the same first element.

Additionally, if raw_pooler_output is a list of tensors (which is what the head returns), then raw_pooler_output[0] is a tensor, and .data is not a valid attribute on PaddlePaddle tensors.

Suggested change, restoring the original iteration logic:

```python
for raw_output, seq_len, prompt_len in zip(raw_pooler_output, seq_lens_cpu, pooling_metadata.prompt_lens):
    output = raw_output if int(seq_len) == int(prompt_len) else None
    pooler_output.append(output)
```
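A tiny self-contained illustration of why the suggested loop matters: each sequence gets its own raw output, and (by this reading of the "partial pooler outputs" handling mentioned above) sequences whose seq_len does not yet equal prompt_len yield None. The stand-in values below replace real tensors:

```python
# Stand-ins for tensors; the partial-prefill interpretation is an assumption.
raw_pooler_output = ["score_a", "score_b", "score_c"]
seq_lens_cpu = [4, 2, 7]
prompt_lens = [4, 5, 7]

pooler_output = []
for raw_output, seq_len, prompt_len in zip(raw_pooler_output, seq_lens_cpu, prompt_lens):
    pooler_output.append(raw_output if int(seq_len) == int(prompt_len) else None)

print(pooler_output)  # ['score_a', None, 'score_c']
```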
```diff
-num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens)
-cumsum = paddle.zeros([n_seq + 1], dtype="int64")
+num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens, dtype="int64", place=paddle.CPUPlace())
+cumsum = paddle.zeros([n_seq + 1], dtype="int64", device=paddle.CPUPlace())
```

Copilot AI — Nov 27, 2025

paddle.zeros does not accept a device parameter. The correct parameter name is place. This will cause a runtime error.

Suggested change:

```python
cumsum = paddle.zeros([n_seq + 1], dtype="int64", place=paddle.CPUPlace())
```

done
```diff
-num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens)
-cumsum = paddle.zeros([n_seq + 1], dtype="int64")
+num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens, dtype="int64", place=paddle.CPUPlace())
+cumsum = paddle.zeros([n_seq + 1], dtype="int64", device=paddle.CPUPlace())
```

Copilot AI — Nov 27, 2025

paddle.to_tensor does not accept a device parameter. The correct parameter name is place.

Suggested change:

```python
num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens, dtype="int64", place=paddle.CPUPlace())
```

done
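A small standalone sketch of explicit CPU placement in Paddle (Paddle 2.x assumed): paddle.to_tensor accepts place directly; if a given release's paddle.zeros rejects placement keywords altogether, creating on the default device and moving with Tensor.cpu() is a portable fallback:

```python
import paddle

# to_tensor takes `place` directly (Paddle 2.x).
num_tokens = paddle.to_tensor([3, 5, 2], dtype="int64", place=paddle.CPUPlace())

# Fallback that sidesteps the place/device keyword question entirely:
# create on the default device, then move to CPU.
cumsum = paddle.zeros([4], dtype="int64").cpu()

print(num_tokens.place, cumsum.place)
```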
```python
@pytest.fixture
def consistent_payload():
    """
    Returns a fixed payload for reward model consistency testing.
    Reward models evaluate user-assistant conversation pairs.
    """
    return {
        "model": "default",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "北京天安门在哪里?"}]},
            {
                "role": "assistant",
                "content": [{"type": "text", "text": "北京天安门在中国北京故宫的前面。"}],
            },
        ],
        "user": "test-user-123",
    }
```

Copilot AI — Nov 27, 2025

The fixture consistent_payload is defined but never used in any test. If it's intended for future use, consider adding a comment explaining its purpose. Otherwise, it should be removed to reduce code clutter.

If the fixture is not needed, remove it to simplify the code.
Suggested change: delete the fixture (no replacement needed).
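If the fixture were kept, a consumer could look like the hedged sketch below; reward_api_url and the response field names are assumptions, not part of this PR's test code:

```python
import requests

def test_reward_consistency(consistent_payload, reward_api_url):
    # Hypothetical test: post the fixed payload to the /v1/reward endpoint
    # and check that a well-formed response comes back.
    resp = requests.post(reward_api_url, json=consistent_payload, timeout=60)
    assert resp.status_code == 200
    body = resp.json()
    assert "data" in body  # response schema is an assumption
```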
```python
print(f"\n=== Sending request to {reward_api_url} ===")

# 发送HTTP请求
```
Copilot AI — Nov 27, 2025

This comment is in Chinese, but other comments in the codebase are in English. Comments should be standardized to English to keep the code style consistent.

Suggested change:

```python
# Send HTTP request
```

done
```python
for pid in output.splitlines():
    os.kill(int(pid), signal.SIGKILL)
    print(f"Killed process on port {port}, pid={pid}")
except subprocess.CalledProcessError:
```

Copilot AI — Nov 27, 2025

The 'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```python
except subprocess.CalledProcessError:
    # No process is listening on the port, so nothing to kill.
```

done
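For context, the snippet reads like a test helper that frees a port before starting the server. A self-contained sketch is below; the function name and the lsof lookup are assumptions, while the kill loop and the except handling mirror the reviewed code:

```python
import os
import signal
import subprocess

def kill_process_on_port(port: int) -> None:
    """Kill any process listening on `port` (sketch of the test helper)."""
    try:
        # `lsof -ti :PORT` prints one PID per line; the exact lookup command
        # used by the real test is an assumption here.
        output = subprocess.check_output(["lsof", "-ti", f":{port}"], text=True)
        for pid in output.splitlines():
            os.kill(int(pid), signal.SIGKILL)
            print(f"Killed process on port {port}, pid={pid}")
    except subprocess.CalledProcessError:
        # No process is listening on the port, so nothing to kill.
        pass
```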
Motivation

Support reward models.

Modifications

Add support for reward models.

Usage or Command

Server launch:

Request: (a hedged example follows)
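The description leaves the concrete commands blank. As a hedged illustration only, a request to the new /v1/reward endpoint might look like the sketch below; the host and port are assumptions, and the payload shape follows the test fixture shown earlier:

```python
import requests

# Assumed local deployment; adjust host/port to your server.
url = "http://localhost:8188/v1/reward"
payload = {
    "model": "default",
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "北京天安门在哪里?"}]},
        {"role": "assistant", "content": [{"type": "text", "text": "北京天安门在中国北京故宫的前面。"}]},
    ],
}
resp = requests.post(url, json=payload, timeout=60)
print(resp.json())
```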
Accuracy Tests
Checklist
- Use at least one tag from the list in the PR title: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If the PR targets the `release` branch, make sure it has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.