
Conversation

@lizexu123
Collaborator

@lizexu123 lizexu123 commented Nov 25, 2025

Motivation

Support reward models.

Modifications

Add support for reward models.

Usage or Command

Server launch command:

python -m fastdeploy.entrypoints.openai.api_server \
    --model ${model_path} \
    --max-num-seqs 256 \
    --max-model-len 32768 \
    --port 13351 \
    --engine-worker-queue-port 7562 \
    --metrics-port 7531 \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.9 \
    --graph-optimization-config '{"use_cudagraph":false}' \
    --quantization "wint8" \
    --load-choices "default_v1" \
    --runner pooling \
    --convert reward

Example request:

curl --location 'http://0.0.0.0:13351/v1/reward' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "default",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "北京天安门在哪里?"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "北京天安门在中国北京故宫的前面。"
                }
            ]
        }
    ],
    "user": "user-123"
}'
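
The same request can be issued programmatically. A minimal sketch using only the Python standard library (the URL, port, and payload fields mirror the curl example above; `build_reward_request` and `score_pair` are hypothetical helper names, and the response schema is not specified in this PR, so it is returned as raw JSON):

```python
import json
import urllib.request


def build_reward_request(question: str, answer: str, user: str = "user-123") -> dict:
    """Build a /v1/reward payload matching the curl example above."""
    return {
        "model": "default",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]},
            {"role": "assistant", "content": [{"type": "text", "text": answer}]},
        ],
        "user": user,
    }


def score_pair(api_url: str, question: str, answer: str) -> dict:
    """POST a user-assistant pair to the reward endpoint; return the parsed JSON."""
    data = json.dumps(build_reward_request(question, answer)).encode("utf-8")
    req = urllib.request.Request(
        api_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running server as launched above):
# result = score_pair("http://0.0.0.0:13351/v1/reward",
#                     "北京天安门在哪里?", "北京天安门在中国北京故宫的前面。")
```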

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, please state the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Nov 25, 2025

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 58.33333% with 20 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@cfc5b0c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/model_executor/models/ernie_vl_rm.py 16.66% 5 Missing ⚠️
fastdeploy/model_executor/layers/pooler.py 66.66% 3 Missing ⚠️
fastdeploy/engine/request.py 66.66% 1 Missing and 1 partial ⚠️
fastdeploy/model_executor/layers/pool/metadata.py 75.00% 1 Missing and 1 partial ⚠️
...del_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py 33.33% 1 Missing and 1 partial ⚠️
fastdeploy/model_executor/pre_and_post_process.py 0.00% 1 Missing and 1 partial ⚠️
fastdeploy/worker/gpu_model_runner.py 60.00% 1 Missing and 1 partial ⚠️
fastdeploy/engine/pooling_params.py 0.00% 1 Missing ⚠️
...y/model_executor/model_loader/default_loader_v1.py 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5216   +/-   ##
==========================================
  Coverage           ?   60.99%           
==========================================
  Files              ?      317           
  Lines              ?    38831           
  Branches           ?     5856           
==========================================
  Hits               ?    23685           
  Misses             ?    13278           
  Partials           ?     1868           
Flag Coverage Δ
GPU 60.99% <58.33%> (?)

Flags with carried forward coverage won't be shown.


@Jiang-Jia-Jun
Collaborator

Please add user documentation for pooling model deployment and complete the parameter descriptions (including launch parameters and request parameters).

Contributor

Copilot AI left a comment


Pull request overview

This PR adds support for reward models, extending FastDeploy with reward-model scoring. The feature lets a model evaluate the quality of user-assistant conversation pairs.

The main changes include:

  • Add "reward" as a new task type and convert option in the configuration system
  • Implement pooling logic for reward models, supporting the LAST pooling type
  • Add a new /v1/reward API endpoint and corresponding tests

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/pooling/test_Qwen3-Embedding_serving.py Change the model-path-not-found behavior from skip to raising an error
tests/pooling/test_Ernie4_5_reward_serving.py Add a full test suite for reward model serving, including baseline comparison
fastdeploy/worker/gpu_model_runner.py Modify pooling logic to support reward models; add max_tokens handling for pooling models
fastdeploy/model_executor/pre_and_post_process.py Add null checks to handle partial pooler outputs
fastdeploy/model_executor/models/model_base.py Add the REWARD category to pooling model support
fastdeploy/model_executor/models/ernie_vl_rm.py Implement the reward model, add multi-task pooling support, switch to LAST pooling
fastdeploy/model_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py Add float32 normalization and weight mapping for the reward model
fastdeploy/model_executor/models/adapters.py Add the reward pooler to the adapter
fastdeploy/model_executor/model_loader/default_loader_v1.py Map the reward convert type to the embedding model
fastdeploy/model_executor/layers/pooler.py Add reward pooling support; update the pooler factory method and StepPooler
fastdeploy/model_executor/layers/pool/metadata.py Change the device parameter type from a string to a Place object
fastdeploy/entrypoints/openai/serving_engine.py Remove add_generation_prompt from the chat template kwargs
fastdeploy/entrypoints/openai/protocol.py Remove the add_generation_prompt field from EmbeddingChatRequest
fastdeploy/engine/request.py Automatically disable thinking mode for pooling requests
fastdeploy/engine/pooling_params.py Set the default normalize to False for the reward task
fastdeploy/engine/args_utils.py Disable prefix caching for the pooling runner
fastdeploy/config.py Add the reward task type definition and configuration mapping

Comment on lines +2405 to +2406
for seq_len, prompt_len in zip(seq_lens_cpu, pooling_metadata.prompt_lens):
output = raw_pooler_output[0].data if int(seq_len) == int(prompt_len) else None

Copilot AI Nov 27, 2025


This code has a critical bug. The original code iterated over both raw_pooler_output and other variables in the zip, extracting .data from each raw_output element. The modified code only accesses raw_pooler_output[0].data for all iterations, meaning all pooler outputs will incorrectly reference the same first element.

Additionally, if raw_pooler_output is a list of tensors (which is what the head returns), then raw_pooler_output[0] would be a tensor, and .data is not a valid attribute on PaddlePaddle tensors.

Suggested fix: restore the original iteration logic:

for raw_output, seq_len, prompt_len in zip(raw_pooler_output, seq_lens_cpu, pooling_metadata.prompt_lens):
    output = raw_output if int(seq_len) == int(prompt_len) else None
    pooler_output.append(output)
Suggested change
for seq_len, prompt_len in zip(seq_lens_cpu, pooling_metadata.prompt_lens):
output = raw_pooler_output[0].data if int(seq_len) == int(prompt_len) else None
for raw_output, seq_len, prompt_len in zip(raw_pooler_output, seq_lens_cpu, pooling_metadata.prompt_lens):
output = raw_output if int(seq_len) == int(prompt_len) else None

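A minimal plain-Python sketch (with hypothetical placeholder data, independent of Paddle) of why the per-sequence iteration matters: indexing `raw_pooler_output[0]` would hand every sequence the first sequence's result, whereas `zip` pairs each sequence with its own output:

```python
# Hypothetical pooled outputs, one per sequence.
raw_pooler_output = ["out_a", "out_b", "out_c"]
seq_lens_cpu = [4, 7, 5]
prompt_lens = [4, 6, 5]  # the second sequence's prompt is not fully processed yet

pooler_output = []
for raw_output, seq_len, prompt_len in zip(raw_pooler_output, seq_lens_cpu, prompt_lens):
    # Emit a pooled result only once the whole prompt has been consumed.
    output = raw_output if int(seq_len) == int(prompt_len) else None
    pooler_output.append(output)

# pooler_output == ["out_a", None, "out_c"]
```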
num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens)
cumsum = paddle.zeros([n_seq + 1], dtype="int64")
num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens, dtype="int64", place=paddle.CPUPlace())
cumsum = paddle.zeros([n_seq + 1], dtype="int64", device=paddle.CPUPlace())

Copilot AI Nov 27, 2025


paddle.zeros does not accept a device parameter. The correct parameter name is place. This will cause a runtime error.

It should be:

cumsum = paddle.zeros([n_seq + 1], dtype="int64", place=paddle.CPUPlace())
Suggested change
cumsum = paddle.zeros([n_seq + 1], dtype="int64", device=paddle.CPUPlace())
cumsum = paddle.zeros([n_seq + 1], dtype="int64", place=paddle.CPUPlace())

Collaborator Author


done

num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens)
cumsum = paddle.zeros([n_seq + 1], dtype="int64")
num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens, dtype="int64", place=paddle.CPUPlace())
cumsum = paddle.zeros([n_seq + 1], dtype="int64", device=paddle.CPUPlace())

Copilot AI Nov 27, 2025


paddle.to_tensor does not accept a device parameter. The correct parameter name is place.

It should be:

num_scheduled_tokens = paddle.to_tensor(num_scheduled_tokens, dtype="int64", place=paddle.CPUPlace())
Suggested change
cumsum = paddle.zeros([n_seq + 1], dtype="int64", device=paddle.CPUPlace())
cumsum = paddle.zeros([n_seq + 1], dtype="int64", place=paddle.CPUPlace())

Collaborator Author


done

Comment on lines +157 to +175
@pytest.fixture
def consistent_payload():
"""
Returns a fixed payload for reward model consistency testing.
Reward models evaluate user-assistant conversation pairs.
"""
return {
"model": "default",
"messages": [
{"role": "user", "content": [{"type": "text", "text": "北京天安门在哪里?"}]},
{
"role": "assistant",
"content": [{"type": "text", "text": "北京天安门在中国北京故宫的前面。"}],
},
],
"user": "test-user-123",
}



Copilot AI Nov 27, 2025


The fixture consistent_payload is defined but never used in any test. If it's intended for future use, consider adding a comment explaining its purpose. Otherwise, it should be removed to reduce code clutter.

If this fixture is not needed, it is recommended to delete it to simplify the code.

Suggested change
@pytest.fixture
def consistent_payload():
"""
Returns a fixed payload for reward model consistency testing.
Reward models evaluate user-assistant conversation pairs.
"""
return {
"model": "default",
"messages": [
{"role": "user", "content": [{"type": "text", "text": "北京天安门在哪里?"}]},
{
"role": "assistant",
"content": [{"type": "text", "text": "北京天安门在中国北京故宫的前面。"}],
},
],
"user": "test-user-123",
}
# (lines 157–174 removed; no replacement needed)


print(f"\n=== Sending request to {reward_api_url} ===")

# 发送HTTP请求

Copilot AI Nov 27, 2025


This comment is in Chinese, while the rest of the codebase uses English comments. It is recommended to use English comments throughout for a consistent code style.

Change it to: # Send HTTP request

Suggested change
# 发送HTTP请求
# Send HTTP request

Collaborator Author


done

for pid in output.splitlines():
os.kill(int(pid), signal.SIGKILL)
print(f"Killed process on port {port}, pid={pid}")
except subprocess.CalledProcessError:

Copilot AI Nov 27, 2025


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except subprocess.CalledProcessError:
except subprocess.CalledProcessError:
# No process is listening on the port, so nothing to kill.

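For context, the suggested comment in place: a self-contained sketch of the cleanup helper (the function name `kill_processes_on_port` is hypothetical; the `lsof`-based lookup follows the test code quoted above):

```python
import os
import signal
import subprocess


def kill_processes_on_port(port: int) -> list:
    """Kill every process listening on `port` and return the pids killed."""
    killed = []
    try:
        # `lsof -t -i:PORT` prints one pid per line for listeners on the port.
        output = subprocess.check_output(["lsof", "-t", f"-i:{port}"], text=True)
        for pid in output.splitlines():
            os.kill(int(pid), signal.SIGKILL)
            print(f"Killed process on port {port}, pid={pid}")
            killed.append(int(pid))
    except subprocess.CalledProcessError:
        # No process is listening on the port, so nothing to kill.
        pass
    return killed
```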
Collaborator Author


done
