[Draft] support mooncake barebone connectorV1 #1011
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?
Design and open issues
Currently AscendTransport cannot detect which NPU card the P/D instance is running on. After using ASCEND_RT_VISIBLE_DEVICES inside a container to pin the card vLLM runs on, aclrtGetDevice(&deviceId) always reports device 0 no matter which physical NPU vLLM is actually on, and aclrtSetDevice can likewise only be set to 0. This suggests that deviceId is a logical concept rather than a physical hardware index, though we are not sure this matches the actual behavior. (One option under evaluation: extend the external interface to accept the ASCEND_RT_VISIBLE_DEVICES environment variable as a parameter to distinguish cards.)
Proposed solutions (four options, to be reviewed):
1. Read the device id from the rank table, pass the id via an environment variable, and add a parameter to init.
2. Derive the rank id from the passed-in hostname (ip:port + NPU_rank_id) by subtracting a base port from the given port.
3. Change the hostname format to ip:port:NPU_rank_id and obtain rank_id by parsing it.
4. Investigate whether an existing API can query device_id or rank_id from within the process.
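Solutions 2 and 3 above could look roughly like the sketch below. These helpers are hypothetical, not existing connector code; the hostname format, function names, and base port are assumptions.

```python
def parse_hostname(hostname: str) -> tuple[str, int, int]:
    """Solution 3: parse the proposed ip:port:NPU_rank_id hostname format.

    Hypothetical helper -- the format is only a proposal in this PR.
    """
    ip, port, rank_id = hostname.rsplit(":", 2)
    return ip, int(port), int(rank_id)


def rank_from_port(port: int, base_port: int = 8100) -> int:
    """Solution 2: derive the rank id as an offset from a base port.

    Assumes consecutive ports per NPU rank, e.g. 8100 -> rank 0,
    8101 -> rank 1; base_port 8100 is a placeholder.
    """
    return port - base_port
```

For example, `parse_hostname("10.0.0.1:8102:2")` yields `("10.0.0.1", 8102, 2)`, and `rank_from_port(8102)` yields `2` under the assumed base port.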
Background: the V1 barebone connector and the transfer_engine share a common rank table. Its path can be read from the environment variable DISAGGREGATED_RPEFILL_RANK_TABLE_PATH, and device_ip is obtained by looking up the rank table with the passed-in rank_id.
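The rank-table lookup described above could be sketched as follows. The field names follow the rank_table.json keys described in the run guide below, but the flat-list file layout and the use of cluster_id as the rank are assumptions, not the actual schema.

```python
import json
import os


def get_device_ip(rank_id: int) -> str:
    """Look up device_ip for a given rank from the shared rank table.

    Sketch only: assumes the rank table is a flat JSON list of device
    entries and that cluster_id corresponds to the rank.
    """
    path = os.environ["DISAGGREGATED_RPEFILL_RANK_TABLE_PATH"]
    with open(path) as f:
        table = json.load(f)
    for entry in table:
        if int(entry["cluster_id"]) == rank_id:
            return entry["device_ip"]
    raise KeyError(f"rank {rank_id} not found in rank table {path}")
```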
Open issues:
1. In the multi-card case, how should mooncake.json be written? Currently there is one process per card and one transfer_engine per process, which would require a different mooncake.json per process.
2. KV cache send/receive currently transfers the full KV cache; we need to give users a way to request transferring only a specified subset of the KV cache.
Run guide
1. Start the metadata_server
Go into the Mooncake source directory.
Start the metadata_server with your own ip and port, making sure they match the configuration in mooncake.json.
2. Launch the producer and the consumer
Prerequisites:
Environment variable DISAGGREGATED_RPEFILL_RANK_TABLE_PATH
must point to your own rank_table.json, with fields configured as follows:
"super_pod_id": id of the super pod (not important),
"server_id": ip of the local host,
"device_id": card number of the current device,
"device_ip": ip address of the current card,
"super_device_id": device id within the super pod (unused),
"cluster_id": numbered sequentially from top to bottom
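Putting the keys above together, one device entry of rank_table.json might look like this. All values are placeholders, and the flat-list layout is a guess; only the key names come from the description above.

```json
[
    {
        "super_pod_id": "0",
        "server_id": "192.168.1.10",
        "device_id": "0",
        "device_ip": "29.10.0.1",
        "super_device_id": "0",
        "cluster_id": "1"
    }
]
```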
Environment variable MOONCAKE_CONFIG_PATH
must point to your mooncake.json, with fields configured as follows:
"prefill_url": ip and port where the prefill instance runs,
"decode_url": ip and port where the decode instance runs,
"metadata_server": must match the ip and port configured in step 1,
"metadata_backend": use http,
"protocol": use hccl for communication,
"device_name": ""
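A mooncake.json assembled from the keys above might look like this. The addresses are placeholders, and the exact URL format expected for "metadata_server" with the http backend is an assumption.

```json
{
    "prefill_url": "192.168.1.10:8100",
    "decode_url": "192.168.1.11:8200",
    "metadata_server": "http://192.168.1.12:8080/metadata",
    "metadata_backend": "http",
    "protocol": "hccl",
    "device_name": ""
}
```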
Launch
Use the launch scripts newly committed in vllm_ascend:
Launch the producer.
Launch the consumer.
3. Start the proxy_server
In proxy_server.py, at the places marked with the comments "Configure the IP and port to your own settings" and "Set the host configuration to your own IP", change all ips and ports to your actual values. Port 8000 is the port the proxy_server listens on, 8100 is the port configured in the producer's shell script, and 8200 is the port configured in the consumer's shell script.
4. Submit an inference request
Set the ip in the request to your own.
Set the model variable to the path of your own model, and make sure it matches the path used in the shell scripts.
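A request along these lines can be built as below. This is a sketch assuming the proxy forwards to a vLLM OpenAI-compatible /v1/completions endpoint; the ip, port 8000, and model path are placeholders to be replaced with your own values.

```python
import json
import urllib.request

# Placeholders -- replace with your own proxy ip and model path.
PROXY_URL = "http://127.0.0.1:8000/v1/completions"
MODEL_PATH = "/path/to/your/model"  # must match the path in the shell scripts

payload = {
    "model": MODEL_PATH,
    "prompt": "Hello, world",
    "max_tokens": 16,
}


def send_request(url: str = PROXY_URL) -> dict:
    """Post the completion request to the proxy_server (requires it running)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```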
5. Summary of the Mooncake TE integration