_posts/2025-05-12-hardware-plugin.md (5 additions & 6 deletions)
@@ -1,12 +1,11 @@
 ---
 layout: post
-title: "Introducing vLLM Hardware Plugin and Best Practice with Ascend NPU"
-author: "vLLM Ascend Team"
+title: "Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU"
+author: "The Ascend Team on vLLM"
 image: /assets/logos/vllm-logo-only-light.png
 ---
 
-Since December 2024, through the joint efforts of the vLLM community and the vLLM Ascend team, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms. The RFC has now taken initial shape.
-This proposal enables hardware integration into vLLM in a decoupled way, allowing for quick and modular support of various hardware platforms.
+Since December 2024, through the joint efforts of the vLLM community and the vLLM Ascend team, we have completed the [Hardware Pluggable RFC]((https://github.com/vllm-project/vllm/issues/11162)). This proposal allows hardware integration into vLLM in a decoupled manner, enabling rapid and modular support for different hardware platforms.
 
 ---
 
@@ -18,7 +17,7 @@ Currently, vLLM already supports multiple backends. However, as the number of vL
 -**High Maintenance Costs**: The cost of maintaining backends is high, not only for the backend developers but also for the vLLM community. The scarcity of community contributor resources makes efficiently adding new features difficult when backend maintainers are not present.
 -**Lack of Extensibility**: While vLLM follows a well-structured layered design by implementing backends through `Executor`, `Worker`, `Runner`, and `Attention`, supporting new hardware often requires invasive modifications or patching rather than dynamic registration. This makes adding new backends cumbersome.
 
-Recognizing the need for a flexible and modular approach to integrating hardware backends, we identified hardware pluginization as a feasible solution:
+Recognizing the need for a flexible and modular approach to integrating hardware backends, we proposed hardware plugins as a feasible solution:
 
 -**Decoupled Codebase**: The hardware backend plugin code remains independent, making the vLLM core code cleaner.
 -**Reduced Maintenance Burden**: vLLM developers can focus on generic features without being overwhelmed by the differences caused by backend-specific implementations.
@@ -112,4 +111,4 @@ Moving forward, we will continue collaborating with developers in the vLLM commu
 2. Expanding plugin support for more modules and features, such as scheduler and custom operators.
 3. Better user experience and higher performance.
 
-We encourage everyone to try out this new feature! If you have any questions, join the [vLLM Slack](https://inviter.co/vllm-slack) and participate in the **#sig-extensible-hardware** channel for discussions. 🚀
+We encourage everyone to try out this new feature! If you have any questions, join the [vLLM Slack](https://slack.vllm.ai) and participate in the **#sig-extensible-hardware** channel for discussions. 🚀