[Kernel] [Helion] Helion kernel wrapper #32964
Conversation
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Code Review
The pull request introduces the Helion kernel wrapper, configuration management, and utility functions. The new components, ConfigKey, ConfigSet, ConfigManager, ConfiguredHelionKernel, and HelionKernelWrapper, are well-structured and include comprehensive unit tests covering various scenarios, including valid and invalid inputs, default values, and error handling. The design for dynamic batch-size-based kernel dispatching and compilation caching is clear. The __all__ declarations in the __init__.py files are correctly populated, and the ImportError checks for the helion dependency are properly implemented.
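For readers who have not opened the diff, here is a rough sketch of how these pieces might relate. The field names and structure below are guesses for illustration only, not the PR's actual definitions:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfigKey:
    # Illustrative: the PR dispatches on batch size, so the key
    # presumably carries at least that.
    batch_size: int


@dataclass
class ConfigSet:
    # Illustrative mapping from key to Helion config kwargs.
    configs: dict[ConfigKey, dict] = field(default_factory=dict)


class ConfigManager:
    """Looks up the best-known config for a given key (sketch only)."""

    def __init__(self, config_set: ConfigSet):
        self._config_set = config_set

    def lookup(self, key: ConfigKey) -> dict:
        return self._config_set.configs[key]
```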
@ProExpertProg @zou3519 @xiaohongchen1991 Please take a look; this PR depends on #32740.
For lines 154-155: this is actually to cache the decorated kernel, i.e., the Helion kernel decorated with the "best" config found. I did some experiments earlier on the CPU overhead by comparing invocations of the silu_and_mul kernel at different compilation stages. See the following results.
Mapping to your code, those kernels are from … I understand this CPU overhead is negligible when graph capture is enabled, but it may still be good to optimize it, since …
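To make the suggestion concrete, a minimal sketch of caching the decorated kernel. All names here (`_kernel_cache`, `decorate`, `get_decorated_kernel`) are illustrative stand-ins, not the actual vLLM/Helion API:

```python
from typing import Any, Callable

# Cache of already-decorated kernels, keyed by batch size. `decorate` stands
# in for the expensive Helion decoration/compilation step.
_kernel_cache: dict[int, Callable[..., Any]] = {}


def get_decorated_kernel(
    batch_size: int,
    decorate: Callable[[int], Callable[..., Any]],
) -> Callable[..., Any]:
    """Return the kernel decorated with the best config, decorating at most once."""
    kernel = _kernel_cache.get(batch_size)
    if kernel is None:
        # Expensive path: runs once per batch size; every later call for the
        # same batch size hits only the dict lookup.
        kernel = decorate(batch_size)
        _kernel_cache[batch_size] = kernel
    return kernel
```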
```python
def __call__(self, *args, **kwargs):
    """Execute the kernel with dynamic batch_size-based config selection."""
    # TODO(gmagogsfm): Validate this assumption. If it doesn't hold
```
Yeah, this assumption will not hold for all kernels. Here is an example from an existing Triton kernel used by the LoRA feature:
```python
inputs: torch.Tensor,  # shape [num_slices, num_tokens, lora_rank]
```
We need a more generic solution here.
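One possible direction, sketched with made-up names (`dispatch_key`, `dispatch_arg`, and `dispatch_dim` are not part of this PR): let each kernel registration declare which argument and which dimension carry the dispatch size, instead of hard-coding dim 0 of the first argument:

```python
import torch


def dispatch_key(args: tuple, dispatch_arg: int = 0, dispatch_dim: int = 0) -> int:
    """Read the dispatch size from a declared (argument, dimension) pair.

    Hypothetical helper; the defaults reproduce the current batch_size
    assumption (dim 0 of the first argument).
    """
    tensor = args[dispatch_arg]
    assert isinstance(tensor, torch.Tensor)
    return tensor.shape[dispatch_dim]


# For the LoRA kernel above, the token count lives in dim 1 of `inputs`,
# whose shape is [num_slices, num_tokens, lora_rank]:
inputs = torch.empty(2, 128, 16)
num_tokens = dispatch_key((inputs,), dispatch_arg=0, dispatch_dim=1)  # 128
```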
THIS IS A MANUALLY STACKED PR. PLEASE ONLY REVIEW THE TOP COMMIT; lower commits are being reviewed separately in an earlier PR.
This PR adds two basic building blocks to help Helion kernels with compilation and runtime dispatching.
HelionKernelWrapper will be constructed by vllm.helion.register() in follow-up PRs. It is responsible for adding a Helion kernel to the registry and partially specializing it according to the GPU platform and model config. As a result of that specialization, it produces a ConfiguredHelionKernel, which is a callable registered as a PyTorch custom op.
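As a rough illustration of that flow (the class body and the `vllm_helion` op namespace below are guesses, assuming torch >= 2.4 for `torch.library.custom_op`; the real wiring goes through vllm.helion.register() and this PR's registry):

```python
import torch


class HelionKernelWrapper:
    """Illustrative sketch only, not the PR's actual implementation."""

    def __init__(self, kernel, name: str):
        self.kernel = kernel
        self.name = name

    def configure(self, gpu_platform: str, model_config: dict):
        """Partially specialize the kernel, then expose it as a custom op."""

        def configured(x: torch.Tensor) -> torch.Tensor:
            # Real code would pick a Helion config based on (gpu_platform,
            # model_config) and dispatch on batch size before calling the
            # kernel; the kernel must return a fresh tensor for custom ops.
            return self.kernel(x)

        # Register under a hypothetical namespace; returns a callable op.
        return torch.library.custom_op(
            f"vllm_helion::{self.name}", mutates_args=()
        )(configured)
```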
The registered custom op contains batch-size-based runtime dispatching as well as the actual Helion compilation logic. Upon invocation with dummy or real input data, it compiles and calls the Helion kernel optimized for the best-fitting batch size. This dispatch decision can then be baked in via CUDA graph capture to eliminate the dispatch overhead entirely.
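A minimal sketch of what that batch-size dispatch could look like; the bucket values and names are invented for illustration, not taken from this PR:

```python
import bisect

# Hypothetical bucket list; a real deployment would derive these from the
# configs found during tuning.
BUCKETS = [1, 8, 32, 128, 512]


def select_bucket(batch_size: int) -> int:
    """Round up to the smallest configured bucket that fits, falling back to
    the largest bucket when batch_size exceeds all of them."""
    idx = bisect.bisect_left(BUCKETS, batch_size)
    return BUCKETS[min(idx, len(BUCKETS) - 1)]


assert select_bucket(20) == 32
assert select_bucket(4096) == 512
```

Under CUDA graph capture, each captured batch size follows exactly one branch of this dispatch, so the selected kernel is baked into the graph and the Python-side lookup drops out of the hot path.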