Heterogeneously run the LLaMA model on both the QNN and XNNPACK backends. #13629
-
I’m planning to deploy the quantized LLaMA 3.2-3B model on QNN and run some of its linear layers on XNNPACK. Would this be possible?
-
@yujiaoliang For QNN specifically, you can instruct the QNN partitioner to skip specific node IDs or operators, which allows those nodes to fall back to XNNPACK. See the QNN partitioner args here - https://www.internalfb.com/code/fbsource/[3369a2d3a668]/fbcode/executorch/backends/qualcomm/partition/qnn_partitioner.py?lines=135. You can then pass both the QnnPartitioner and the XnnpackPartitioner to to_edge_transform_and_lower; the second partitioner acts as a fallback:

```python
to_edge_transform_and_lower(
    ep,
    partitioner=[qnn_partitioner, xnnpack_partitioner],
)
```

You can also provide a custom partitioner for advanced use cases, but it will require a bit of coding. There is an example in https://docs.pytorch.org/executorch/main/compiler-delegate-and-partitioner.html#common-questions under "5. Can we delegate to multiple backends?".
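For reference, here is a minimal end-to-end sketch of that flow. It assumes the QnnPartitioner exposes skip-set arguments (e.g. `skip_node_op_set`, as referenced in qnn_partitioner.py), and that `model`, `example_inputs`, and `compiler_specs` for your LLaMA export are already prepared; exact import paths and argument names may differ between ExecuTorch versions, so double-check against the file linked above.

```python
import torch

from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Assumed to be defined elsewhere: the LLaMA nn.Module, its example inputs,
# and the QNN compile specs (e.g. built via generate_qnn_executorch_compiler_spec).
ep = torch.export.export(model, example_inputs)

# Tell the QNN partitioner which ops to skip so they can fall back to XNNPACK.
# skip_node_op_set / skip_node_id_set are the skip arguments referenced above;
# verify the exact parameter names in your version of qnn_partitioner.py.
qnn_partitioner = QnnPartitioner(
    compiler_specs,
    skip_node_op_set={"aten.linear.default"},
)
xnnpack_partitioner = XnnpackPartitioner()

# Partitioners are applied in order: nodes QNN skips are offered to XNNPACK,
# and anything neither claims stays on the portable CPU ops.
edge = to_edge_transform_and_lower(
    ep,
    partitioner=[qnn_partitioner, xnnpack_partitioner],
)

exec_prog = edge.to_executorch()
with open("llama_qnn_xnnpack.pte", "wb") as f:
    f.write(exec_prog.buffer)
```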
Which executor runner are you using? Some of them may not be linked against the QNN backend, or the other way around. This executor runner should link the QNN backend: https://github.com/pytorch/executorch/blob/main/examples/qualcomm/executor_runner/qnn_executor_runner.cpp - I'm not sure whether it is also linked with the XNNPACK backend.
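To see which backend libraries your runner actually needs to link, you can inspect what the lowered program delegates before building it. A small sketch, assuming ExecuTorch's delegation-debug helper lives at this path (it may have moved between releases) and reusing the `edge` program from the export sketch above:

```python
# Print which backends the lowered program delegates to (e.g. QnnBackend,
# XnnpackBackend); whatever appears here must be linked into the runner binary.
from executorch.devtools.backend_debug import get_delegation_info

info = get_delegation_info(edge.exported_program().graph_module)
print(info.get_summary())
```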