
Add more models as part of GA models #12340


Open · mergennachin wants to merge 1 commit into main from add_new_models

Conversation

@mergennachin (Contributor) commented Jul 10, 2025

Summary

Added seven new representative models to the ExecuTorch examples:

  • EfficientNet-B4: Image classification with CNN architecture
  • DETR-ResNet50: Object detection using transformer decoder
  • SegFormer-ADE: Semantic segmentation transformer
  • Swin2SR: Super-resolution with Swin transformer
  • ALBERT: Lightweight BERT for NLP tasks
  • TrOCR: Optical character recognition transformer
  • Wav2Vec2: Cross-lingual speech representation learning

All models include XNNPACK backend support with appropriate quantization configurations and full CI integration.

Test plan:

  • Validate model export and execution with portable backend
  • Test XNNPACK delegation and quantization (with appropriate exclusions)
  • Integrate into CI workflows for automated testing
  • Verify all models perform their intended tasks accurately
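
For reference, the XNNPACK path this test plan exercises follows ExecuTorch's standard export-and-lower flow. A minimal sketch, assuming a torchvision EfficientNet-B4 (the exact entry points used in the examples directory may differ):

import torch
import torchvision.models as models

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Eager model plus example inputs (EfficientNet-B4 expects 380x380 images).
model = models.efficientnet_b4(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 380, 380),)

# Export to an ATen graph, then delegate XNNPACK-supported subgraphs.
exported = torch.export.export(model, example_inputs)
edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])

# Serialize to a .pte program runnable by the ExecuTorch runtime.
with open("efficientnet_b4_xnnpack.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)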

pytorch-bot bot commented Jul 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12340

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Cancelled Job, 7 Unrelated Failures

As of commit 05a4134 with merge base f82c2f0:

New failures: 4 jobs failed on this PR.
Cancelled job: 1 job was cancelled; please retry it.
Broken trunk: 7 jobs failed but were already failing on the merge base.

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jul 10, 2025
@mergennachin added the release notes: examples label Jul 10, 2025
@mergennachin force-pushed the add_new_models branch 4 times, most recently from ea28cc8 to 9eccb4d, July 10, 2025 17:30
@mergennachin marked this pull request as draft July 10, 2025 18:36
@mergennachin force-pushed the add_new_models branch 7 times, most recently from 8265cc3 to 362b08b, July 10, 2025 21:08
@mergennachin marked this pull request as ready for review July 10, 2025 21:08
@mergennachin (Contributor Author) commented:
Test failures don't look related.

swin2sr_2x and trocr_handwritten are not exporting yet; disabled for now (#12365).

These models need to be exported with strict=False to be exportable.

However, when enabling strict=False by default (#12368), there are a few failures, namely ic3, ic4, and llama (#12370).
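
For context, non-strict export is just a flag on torch.export.export. A minimal sketch; the Hugging Face checkpoint name is an illustrative assumption, not necessarily what the example loads:

import torch
from transformers import Swin2SRForImageSuperResolution

model = Swin2SRForImageSuperResolution.from_pretrained(
    "caidas/swin2SR-classical-sr-x2-64"  # assumed 2x checkpoint
).eval()
example_inputs = (torch.randn(1, 3, 64, 64),)

# strict=True traces with TorchDynamo and rejects constructs it cannot verify;
# strict=False traces in non-strict mode via the Python interpreter, which is
# more permissive and is what swin2sr_2x and trocr_handwritten currently need.
exported = torch.export.export(model, example_inputs, strict=False)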

@kimishpatel (Contributor) commented:

How were these specific models selected?

import torch.nn as nn  # needed for nn.Module; not shown in the diff excerpt

from ..model_base import EagerModelBase


class BidirectionalLSTM(nn.Module):
A contributor commented on the diff:
This just seems like a sample model meant to contain an LSTM? Should we instead include something else? @jackzhxng was looking into kokoro, which also has an LSTM, although export is a problem there.

@mergennachin (Contributor Author) replied:

Yeah, I will delete this one; it's not really useful.

@kimishpatel (Contributor) commented:

The other question I would have is: what would be the goal of adding more models? Do we just want to claim enablement, or at least provide good performance via XNNPACK lowering? This might end up expanding the work that needs to happen; not 100% sure, but just calling it out.

I do agree on covering more vision/object detection models.

@mergennachin (Contributor Author) replied:

Thanks @kimishpatel

How were these specific models selected?

So, I followed this process: I looked at the models on https://ai-benchmark.com/ranking.html and https://mlcommons.org/benchmarks/inference-mobile/ to see whether we could enable anything we didn't already have. However, it was difficult to find "official" PyTorch and/or Hugging Face implementations; most of them are TFLite or ONNX implementations. For some, like EfficientNet-B4 and ALBERT, we were able to find PyTorch versions.

So, I swapped in similar models that do have PyTorch implementations:

  • Object detection: swapped YOLOv4 and SSD-MobileNetV2 for DETR-ResNet50
  • Semantic segmentation: swapped DeepLabV3+ for SegFormer-ADE (even though we already have DeepLabV3)
  • Image super-resolution: swapped ESRGAN for Swin2SR
  • OCR: picked TrOCR

The other question I would have is: what would be the goal of adding more models?

Mainly inspired by ai-benchmark and mlcommons. The goal would be to cover specific tasks where we don't have coverage. We won't be adding many models, only representative models within each task. There are still a few important ones missing, like depth estimation and video super-resolution.

For more models within specific tasks, we can expand by leveraging optimum-executorch instead.

@mergennachin force-pushed the add_new_models branch 2 times, most recently from faf2916 to 4ba5a0e, July 11, 2025 12:57
@cccclai (Contributor) commented Jul 11, 2025

Any specific reason that ai-benchmark and mlcommons were picked as references for the model list? Just curious, because there are lists from other sources.

@guangy10 (Contributor) commented:

@mergennachin thanks for the clarification. I'm happy to discuss how we can make a joint effort on expanding task/model coverage; something similar is happening in optimum-executorch. Some models, like EfficientNet, ALBERT, and Swin, were added there already.

I looked at the models on https://ai-benchmark.com/ranking.html and https://mlcommons.org/benchmarks/inference-mobile/ to see whether we could enable anything we didn't already have. However, it was difficult to find "official" PyTorch and/or Hugging Face implementations; most of them are TFLite or ONNX implementations.

It looks like the main motivation is to enable ai-benchmark to use ET-generated models? I recall that in the meeting with them, they mentioned deploying exactly the same model across all devices, and that the cost of switching to a different variant of a model is high. Given that, would adding similar but not identical models help with adoption by ai-benchmark?

There are still a few missing important ones like depth estimation and video super resolution

I recall there are depth estimation models enabled in HF transformers (e.g. DepthAnything). I'm not sure about video super-resolution (HF doesn't have a task classification for image/video super-resolution).

The goal would be to cover specific tasks where we don't have coverage. We won't be adding many models, only representative models within each task.

What other tasks would we like to cover for GA? Would it be a good idea to browse the top 1-2 most popular models per task classification on the Hugging Face Hub?

Comment on lines +48 to +54
"efficientnet_b4": XNNPACKOptions(QuantType.STATIC_PER_CHANNEL, True),
"detr_resnet50": XNNPACKOptions(QuantType.STATIC_PER_CHANNEL, True),
"segformer_ade": XNNPACKOptions(QuantType.STATIC_PER_CHANNEL, True),
"swin2sr_2x": XNNPACKOptions(QuantType.STATIC_PER_CHANNEL, True),
"albert": XNNPACKOptions(QuantType.DYNAMIC_PER_CHANNEL, True),
"trocr_handwritten": XNNPACKOptions(QuantType.STATIC_PER_CHANNEL, True),
"wav2vec2": XNNPACKOptions(QuantType.DYNAMIC_PER_CHANNEL, True),
A contributor commented on the diff:

@mergennachin Are those models calibrated after being statically quantized? I'm also curious how you decided whether to quantize each model statically vs dynamically, and how the quality of the quantized models is validated.
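
Background for the static-vs-dynamic question: in the PT2E flow the choice is a flag on the quantization config, and static quantization additionally needs a calibration pass over representative inputs. A rough sketch, assuming the torch.ao XNNPACK quantizer rather than the PR's exact helpers:

import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


def pt2e_quantize(model: torch.nn.Module, example_inputs: tuple, dynamic: bool):
    # Dynamic per-channel computes activation scales at runtime (albert,
    # wav2vec2 above); static per-channel bakes them in from calibration
    # data (the vision models above).
    quantizer = XNNPACKQuantizer().set_global(
        get_symmetric_quantization_config(is_per_channel=True, is_dynamic=dynamic)
    )
    module = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(module, quantizer)
    prepared(*example_inputs)  # calibration pass (only meaningful when static)
    return convert_pt2e(prepared)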

@mergennachin (Contributor Author) replied:

What other tasks would we like to cover for GA? Would it be a good idea to browse the top 1-2 most popular models per task classification on the Hugging Face Hub?

@guangy10 here's what I compiled: #12378

@digantdesai (Contributor) commented:

Any specific reason that ai-benchmark and mlcommons were picked as references for the model list? Just curious, because there are lists from other sources.

Popularity? Please share other popular lists here so we can try them as well.

Labels: ciflow/trunk, CLA Signed, release notes: examples

6 participants