Implemented range setting in QNN llama flow #12377

Open · wants to merge 1 commit into main

Conversation

rohansjoshi (Contributor) commented:

Summary:
`llama.py` now has the `--range_setting` flag, with two options: `mse_weight_only` and `mse_with_act_loss`. There is also a new eval script for computing perplexity, `eval_llama_qnn.py` (for faster eval, try a sequence length of 1024). The eval script also has a `--quant_linear_only` flag that quantizes only linear/conv nodes, for running faster experiments.
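For background, MSE range setting chooses a quantization clipping range by minimizing reconstruction error instead of taking the raw min/max of the tensor, which outliers easily skew; this matters most at low bit widths such as the 4-bit weights in the `16a4w` (16-bit activations, 4-bit weights) scheme used below. A minimal weight-only sketch of the general technique (illustrative only, not this PR's implementation; the function name and grid-search granularity are assumptions):

```python
import torch

def mse_weight_range(w: torch.Tensor, n_bits: int = 4, n_steps: int = 100) -> float:
    """Pick a symmetric clipping range for `w` by minimizing the MSE
    between the weights and their quantize-dequantize reconstruction."""
    qmax = 2 ** (n_bits - 1) - 1           # 7 for 4-bit symmetric quantization
    max_abs = w.abs().max().item()
    best_err, best_range = float("inf"), max_abs
    for i in range(1, n_steps + 1):
        r = max_abs * i / n_steps          # candidate clipping threshold
        scale = r / qmax
        q = torch.clamp((w / scale).round(), -qmax - 1, qmax)
        err = ((q * scale - w) ** 2).mean().item()
        if err < best_err:
            best_err, best_range = err, r
    return best_range
```

The `mse_with_act_loss` option presumably scores candidate ranges by their effect on downstream activation error rather than on the weights alone.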

Commands:

```bash
python examples/qualcomm/oss_scripts/llama/llama.py --checkpoint {MODEL_DIR}/consolidated.00.pth --params {MODEL_DIR}/params.json --tokenizer_path {MODEL_DIR}/tokenizer.model --max_seq_length 128 --ptq 16a4w --range_setting mse_with_act_loss

python examples/qualcomm/oss_scripts/llama/eval_llama_qnn.py --checkpoint {MODEL_DIR}/consolidated.00.pth --params {MODEL_DIR}/params.json --tokenizer_path {MODEL_DIR}/tokenizer.model --max_seq_length 128 --ptq 16a4w --range_setting mse_with_act_loss
```
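For reference, the perplexity reported by an eval script like this is the exponential of the mean next-token cross-entropy. A minimal sketch (the `model` call signature and output shape here are assumptions for illustration, not `eval_llama_qnn.py`'s actual interface):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, tokens: torch.Tensor) -> float:
    """Perplexity of `model` over a 1-D LongTensor of token ids.
    Assumes model(ids) returns logits of shape (1, seq_len, vocab_size)."""
    logits = model(tokens.unsqueeze(0))
    # Logits at position t predict the token at position t + 1.
    nll = F.cross_entropy(logits[0, :-1].float(), tokens[1:])
    return float(nll.exp())
```

A longer `--max_seq_length` (e.g., the 1024 suggested above) amortizes per-forward-pass overhead across more tokens, which is presumably why it makes evaluation faster.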

Rollback Plan:

Differential Revision: D78127727

@rohansjoshi requested a review from cccclai as a code owner (July 10, 2025)
pytorch-bot commented (Jul 10, 2025):

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12377

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit d55c96d with merge base dd4488d.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (Jul 10, 2025)
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D78127727


This PR needs a `release notes:` label.

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@cccclai (Contributor) left a comment:

Still reading, will finish reading in a bit

```python
model.ar_len = model.max_seq_len
tokens, atten_mask = model.get_example_inputs(use_kv_cache=False)
atten_mask.to(torch.float)  # note: Tensor.to() is not in-place; this result is discarded
print(atten_mask.shape)  # debugging line flagged in the review comment below
```
Removing debugging line

```python
    kv_quant_attrs=kv_quant_attrs,
),
)
# custom_annotations = custom_annotations + (
```
Actually I need to have a separate PR for this.
