[Inductor][float8] Support qlinear for float8 in inductor #2565

shiyang-weng · 2025-07-17T02:59:10Z

For float8_e4m3fn, this PR adds Inductor support for:

register_qlinear_weight_prepack
_register_qlinear_unary_fusion
_register_qlinear_binary_fusion
quant_lift_up

FP8 raises the following issues:

q/dq switches to quantize_affine_float8/dequantize_affine_float8.
The q/dq API changes: the fp8 q/dq requires the scale to be a tensor.
pt2e does not support float8.

To address these issues:

The fp8 q/dq pattern needs to be handled separately.
The scale needs to be handled separately.
We implement a function (fp8_convert_) that adds q/dq before the linear in the model, and add it to test/quantization/pt2e/test_x86inductor_fusion.py.


pytorch-bot · 2025-07-17T02:59:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2565

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

test/quantization/pt2e/test_x86inductor_fusion.py

shiyang-weng · 2025-07-21T06:23:24Z

torchao/quantization/pt2e/inductor_passes/x86.py

+            return (
+                len(node.all_input_nodes) == 2
+                and node.all_input_nodes[1].target == torch.tensor
+            )


I will add a return False.
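
The surrounding function is not shown in this thread, so the following is only a sketch of why an explicit fallthrough matters. It assumes the check sits inside a conditional branch of a predicate; FakeNode, TENSOR_CTOR, scale_is_tensor_input, and the is_fp8 condition are all hypothetical stand-ins, not names from x86.py.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Stand-in for the torch.tensor target compared in the diff above.
TENSOR_CTOR = "torch.tensor"


@dataclass
class FakeNode:
    """Hypothetical stand-in for an FX node with a target and input nodes."""

    target: Any
    all_input_nodes: List["FakeNode"] = field(default_factory=list)


def scale_is_tensor_input(node: FakeNode, is_fp8: bool) -> bool:
    """Hypothetical predicate wrapping the check from the diff."""
    if is_fp8:
        return (
            len(node.all_input_nodes) == 2
            and node.all_input_nodes[1].target == TENSOR_CTOR
        )
    # Without this line the function would implicitly return None here,
    # which is falsy but not a bool.
    return False


x_node = FakeNode(target="x")
scale_node = FakeNode(target=TENSOR_CTOR)
q_node = FakeNode(target="quantize", all_input_nodes=[x_node, scale_node])
```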

Xia-Weiwen

Thanks for the PR!

Xia-Weiwen · 2025-07-21T15:15:09Z

test/quantization/pt2e/test_x86inductor_fusion.py

-        for bias in [True, False]:
-            self._qlinear_test_helper((torch.randn((2, 4)),), bias=bias)
+        for is_fp8 in [True, False]:
+            for bias in [True, False]:
+                self._qlinear_test_helper(
+                    (torch.randn((2, 4)),), bias=bias, is_fp8=is_fp8
+                )


It would be better to put the fp8 stuff in separate tests, i.e., keep test_qlinear_cpu and add test_fp8_qlinear_cpu. Same for the other tests.
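
A rough sketch of the suggested split, using only unittest so it runs without torch. The helper body is a hypothetical stub that merely records the configuration it was called with, and the shape tuple stands in for the torch.randn((2, 4)) input used in the real test.

```python
import unittest


class QLinearTestSketch(unittest.TestCase):
    """Hypothetical stand-in for the real test class."""

    def setUp(self):
        self.seen = []

    def _qlinear_test_helper(self, inputs, bias=True, is_fp8=False):
        # Stub: the real helper builds and checks a model; this one records the config.
        self.seen.append((bias, is_fp8))

    def test_qlinear_cpu(self):
        # The existing int8 test stays unchanged.
        for bias in [True, False]:
            self._qlinear_test_helper(((2, 4),), bias=bias)
        self.assertEqual(self.seen, [(True, False), (False, False)])

    def test_fp8_qlinear_cpu(self):
        # The fp8 cases live in their own test.
        for bias in [True, False]:
            self._qlinear_test_helper(((2, 4),), bias=bias, is_fp8=True)
        self.assertEqual(self.seen, [(True, True), (False, True)])
```

With this layout a failure report names the dtype path directly, and the fp8 test can be skipped on its own where float8 is unavailable.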

Xia-Weiwen · 2025-07-21T15:16:15Z

test/quantization/pt2e/test_x86inductor_fusion.py

@@ -1804,13 +1940,166 @@ def test_qlinear_add_int8_mixed_bf16(self, use_relu, is_qat, is_dynamic):
            is_dynamic=is_dynamic,
        )

+    def _fp8_qlinear_add_test_helper(


What's the difference between the int8 version and the fp8 version? Can we merge them?

Xia-Weiwen · 2025-07-21T15:16:40Z

test/quantization/pt2e/test_x86inductor_fusion.py

+            lambda x, y: x.add_(y),
+            lambda x, y: y.add_(x),
+        ]
+        is_fp8 = True


If we are defining a dedicated helper for fp8, is this still needed?

Xia-Weiwen · 2025-07-21T15:19:46Z

test/quantization/pt2e/test_x86inductor_fusion.py

+    @parametrize("dtype", [torch.float32, torch.bfloat16])
+    @parametrize("input_dim_exceeds_two", [True, False])
+    @parametrize("check_reuse_input", [True, False])
+    def test_fp8_qlinear(


What's the difference between this test case and the test_qlinear_cpu above?

Xia-Weiwen · 2025-07-21T15:31:56Z

torchao/quantization/pt2e/inductor_passes/x86.py

+    for is_fp8 in [True, False]:
+        for original_pattern_output_dtype in [torch.float32, torch.bfloat16]:
+            is_bf16 = original_pattern_output_dtype == torch.bfloat16
+            for x_scale_zp_are_tensors in (False, True):


Use itertools.product maybe?
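
For illustration, the three nested loops from the diff could be flattened as below. The dtype values are string stand-ins so the sketch runs without torch (the diff uses torch.float32 and torch.bfloat16), and nested_configs and product_configs are hypothetical names.

```python
import itertools

# String stand-ins for torch.float32 and torch.bfloat16.
FLOAT32, BFLOAT16 = "float32", "bfloat16"


def nested_configs():
    """Enumerate configurations with the nested loops from the diff."""
    configs = []
    for is_fp8 in [True, False]:
        for original_pattern_output_dtype in [FLOAT32, BFLOAT16]:
            is_bf16 = original_pattern_output_dtype == BFLOAT16
            for x_scale_zp_are_tensors in (False, True):
                configs.append(
                    (is_fp8, original_pattern_output_dtype, is_bf16, x_scale_zp_are_tensors)
                )
    return configs


def product_configs():
    """Same enumeration, flattened into one loop with itertools.product."""
    configs = []
    for is_fp8, original_pattern_output_dtype, x_scale_zp_are_tensors in itertools.product(
        [True, False], [FLOAT32, BFLOAT16], (False, True)
    ):
        is_bf16 = original_pattern_output_dtype == BFLOAT16
        configs.append(
            (is_fp8, original_pattern_output_dtype, is_bf16, x_scale_zp_are_tensors)
        )
    return configs
```

itertools.product varies its rightmost argument fastest, so the iteration order matches the nested loops.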

shiyang-weng added 20 commits June 18, 2025 15:22

quantize_affine_float8/dequantize_affine_float8 not decomposed on ind…

a840ef5

…uctor

remove redundant unittest.skipIf

02d045b

fix rebase issue

9860c56

change dispatch key to a flag decomposed

ca662f3

support scaled_mm on inductor

f51a5be

fix rebase issue

719793c

support dequant promtion for fp8

48a3d99

add ut

1921b2f

remove redundant codes

0335415

Merge pull request #2 from shiyang-weng/wengshiy/dequant_promotion

955fa6e

Add fp8 dequant promotion

Merge remote-tracking branch 'origin/main' into wengshiy/scaled_mm

a70e094

fix lint

a5bb4d0

Merge branch 'main' into wengshiy/scaled_mm

1c1f890

resolve conflict

0c7f8ea

change to use qlinear

0175b17

add ut

564d4b7

fix lint

9948674

Merge remote-tracking branch 'origin/main' into wengshiy/qlinear

413a883

support fp8 quant_lift_up

558d216

add reshape into _VIEW_METHOD_OPS

8cd1433

shiyang-weng marked this pull request as draft July 17, 2025 02:59

meta-cla bot added the CLA Signed label Jul 17, 2025

shiyang-weng commented Jul 17, 2025

test/quantization/pt2e/test_x86inductor_fusion.py (resolved)

shiyang-weng added 6 commits July 17, 2025 09:53

add quant_input_check

ae4f582

Merge remote-tracking branch 'origin/main' into wengshiy/qlinear

469ac50

fix lint

8026306

refine ut

f735949

remove fp8 dynamic quant ut

5803511

fix output_scale issue

3e37dea

Merge remote-tracking branch 'origin/main' into wengshiy/qlinear

497de92

shiyang-weng commented Jul 21, 2025

Xia-Weiwen reviewed Jul 21, 2025
