Add support for fbgemm int4 mm kernel #2255

jerryzh168 · 2025-05-23T20:07:25Z

Summary:
This PR adds support for the fbgemm int4 mm kernel. We also plan to expose other kernels, such as fp8xint4, bf16xfp8, and fp8xfp8, to compare with existing torchao kernels.

Test Plan:
test/dtypes/test_fbgemm_int4_tensor.py

H100, with compile:

TODO: update

config - batch size	overall tokens/sec	TTFT	Peak Memory	Model Size
baseline - 1	131.65	0.0220	16.24 GB	15.01 GB
baseline - 128	76.38	0.0544	26.92 GB	15.01 GB
int4wo - 1	207.69	0.0288	6.41 GB	3.99 GB
int4wo - 128	12.85	0.4223	16.01 GB	3.99 GB
fbgemm-int4 - 1 (no compile)	40.00	0.0508	29.03 GB	4.22 GB
fbgemm-int4 - 128 (no compile)	11.46	0.0846	28.96 GB	4.22 GB
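
To make the table easier to read, the ratios against the baseline rows can be computed directly. The sketch below only copies the tokens/sec and model size values from the table above; the dictionary layout and the helper name `relative_to_baseline` are illustrative assumptions. Keep in mind that the numbers are marked "TODO: update" and that the fbgemm-int4 rows were measured without compile, so this is not an apples-to-apples comparison.

```python
# Rows copied from the benchmark table: (overall tokens/sec, model size in GB).
# Keys are (config, batch size).
results = {
    ("baseline", 1): (131.65, 15.01),
    ("baseline", 128): (76.38, 15.01),
    ("int4wo", 1): (207.69, 3.99),
    ("int4wo", 128): (12.85, 3.99),
    ("fbgemm-int4", 1): (40.00, 4.22),    # measured without compile
    ("fbgemm-int4", 128): (11.46, 4.22),  # measured without compile
}


def relative_to_baseline(config, batch_size):
    """Return (throughput ratio, model-size ratio) against the baseline row."""
    tokens_per_sec, model_size = results[(config, batch_size)]
    base_tokens_per_sec, base_model_size = results[("baseline", batch_size)]
    return tokens_per_sec / base_tokens_per_sec, model_size / base_model_size


for config, batch_size in results:
    if config == "baseline":
        continue
    speed, size = relative_to_baseline(config, batch_size)
    print(f"{config} (batch {batch_size}): {speed:.2f}x throughput, {size:.2f}x model size")
```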

export CHECKPOINT_PATH=../../../checkpoints # path to checkpoints folder
export MODEL_REPO=meta-llama/Meta-Llama-3.1-8B-Instruct
# default batch size 1
python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --write_result benchmark_results.txt
python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --quantization int4wo-128 --write_result benchmark_results.txt
python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization fbgemm-int4-128 --write_result benchmark_results.txt

python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --write_result benchmark_results.txt --batch_size 128
python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --quantization int4wo-128 --write_result benchmark_results.txt --batch_size 128
python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization fbgemm-int4-128 --write_result benchmark_results.txt --batch_size 128
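
The six commands above follow one pattern (three configurations, two batch sizes), so they can be generated instead of typed out. This is a minimal sketch that only builds and prints the command lines; it uses the flags shown above and nothing else. The helper name `build_commands` is an assumption for illustration, and the script does not run the benchmark itself.

```python
import shlex


def build_commands(checkpoint, batch_sizes=(1, 128)):
    """Return the generate.py command lines for every config and batch size."""
    # (value for --quantization or None, whether to pass --compile).
    # fbgemm-int4-128 is run without --compile, as in the commands above.
    configs = [(None, True), ("int4wo-128", True), ("fbgemm-int4-128", False)]
    commands = []
    for batch_size in batch_sizes:
        for quantization, use_compile in configs:
            cmd = ["python", "generate.py", "--checkpoint_path", checkpoint]
            if use_compile:
                cmd.append("--compile")
            if quantization:
                cmd += ["--quantization", quantization]
            cmd += ["--write_result", "benchmark_results.txt"]
            if batch_size != 1:  # batch size 1 is the default
                cmd += ["--batch_size", str(batch_size)]
            commands.append(shlex.join(cmd))
    return commands


checkpoint = "../../../checkpoints/meta-llama/Meta-Llama-3.1-8B-Instruct/model.pth"
for line in build_commands(checkpoint):
    print(line)
```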

Note: fbgemm-int4-128 does not work with compile yet, since the fbgemm op does not have a meta device implementation.

pytorch-bot · 2025-05-23T20:07:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2255

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

samanamp · 2025-05-23T21:41:02Z

Thank you! The community really needs this.

Add support for fbgemm int4 mm kernel

303526f

facebook-github-bot added the CLA Signed label May 23, 2025

jerryzh168 added 2 commits May 23, 2025 16:30

fix and test

22e0eba

fix dtype

9df9b49
