- Highlights
- Features
- Improvements
- Validated Hardware
- Validated Configurations
Highlights
- Introduced experimental support for NVFP4 quantization and mixed-bit (MXFP4 & MXFP8) autotuning on LLMs
Features
- Support NVFP4 Post-Training Quantization (PTQ) on LLMs (experimental)
- Support mixed-bit (MXFP4 & MXFP8) autotuning on LLMs (experimental)
- Support MXFP8 PTQ on video-generation diffusion models (experimental)
- Support MXFP4 Quantization-Aware Training (QAT) on LLMs (experimental)
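The MX formats listed above pair a block-shared scale with narrow floating-point elements. As a rough illustration only (not this library's implementation), the following NumPy sketch fake-quantizes one block in an MXFP4-like way, assuming the OCP MX convention of FP4 E2M1 elements and a power-of-two shared scale; the scale rule and rounding here are simplified assumptions.

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 element format (sign handled separately).
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray) -> np.ndarray:
    """Round-trip one block through an MXFP4-style encoding:
    a shared power-of-two scale plus FP4 E2M1 elements.
    Illustrative sketch only; real kernels pack bits and may round differently."""
    amax = np.abs(block).max()
    if amax == 0:
        return np.zeros_like(block)
    # Shared exponent: floor(log2(amax)) minus the E2M1 max exponent (2),
    # following the OCP MX scale convention. Values above 6 after scaling clip.
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Snap each scaled magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_VALUES[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_VALUES[idx] * scale

# Values that are exact multiples of the scale survive the round trip;
# out-of-range magnitudes clip to 6 * scale.
out = quantize_mxfp4_block(np.array([0.5, 1.0, 2.0, 0.0]))
```

MXFP8 works analogously with FP8 elements (hence its lower quantization error), which is why the autotuner can trade the two formats off layer by layer.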
Improvements
- Updated the Llama 3 series example for NVFP4 and auto mixed-bit (MXFP4 & MXFP8) PTQ
- New LLM example (DeepSeek R1) for MXFP8, MXFP4, and NVFP4 PTQ
- New LLM example (Qwen3-235B) for MXFP8 and MXFP4 PTQ
- New video-generation diffusion example (FramePack) for MXFP8 PTQ
- Updated the Llama 3 example for MXFP4 QAT
- Removed the test-only benchmarking feature for security reasons
Validated Hardware
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1550)
- Intel® Arc™ B-Series Graphics GPU (B580 and B60)
Validated Configurations
- Ubuntu 24.04 & Windows 11
- Python 3.10, 3.11, 3.12, 3.13
- PyTorch/IPEX 2.7, 2.8
- PyTorch 2.9