- Highlights
- Features
- Improvements
- Validated Hardware
- Validated Configurations
Highlights
- Introduced experimental support for NVFP4 quantization and mixed-bit (MXFP4 & MXFP8) autotuning on LLMs
Features
- Support NVFP4 Post-Training Quantization (PTQ) on LLMs (experimental)
- Support mixed-bit (MXFP4 & MXFP8) autotuning on LLMs (experimental)
- Support MXFP8 PTQ on video-generation diffusion models (experimental)
- Support MXFP4 Quantization-Aware Training (QAT) on LLMs (experimental)
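The MX formats listed above pair a block-shared scale with narrow floating-point elements. As a rough illustration only (not this library's implementation), the following NumPy sketch fake-quantizes one block in an MXFP4-like way, assuming the OCP MX convention of FP4 E2M1 elements and a power-of-two shared scale; the scale rule and rounding here are simplified assumptions.

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 element format (sign handled separately).
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray) -> np.ndarray:
    """Round-trip one block through an MXFP4-style encoding:
    a shared power-of-two scale plus FP4 E2M1 elements.
    Illustrative sketch only; real kernels pack bits and may round differently."""
    amax = np.abs(block).max()
    if amax == 0:
        return np.zeros_like(block)
    # Shared exponent: floor(log2(amax)) minus the E2M1 max exponent (2),
    # following the OCP MX scale convention. Values above 6 after scaling clip.
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Snap each scaled magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_VALUES[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_VALUES[idx] * scale

# Values that are exact multiples of the scale survive the round trip;
# out-of-range magnitudes clip to 6 * scale.
out = quantize_mxfp4_block(np.array([0.5, 1.0, 2.0, 0.0]))
```

MXFP8 works analogously with FP8 elements (hence its lower quantization error), which is why the autotuner can trade the two formats off layer by layer.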
Improvements
- Updated the Llama 3 series example for NVFP4 and auto mixed-bit (MXFP4 & MXFP8) PTQ
- New LLM example (DeepSeek R1) for MXFP8, MXFP4, and NVFP4 PTQ
- New LLM example (Qwen3-235B) for MXFP8 and MXFP4 PTQ
- New video-generation diffusion example (FramePack) for MXFP8 PTQ
- Updated the Llama 3 example for MXFP4 QAT
- Removed the test-only benchmarking feature for security reasons
Validated Hardware
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1550)
- Intel® Arc™ B-Series Graphics GPU (B580 and B60)
Validated Configurations
- Ubuntu 24.04 & Windows 11
- Python 3.10, 3.11, 3.12, 3.13
- PyTorch/IPEX 2.7, 2.8
- PyTorch 2.9