Intel Neural Compressor Release 3.4

Latest

thuang6 released this 23 May 12:09

4e0ef30

Highlights
Features
Improvements
Bug Fixes
Validated Hardware
Validated Configurations

Highlights

Aligned with Gaudi SW Release 1.21, with improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
INT4 quantization enhancements for Intel CPU/GPU
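
For readers unfamiliar with the term, the following is a minimal sketch of what group-wise INT4 weight quantization does in general. It is a generic symmetric round-to-nearest scheme in pure Python, not Neural Compressor's implementation; the function names and the `group_size` parameter are hypothetical and chosen only for illustration.

```python
def quantize_int4_groupwise(weights, group_size=4):
    """Symmetric round-to-nearest INT4 quantization with one scale per group.

    Illustrative only: names and defaults are assumptions, not library API.
    """
    q_values, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            # Clamp to the signed 4-bit range [-8, 7].
            q_values.append(max(-8, min(7, q)))
    return q_values, scales


def dequantize_int4_groupwise(q_values, scales, group_size=4):
    """Recover approximate float weights from INT4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.7]
q_values, scales = quantize_int4_groupwise(weights)
restored = dequantize_int4_groupwise(q_values, scales)
```

Each group of weights shares one scale, so the reconstruction error of any weight is bounded by half of its group's scale. Smaller groups track the weight distribution more closely at the cost of storing more scales.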

Features

Support expert parallelism for Mixtral model on Gaudi
Enhance multi-card FP8 model save and load on Gaudi
Enable static FP8 quantization of DeepSeek V3/R1 model on Gaudi
Support W4A8 mixed precision on Gaudi (experimental)
Improve compile time on Gaudi when using FP8 (experimental)

Improvements

Remove numpy version limit for 3.x PyTorch package

Bug Fixes

Fix graph compile error when quantizing Llama3.2 11B/90B vision model on Gaudi
Fix segmentation fault in Llama2-70B INT4 model on Intel GPU
Fix accuracy issue caused by duplicated g_idx update for INT4 model on Intel GPU

Validated Hardware 

Intel Gaudi AI Accelerators (Gaudi 2 and 3)
Intel Xeon Scalable processor (4th, 5th, 6th Gen)
Intel Core Ultra Processors (Series 1 and 2)
Intel Data Center GPU Max Series (1100)
Intel® Arc™ B-Series Graphics (B580)

Validated Configurations

CentOS 8.4, Ubuntu 24.04, Windows 11
Python 3.9, 3.10, 3.11, 3.12
PyTorch/IPEX 2.4, 2.5, 2.6
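
As a convenience, a local environment can be compared against the validated Python and PyTorch versions listed above. The sketch below uses only the standard library; the helper names are hypothetical and are not part of Neural Compressor, and a version outside these sets is simply unvalidated, not necessarily unsupported.

```python
import sys

# Versions taken from the validated configurations listed in these notes.
VALIDATED_PYTHON = {(3, 9), (3, 10), (3, 11), (3, 12)}
VALIDATED_TORCH = {"2.4", "2.5", "2.6"}


def is_validated_python(version_info=None):
    """Return True if the (major, minor) Python version is in the validated set."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in VALIDATED_PYTHON


def is_validated_torch(version_string):
    """Return True if a PyTorch version string such as '2.5.1+cpu' is validated."""
    # Drop any local build suffix, then keep only major.minor.
    parts = version_string.split("+")[0].split(".")
    return ".".join(parts[:2]) in VALIDATED_TORCH


if __name__ == "__main__":
    print("Python validated:", is_validated_python())
```

To check an installed PyTorch, pass its `torch.__version__` string to `is_validated_torch`.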
