Intel Neural Compressor Release 3.4

Released by @thuang6 on 23 May, 12:09
  • Highlights
  • Features
  • Improvements
  • Bug Fixes
  • Validated Hardware
  • Validated Configurations

Highlights

  • Aligned with Gaudi SW Release 1.21, with improvements to FP8 and INT4 quantization for Intel® Gaudi® AI accelerators
  • INT4 quantization enhancements for Intel CPU/GPU

Features

  • Support expert parallelism for Mixtral model on Gaudi
  • Enhance multi-card FP8 model save and load on Gaudi
  • Enable static FP8 quantization of DeepSeek V3/R1 model on Gaudi
  • Support W4A8 mixed precision on Gaudi (experimental)
  • Improve compile time on Gaudi when using FP8 (experimental)

Improvements

  • Remove numpy version limit for 3.x PyTorch package

Bug Fixes

  • Fix graph compile error when quantizing Llama3.2 11B/90B vision models on Gaudi
  • Fix segmentation fault in Llama2-70B INT4 model on Intel GPU
  • Fix accuracy issue caused by duplicated g_idx update for INT4 model on Intel GPU

Validated Hardware

  • Intel Gaudi AI Accelerators (Gaudi 2 and 3)
  • Intel Xeon Scalable Processors (4th, 5th, 6th Gen)
  • Intel Core Ultra Processors (Series 1 and 2)
  • Intel Data Center GPU Max Series (1100)
  • Intel® Arc™ B-Series Graphics GPU (B580)

Validated Configurations

  • CentOS 8.4, Ubuntu 24.04, Windows 11
  • Python 3.9, 3.10, 3.11, 3.12
  • PyTorch/IPEX 2.4, 2.5, 2.6
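
As a rough illustration of how the validated Python and PyTorch/IPEX versions listed above might be checked before installing, here is a minimal sketch. The names `VALIDATED_PYTHON`, `VALIDATED_TORCH`, `major_minor`, and `is_validated_config` are hypothetical helpers written for this example and are not part of Intel Neural Compressor. Other versions may still work; they are simply not listed as validated for this release.

```python
import sys

# Versions copied from the "Validated Configurations" list above.
VALIDATED_PYTHON = {(3, 9), (3, 10), (3, 11), (3, 12)}
VALIDATED_TORCH = {(2, 4), (2, 5), (2, 6)}


def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '2.5.1+cpu'."""
    # Drop any local build suffix ('+cpu'), then keep the first two fields.
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def is_validated_config(python_version: str, torch_version: str) -> bool:
    """True if both versions fall inside the validated lists (hypothetical helper)."""
    return (
        major_minor(python_version) in VALIDATED_PYTHON
        and major_minor(torch_version) in VALIDATED_TORCH
    )


if __name__ == "__main__":
    current_python = f"{sys.version_info.major}.{sys.version_info.minor}"
    # The PyTorch version is passed as a string so this sketch runs without torch installed.
    print(is_validated_config(current_python, "2.6.0"))
```

In practice the PyTorch version string could be taken from `torch.__version__` once PyTorch is installed.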