- Highlights
- Features
- Improvements
- Bug Fixes
- Validated Hardware
- Validated Configurations
Highlights
- Aligned with Gaudi SW Release 1.21, bringing improvements to FP8 and INT4 quantization for the Intel® Gaudi® AI accelerator
- INT4 quantization enhancements for Intel CPU and GPU (see the sketch below)
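The INT4 (weight-only) path on Intel CPU can be exercised in a few lines. The sketch below is non-authoritative: it assumes the 3.x `neural_compressor.torch` API (`RTNConfig`, `prepare`, `convert`), and the Hugging Face model and group size are illustrative choices, not recommended defaults.

```python
# Minimal sketch of weight-only INT4 (RTN) quantization on an Intel CPU.
# Assumes the 3.x neural_compressor.torch API; model and group_size are illustrative.
import torch
from transformers import AutoModelForCausalLM
from neural_compressor.torch.quantization import RTNConfig, prepare, convert

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float32)

quant_config = RTNConfig(bits=4, group_size=128)  # 4-bit weights with per-group scales
model = prepare(model, quant_config)              # wrap quantizable modules
model = convert(model)                            # fold weights into the INT4 representation
```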
Features
- Support expert parallelism for Mixtral model on Gaudi
- Enhance multi-card FP8 model save and load on Gaudi
- Enable static FP8 quantization of the DeepSeek V3/R1 model on Gaudi (a usage sketch follows this list)
- Support W4A8 mixed precision on Gaudi (experimental)
- Improve compile time on Gaudi when using FP8 (experimental)
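For the static FP8 feature above, a minimal sketch is shown here. It assumes the 3.x `neural_compressor.torch` API (`FP8Config`, `prepare`, `finalize_calibration`, `convert`) on a Gaudi (HPU) device; the model, the E4M3 setting, and the single random calibration batch are placeholder assumptions standing in for a real model and calibration dataset.

```python
# Minimal sketch of static FP8 quantization on Gaudi (HPU).
# Assumes the 3.x neural_compressor.torch API; the model, E4M3 choice, and
# calibration batch are illustrative placeholders.
import torch
import habana_frameworks.torch.core as htcore  # HPU backend
from transformers import AutoModelForCausalLM
from neural_compressor.torch.quantization import (
    FP8Config, prepare, finalize_calibration, convert,
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").eval().to("hpu")

quant_config = FP8Config(fp8_config="E4M3")  # static FP8 using the E4M3 format
model = prepare(model, quant_config)         # insert measurement hooks

# Calibration pass with representative inputs (random tokens as a stand-in).
calib_ids = torch.randint(0, model.config.vocab_size, (1, 32)).to("hpu")
with torch.no_grad():
    model(calib_ids)
    htcore.mark_step()

finalize_calibration(model)  # persist the measured ranges/scales
model = convert(model)       # swap in FP8 modules that use those scales
```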
Improvements
- Remove the NumPy version limit for the 3.x PyTorch package
Bug Fixes
- Fix graph compile error when quantizing Llama3.2 11B/90B vision model on Gaudi
- Fix segmentation fault in the Llama2-70B INT4 model on Intel GPU
- Fix accuracy issue caused by a duplicated g_idx update in INT4 models on Intel GPU
Validated Hardware
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1100)
- Intel® Arc™ B-Series Graphics GPU (B580)
Validated Configurations
- CentOS 8.4 & Ubuntu 24.04 & Windows 11
- Python 3.9, 3.10, 3.11, 3.12
- PyTorch/IPEX 2.4, 2.5, 2.6