
Commit 54b2125
IPEX 2.7 release notes and known issues (#5568)
1 parent 8f077ab

File tree

2 files changed: +44 -36 lines

docs/tutorials/known_issues.md

Lines changed: 9 additions & 36 deletions
@@ -4,7 +4,7 @@ Troubleshooting
 ## General Usage
 
 - **Problem**: FP64 data type is unsupported on current platform.
-- **Cause**: FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) and [Intel® Arc™ A-Series Graphics](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html) platforms.
+- **Cause**: FP64 is not natively supported by the [Intel® Arc™ A-Series Graphics](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html) platform.
   If you run any AI workload on that platform and receive this error message, it means a kernel requires FP64 instructions that are not supported and the execution is stopped.
 - **Problem**: Runtime error `invalid device pointer` if `import horovod.torch as hvd` before `import intel_extension_for_pytorch`.
 - **Cause**: Intel® Optimization for Horovod\* uses utilities provided by Intel® Extension for PyTorch\*. The improper import order causes Intel® Extension for PyTorch\* to be unloaded before Intel® Optimization for Horovod\* at the end of the execution and triggers this error; see the import-order sketch below.
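For the Horovod item above, a minimal sketch of the import order that avoids the error (the Horovod init lines are generic usage, not from the document):

```python
import torch
import intel_extension_for_pytorch  # must be imported before horovod.torch
import horovod.torch as hvd

# With this order, the extension's utilities stay loaded for Horovod's whole
# lifetime, avoiding the `invalid device pointer` runtime error.
hvd.init()
device = torch.device(f"xpu:{hvd.local_rank()}")
```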
@@ -23,12 +23,6 @@ Troubleshooting
 - **Problem**: Some workloads terminate with an error `CL_DEVICE_NOT_FOUND` after some time on WSL2.
 - **Cause**: This issue is due to the [TDR feature](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys#tdrdelay) on Windows.
 - **Solution**: Try increasing TDRDelay in your Windows Registry to a large value, such as 20 (the default is 2 seconds), and reboot; see the registry sketch after this hunk.
-- **Problem**: RuntimeError: Can't add devices across platforms to a single context. -33 (PI_ERROR_INVALID_DEVICE).
-- **Cause**: If you run Intel® Extension for PyTorch\* in a Windows environment where Intel® discrete GPU and integrated GPU co-exist, and the integrated GPU is not supported by Intel® Extension for PyTorch\* but is wrongly identified as the first GPU platform.
-- **Solution**: Disable the integrated GPU in your environment to work around. For long term, Intel® Graphics Driver will always enumerate the discrete GPU as the first device so that Intel® Extension for PyTorch\* could provide the fastest device to end framework users in such co-exist scenario based on that.
-- **Problem**: RuntimeError: Failed to load the backend extension: intel_extension_for_pytorch. You can disable extension auto-loading with TORCH_DEVICE_BACKEND_AUTOLOAD=0.
-- **Cause**: If you import any third party library such as Transformers before `import torch`, and the third party library has dependency to torch and then implicitly autoloads intel_extension_for_pytorch, which introduces circle import.
-- **Solution**: Disable extension auto-loading with TORCH_DEVICE_BACKEND_AUTOLOAD=0.
 
 ## Library Dependencies
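The TDRDelay registry change referenced in the WSL2 item above, as a sketch (run from an elevated command prompt and reboot; the key path follows the linked Microsoft TDR documentation):

```
:: Raise the GPU timeout-detection delay from the 2-second default to 20 seconds
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 20 /f
```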

@@ -84,13 +78,8 @@ Troubleshooting
 ```
 
 - **Problem**: Runtime error related to the C++ compiler with `torch.compile`: "Runtime Error: Failed to find C++ compiler. Please specify via CXX environment variable."
-- **Cause**: Not install and activate DPC++/C++ Compiler correctly.
-- **Solution**: [Install DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html) and activate it by following commands.
-
-```bash
-# {dpcpproot} is the location for dpcpp ROOT path and it is where you installed oneAPI DPCPP, usually it is /opt/intel/oneapi/compiler/latest or ~/intel/oneapi/compiler/latest
-source {dpcpproot}/env/vars.sh
-```
+- **Cause**: The C++ compiler environment is not activated; `torch.compile` needs to find the correct `cl.exe` path.
+- **Solution**: Open the "Developer Command Prompt for VS 2022", or follow [Visual Studio Developer Command Prompt and Developer PowerShell](https://learn.microsoft.com/en-us/visualstudio/ide/reference/command-prompt-powershell?view=vs-2022#developer-command-prompt) to activate the Visual Studio environment; a sketch follows.
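A sketch of the new Windows-side fix, assuming a default Visual Studio 2022 Community install path (adjust the edition and path for your machine; the script name is a placeholder):

```
:: Load the x64 MSVC toolchain so torch.compile can find cl.exe, then run the workload
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
python my_script.py
```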
 
 - **Problem**: LoweringException: ImportError: cannot import name 'intel' from 'triton._C.libtriton'
 - **Cause**: Installing Triton causes pytorch-triton-xpu to stop working.
@@ -102,32 +91,16 @@ Troubleshooting
 pip uninstall triton
 pip uninstall pytorch-triton-xpu
 # Reinstall correct version of pytorch-triton-xpu
-pip install --pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu
-```
-
-- **Problem**: ERROR: can not install dpcpp-cpp-rt and torch==2.6.0 because these packages version has conflicting dependencies.
-- **Cause**: The intel-extension-for-pytorch v2.6.10+xpu uses Intel DPC++ Compiler 2025.0.4 to get a crucial bug fix in unified runtime, while torch v2.6.0+xpu is pinned with 2025.0.2, so we can not install PyTorch and intel-extension-for-pytorch in one pip installation command.
-- **Solution**: Install PyTorch and intel-extension-for-pytorch with seperate commands.
+pip install pytorch-triton-xpu==3.3.0 --index-url https://download.pytorch.org/whl/xpu
 ```
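To confirm the reinstall took effect, a quick check (the `triton` module is provided by pytorch-triton-xpu, so the import should succeed):

```bash
pip show pytorch-triton-xpu   # expect Version: 3.3.0
python -c "import triton; print(triton.__version__)"
```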
-python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
-python -m pip install intel-extension-for-pytorch==2.6.10+xpu oneccl_bind_pt==2.6.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-```
-
-- **Problem**: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
-
-```
-torch 2.6.0+xpu requires intel-cmplr-lib-rt==2025.0.2, but you have intel-cmplr-lib-rt 2025.0.4 which is incompatible.
-torch 2.6.0+xpu requires intel-cmplr-lib-ur==2025.0.2, but you have intel-cmplr-lib-ur 2025.0.4 which is incompatible.
-torch 2.6.0+xpu requires intel-cmplr-lic-rt==2025.0.2, but you have intel-cmplr-lic-rt 2025.0.4 which is incompatible.
-torch 2.6.0+xpu requires intel-sycl-rt==2025.0.2, but you have intel-sycl-rt 2025.0.4 which is incompatible.
-```
-
-- **Cause**: The intel-extension-for-pytorch v2.6.10+xpu uses Intel DPC++ Compiler 2025.0.4 to get a crucial bug fix in unified runtime, while torch v2.6.0+xpu is pinned with 2025.0.2.
-- **Solution**: Ignore the Error since actually torch v2.6.0+xpu is compatible with Intel Compiler 2025.0.4.
 
 - **Problem**: RuntimeError: oneCCL: ze_handle_manager.cpp:226 get_ptr: EXCEPTION: unknown memory type, when executing DLRMv2 BF16 training on 4 cards Intel® Data Center GPU Max platform.
 - **Cause**: Issue exists in the default sycl path of oneCCL 2021.14 which uses two IPC exchanges.
-- **Solution**: Use `export CCL_ATL_TRANSPORT=ofi` to work around.
+- **Solution**: Use `export CCL_ATL_TRANSPORT=ofi` to work around.
+
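A hedged sketch of applying the oneCCL workaround to a multi-card run (the script name and rank count are placeholders):

```bash
# Route oneCCL through the OFI transport instead of the default sycl path
export CCL_ATL_TRANSPORT=ofi
mpirun -np 4 python train_dlrmv2_bf16.py  # placeholder 4-card training launch
```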
+- **Problem**: Segmentation fault when executing LLaMa2-70B inference on the Intel® Data Center GPU Max platform, based on online quantization.
+- **Cause**: An issue exists in Intel Neural Compressor (INC) v3.3: during the initial import of INC, the accelerator is cached with `lru_cache`; subsequently, setting `INC_TARGET_DEVICE` in the INC transformers-like API does not take effect. This results in two devices being present in the model, leading to the memory-related errors seen in the error messages.
+- **Solution**: If using online quantization, run the workload as `INC_TARGET_DEVICE="cpu" python` to work around; see the sketch below.
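For the INC item above, the variable must be set before Python starts so that `lru_cache` caches the CPU accelerator (the script name is a placeholder):

```bash
INC_TARGET_DEVICE="cpu" python run_llama2_70b_quant.py  # placeholder online-quantization script
```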
 
 ## Performance Issue

docs/tutorials/releases.md

Lines changed: 35 additions & 0 deletions
@@ -1,6 +1,41 @@
 Releases
 =============
 
+## 2.7.10+xpu
+
+Intel® Extension for PyTorch\* v2.7.10+xpu is the new release which supports Intel® GPU platforms (Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors and Intel® Data Center GPU Max Series) based on PyTorch\* 2.7.0.
+
+### Highlights
+
+- Intel® oneDNN v3.7.1 integration
+
+- Large Language Model (LLM) optimization
+
+  Intel® Extension for PyTorch\* optimizes typical LLM models like Llama 2, Llama 3, Phi-3-mini, Qwen2, and GLM-4 on the Intel® Arc™ Graphics family. Moreover, new LLM inference models such as Llama 3.3, Phi-3.5-mini, Qwen2.5, and Mistral-7B are also optimized on Intel® Data Center GPU Max Series platforms compared to the previous release. A full list of optimized models can be found in the [LLM Optimizations Overview](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/llm.html), with the supported Transformers version updated to [4.48.3](https://github.com/huggingface/transformers/releases/tag/v4.48.3).
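As a hedged sketch of how these optimizations are applied in user code (the model choice and dtype are illustrative; `ipex.llm.optimize` is the documented entry point, but consult the linked overview for exact per-model usage):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Load one of the newly optimized models (Qwen2.5 as an example) and apply
# the extension's LLM-specific optimizations for XPU inference.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.float16
).eval().to("xpu")
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")
```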
+
+- Serving framework support
+
+  Intel® Extension for PyTorch\* offers extensive support for various ecosystems, including [vLLM](https://github.com/vllm-project/vllm) and [TGI](https://github.com/huggingface/text-generation-inference), with the goal of enhancing performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). The vLLM/TGI features, such as chunked prefill and MoE (Mixture of Experts), are supported by the backend kernels provided in Intel® Extension for PyTorch\*. In this release, Intel® Extension for PyTorch\* adds sliding-window support in `ipex.llm.modules.PagedAttention.flash_attn_varlen_func` to meet the needs of models like Phi-3 and Mistral, which enable sliding windows by default.
+
+- [Prototype] QLoRA/LoRA finetuning using BitsAndBytes
+
+  Intel® Extension for PyTorch\* supports QLoRA/LoRA finetuning with [BitsAndBytes](https://github.com/bitsandbytes-foundation/bitsandbytes) on Intel® GPU platforms. This release includes several enhancements for better performance and functionality (see the sketch after this list):
+  - The performance of the NF4 dequantize kernel has been improved by approximately 4.4× to 5.6× across different shapes compared to the previous release.
+  - `_int_mm` support in INT8 has been added to enable INT8 LoRA finetuning in PEFT (with float optimizers like `adamw_torch`).
+
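A minimal sketch of the NF4 path these enhancements target, via the Hugging Face `BitsAndBytesConfig` API (the model name is an arbitrary example, not from the release notes):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization config; the improved NF4 dequantize kernel is
# exercised during QLoRA finetuning on XPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="xpu",  # assumes a bitsandbytes build with XPU support installed
)
```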
+- Codegen support removal
+
+  Removes codegen support from Intel® Extension for PyTorch\* and reuses the codegen capability from [Torch XPU Operators](https://github.com/intel/torch-xpu-ops), to ensure that codegen changes interoperate with their usages in Intel® Extension for PyTorch\*.
+
+- [Prototype] Python 3.13t support
+
+  Adds prototype support for Python 3.13t and provides prebuilt binaries on the [download server](https://pytorch-extension.intel.com/release-whl/stable/xpu/us/).
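A sketch of installing the prebuilt binaries from that server (the index URL is the one used elsewhere in these docs; wheel availability for 3.13t is per the note above):

```bash
# Inside a Python 3.13t (free-threaded) environment
pip install intel-extension-for-pytorch==2.7.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```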
+
+### Known Issues
+
+Please refer to [Known Issues webpage](./known_issues.md).
+
 
 ## 2.6.10+xpu
 
 Intel® Extension for PyTorch\* v2.6.10+xpu is the new release which supports Intel® GPU platforms (Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors and Intel® Data Center GPU Flex Series) based on PyTorch\* 2.6.0.
