
Fail to run model on CPU using IPEX-XPU installation #815

Open
@shira-g

Description


We want to use speculative decoding, where one model runs on the XPU and another, significantly smaller model runs on the CPU.
We installed the XPU build and ran the script from https://github.com/intel/intel-extension-for-pytorch/tree/release/2.6/examples/cpu/llm/inference:

python run.py --benchmark -m microsoft/Phi-3-mini-4k-instruct --input-tokens 1024 --max-new-tokens 128 --token-latency --dtype float32 --ipex

And we get the following error:

Traceback (most recent call last):
  File "C:\Users\sdp\shira\single_instance\run_generation.py", line 301, in <module>
    model = ipex.llm.optimize(
            ^^^^^^^^^^^^^^^^^^
  File "C:\Users\sdp\miniforge3\envs\shira-ipex\Lib\site-packages\intel_extension_for_pytorch\transformers\optimize.py", line 2157, in optimize
    validate_device_avaliable(device)
  File "C:\Users\sdp\miniforge3\envs\shira-ipex\Lib\site-packages\intel_extension_for_pytorch\transformers\optimize.py", line 1918, in validate_device_avaliable
    error_message(device)
  File "C:\Users\sdp\miniforge3\envs\shira-ipex\Lib\site-packages\intel_extension_for_pytorch\transformers\optimize.py", line 1909, in error_message
    raise RuntimeError(
RuntimeError: Device [cpu] is not avaliable in your IPEX package, need to re-install IPEX with [cpu] support, exiting...
LLM RUNTIME ERROR: Running generation task failed. Quit.

Can we get support for running models on the CPU with the XPU build?
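
For reference, the setup we are aiming for looks roughly like this (a minimal sketch; the draft model name is a placeholder, and the second optimize call is the one that fails, since the XPU wheel apparently ships without the CPU backend):

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Target model optimized for the XPU -- this path works with the XPU build.
target = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct").eval()
target = ipex.llm.optimize(target, dtype=torch.float32, device="xpu")

# Much smaller draft model intended for the CPU. This call raises
# "RuntimeError: Device [cpu] is not avaliable in your IPEX package"
# because device validation rejects "cpu" in the XPU-only package.
draft = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct").eval()  # placeholder draft model
draft = ipex.llm.optimize(draft, dtype=torch.float32, device="cpu")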
