Description
I'm encountering an issue while trying to run inference on the meta-llama/Llama-3.1-8B-Instruct model using the benchmarking script provided in the repo. Here's my setup:
Environment:
- Env is created using instructions provided in intel-extension-for-pytorch/examples/cpu/llm/README.md
- Model: meta-llama/Llama-3.1-8B-Instruct
- Script: intel-extension-for-pytorch/examples/cpu/llm/inference/run.py
- Hardware: Intel(R) Xeon(R) Platinum 8592+ (EMR machine)
- Command:
python run.py --benchmark -m meta-llama/Llama-3.1-8B-Instruct --dtype float16 --max-new-tokens 1024 --input-tokens 128 --num-warmup 2 --batch-size 32 --num-iter 1
Issue: When I run the above command with --dtype float16 and the --ipex flag added (IPEX enabled), I get the following error:
RuntimeError: could not create a primitive descriptor for the inner product forward propagation primitive.
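For reference, here is a minimal sketch that tries to isolate the same code path outside run.py: a single fp16 inner product (Linear) run eagerly and then through ipex.optimize. The use of ipex.optimize(dtype=torch.float16) and the 4096x14336 shape (taken from the verbose log further down) are assumptions for illustration, not necessarily the exact path run.py takes:

# Minimal sketch (assumed repro, not the actual run.py code path): check whether
# a plain fp16 Linear forward works on this CPU, and whether the failure only
# appears after ipex.optimize(..., dtype=torch.float16).
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Shape mirrors the matmul reported in the oneDNN verbose log (4096 -> 14336).
model = nn.Linear(4096, 14336, bias=False).eval().to(torch.float16)
x = torch.randn(32, 4096, dtype=torch.float16)

with torch.no_grad():
    y = model(x)  # eager fp16 path, no IPEX
    print("eager fp16 forward ok:", y.shape, y.dtype)

    # Assumed to roughly mirror what --ipex --dtype float16 does inside run.py.
    opt_model = ipex.optimize(model, dtype=torch.float16)
    y2 = opt_model(x)  # the inner-product primitive is created on this call
    print("ipex fp16 forward ok:", y2.shape, y2.dtype)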
However, if I remove the --ipex flag, the script runs without crashing, but the oneDNN verbose logs show that the source, weight, and destination tensors are still in bf16 rather than float16 as expected:
onednn_verbose,v1,primitive,exec,cpu,matmul,brg_matmul:avx10_1_512_amx,undef,src:bf16::blocked:ab::f0 wei:bf16::blocked:ba::f0 dst:bf16::blocked:ab::f0,attr-scratchpad:user,,15232x4096:4096x14336,129.611
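For completeness, a verbose line like the one above can be checked in isolation with oneDNN's documented ONEDNN_VERBOSE environment variable. The standalone bf16 matmul below (shapes borrowed from the log) is only a sketch to confirm which datatypes the CPU matmul primitives actually run in, not part of the benchmarking script:

# Sketch for checking oneDNN primitive datatypes via verbose logging.
# ONEDNN_VERBOSE is read by oneDNN itself; set it before torch (and oneDNN)
# is loaded so primitive creation/execution lines are printed.
import os
os.environ.setdefault("ONEDNN_VERBOSE", "1")

import torch

# bf16 matmul with the same inner/output dims as the logged primitive.
a = torch.randn(32, 4096, dtype=torch.bfloat16)
w = torch.randn(4096, 14336, dtype=torch.bfloat16)
out = a @ w
print(out.dtype)  # the verbose output should list src/wei/dst as bf16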