Description
Hello Intel Extension for TensorFlow Team,
I am unable to get TensorFlow to detect my Intel Arc B580 GPU when running inside the official Docker container. The Intel Extension for TensorFlow GPU backend loads correctly inside the container, but the SYCL runtime fails to find any physical XPU devices.
This seems to be a low-level incompatibility issue with the Ubuntu 24.04 host environment.
Environment Details
Host OS: Ubuntu 24.04 LTS "Noble Numbat"
GPU: Intel Arc B580
Host Graphics Driver: xe DRM driver (verified via dmesg; graphical acceleration works on the host under X.org)
Docker Version: (paste here the version obtained in step 1)
Docker Image: intel/intel-optimized-tensorflow:2.15.0.1-xpu-pip-jupyter
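For completeness, the details above were collected with commands along these lines (a rough sketch; exact output varies by system):
Bash
docker --version                   # Docker version on the host
lspci | grep -iE 'vga|display'     # confirms the Arc B580 is enumerated
sudo dmesg | grep -iE 'drm|xe'     # shows the xe DRM driver binding to the GPU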
Steps to Reproduce
1. On a fresh Ubuntu 24.04 host system, install Docker via sudo apt install docker.io.
2. Add the current user to the docker group with sudo usermod -aG docker $USER and then log out and log back in.
3. Run the official Docker container with flags to pass through the GPU device and render group permissions:
Bash
docker run -it --rm -p 8888:8888 --device=/dev/dri --group-add=$(getent group render | cut -d: -f3) intel/intel-optimized-tensorflow:2.15.0.1-xpu-pip-jupyter
4. Inside the container's terminal (e.g., via Jupyter Lab), run the following Python command to check for XPU devices:
Python
import tensorflow as tf
import intel_extension_for_tensorflow as itex  # importing ITEX loads the XPU plugin
print('Devices found:', tf.config.list_physical_devices('XPU'))
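To help narrow down where device visibility breaks, I can also run the following inside the same container (a sketch; the availability of sycl-ls in this image is my assumption, as it normally ships with the oneAPI DPC++ runtime):
Bash
ls -la /dev/dri    # were the card*/renderD* nodes passed through?
id                 # is the render GID added via --group-add in effect?
sycl-ls            # devices visible to the SYCL runtime, if the tool is present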
Expected Behavior
The Python snippet should list exactly one XPU device:
Devices found: [PhysicalDevice(name='/physical_device:XPU:0', device_type='XPU')]
Actual Behavior
The Python script outputs an empty list: Devices found: [].
The full log from the Python execution inside the container is as follows:
2025-09-07 00:55:39.394092: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-09-07 00:55:39.439004: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-07 00:55:39.439039: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-07 00:55:39.440414: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-07 00:55:39.447415: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-09-07 00:55:39.447651: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-07 00:55:40.280102: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-09-07 00:55:41.486461: I itex/core/wrapper/itex_gpu_wrapper.cc:38] Intel Extension for Tensorflow* GPU backend is loaded.
2025-09-07 00:55:41.486887: I external/local_xla/xla/pjrt/pjrt_api.cc:67] PJRT_Api is set for device type xpu
2025-09-07 00:55:41.486911: I external/local_xla/xla/pjrt/pjrt_api.cc:72] PJRT plugin for XPU has PJRT API version 0.33. The framework PJRT API version is 0.34.
2025-09-07 00:55:41.525929: E external/intel_xla/xla/stream_executor/sycl/sycl_gpu_runtime.cc:178] Can not found any devices.
2025-09-07 00:55:41.525996: E itex/core/kernels/xpu_kernel.cc:60] Failed precondition: No visible XPU devices. To check runtime environment on your host, please run itex/tools/python/env_check.py.
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
2025-09-07 00:55:41.613297: E itex/core/devices/gpu/itex_gpu_runtime.cc:174] Can not found any devices. To check runtime environment on your host, please run itex/tools/python/env_check.py.
If you need help, create an issue at https://github.com/intel/intel-extension-for-tensorflow/issues
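For additional context, the corresponding checks on the Ubuntu 24.04 host look like this (again only a sketch; clinfo is optional and may need to be installed separately):
Bash
ls -la /dev/dri                  # render node exposed by the xe driver
getent group render              # GID that was passed via --group-add
clinfo | grep -i 'device name'   # OpenCL view of the GPU, if clinfo is installed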
Thank you for looking into this. There appears to be an incompatibility between the latest Ubuntu LTS release and GPU device passthrough for Intel Arc cards in this container image.