System Info
Environment:
- CUDA Driver Version: 550.105.08
- CUDA Version: 13.0
- GPU: NVIDIA H20
- GPU Memory: 97871 MiB (~96 GB)
- Platform: OpenShift/Kubernetes
- TEI Version: 1.8.3
- TEI Image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.8.3
- Model: Qwen/Qwen3-Embedding-8B
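
For anyone reproducing, the figures above can be collected with standard driver tooling; a minimal sketch (nvidia-smi prints the driver's supported CUDA version in its header, and the query flags below are standard nvidia-smi fields):

nvidia-smi
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv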
Launch Parameters:
--model-id Qwen/Qwen3-Embedding-8B
--pooling mean
--max-batch-requests 128
--max-concurrent-requests 256
--max-batch-tokens 40960
--dtype float16
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction

1. Pull and run the TEI hopper image on a system with CUDA 13.0:

docker run --gpus all -p 80:80 \
  ghcr.io/huggingface/text-embeddings-inference:hopper-1.8.3 \
  --model-id Qwen/Qwen3-Embedding-8B \
  --pooling mean \
  --max-batch-requests 128 \
  --max-concurrent-requests 256 \
  --max-batch-tokens 40960 \
  --dtype float16
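
(Optional) To compare the CUDA runtime libraries bundled in the image against the host driver, a quick sketch, assuming the image is Ubuntu-based with a shell and a populated linker cache:

docker run --rm --gpus all --entrypoint /bin/sh \
  ghcr.io/huggingface/text-embeddings-inference:hopper-1.8.3 \
  -c 'ldconfig -p | grep -i cuda'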
2. Observe the initialization logs:
- Container starts and text embeddings router initializes
- Attempts to load model: Qwen/Qwen3-Embedding-8B
- WARNING appears: "Could not find a Sentence Transformers config"
- INFO: "Maximum number of tokens per request: 40960"
- INFO: "Starting 8 tokenization workers"
- INFO: "Starting model backend"
- INFO: "Starting FlashOwn3 model on Cuda(CudaDevice(DeviceId(1)))"
- CRASH: "Floating point exception (core dumped)"
3. Check GPU status:

nvidia-smi

Result: nvidia-smi shows 0% GPU utilization, 0 MiB memory usage, and no running processes.

4. Overall result: the container fails to serve embeddings and the GPU remains unused.
Note: the same model works correctly with the standard CUDA 12 variant:

docker run --gpus all -p 80:80 \
  ghcr.io/huggingface/text-embeddings-inference:1.8.3 \
  --model-id Qwen/Qwen3-Embedding-8B \
  --pooling mean
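
Once the CUDA 12 container is up, embeddings can be verified against TEI's documented /embed route, e.g. (port 80 per the -p 80:80 mapping above):

curl 127.0.0.1:80/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?"}'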
Expected behavior
The container should start successfully, load the model on the GPU, and serve embedding requests without crashing.
Questions:
- Is CUDA 13.x support planned for the hopper variant?
- Can a hopper-cuda13 image variant be provided?