
Hopper-1.8.3 image incompatible with CUDA 13.0 - Floating point exception on H20 #791

@khalidnass

Description

System Info

Environment:

  • CUDA Driver Version: 550.105.08
  • CUDA Version: 13.0
  • GPU: NVIDIA H20
  • GPU Memory: 97871 MiB (~96 GB)
  • Platform: OpenShift/Kubernetes
  • TEI Version: 1.8.3
  • TEI Image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.8.3
  • Model: Qwen/Qwen3-Embedding-8B

Launch Parameters:
--model-id Qwen/Qwen3-Embedding-8B
--pooling mean
--max-batch-requests 128
--max-concurrent-requests 256
--max-batch-tokens 40960
--dtype float16

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Pull and run the TEI hopper image on a system with CUDA 13.0:
    docker run --gpus all -p 80:80 \
      ghcr.io/huggingface/text-embeddings-inference:hopper-1.8.3 \
      --model-id Qwen/Qwen3-Embedding-8B \
      --pooling mean \
      --max-batch-requests 128 \
      --max-concurrent-requests 256 \
      --max-batch-tokens 40960 \
      --dtype float16

  2. Observe the initialization logs:

    • Container starts and text embeddings router initializes
    • Attempts to load model: Qwen/Qwen3-Embedding-8B
    • WARNING appears: "Could not find a Sentence Transformers config"
    • INFO: "Maximum number of tokens per request: 40960"
    • INFO: "Starting 8 tokenization workers"
    • INFO: "Starting model backend"
    • INFO: "Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))"
    • CRASH: "Floating point exception (core dumped)"
  3. Check GPU status:
    nvidia-smi
    Result: Shows 0% GPU utilization, 0 MiB memory usage, no running processes

  4. Result: Container fails to serve embeddings, GPU remains unused

Note: The same model works correctly with the standard CUDA 12 variant:
docker run --gpus all -p 80:80 \
  ghcr.io/huggingface/text-embeddings-inference:1.8.3 \
  --model-id Qwen/Qwen3-Embedding-8B \
  --pooling mean
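For anyone triaging this, one way to confirm a driver/toolkit mismatch is to compare the CUDA version the host driver supports against the CUDA runtime the image was built with. The commands below are a diagnostic sketch, not part of the original report; they assume the image ships a POSIX shell and standard `ldconfig`:

```shell
# Host side: driver version and the maximum CUDA version it supports
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi | grep "CUDA Version"

# Inside the hopper image: which CUDA runtime libraries the binaries link against
docker run --rm --gpus all --entrypoint /bin/sh \
  ghcr.io/huggingface/text-embeddings-inference:hopper-1.8.3 \
  -c 'ldconfig -p | grep -E "libcudart|libcublas"'
```

If the library versions reported inside the image are newer than what the host driver supports, that would be consistent with the crash at model startup.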

Expected behavior

The container should start successfully and serve embeddings without crashing, as it already does with the standard CUDA 12 image.

Questions:

  1. Is CUDA 13.x support planned for the hopper variant?
  2. Can a hopper-cuda13 image variant be provided?
