Could not start backend: cannot find tensor embeddings.word_embeddings.weight #533

momomobinx opened this issue Mar 26, 2025 · 10 comments

@momomobinx

System Info

docker

docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000
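
For reference, a quick way to confirm the container came up correctly is a plain request against TEI's /rerank route on the mapped port (7863 here); the query/texts payload below is only an illustrative example:

curl 127.0.0.1:7863/rerank \
        -X POST \
        -H 'Content-Type: application/json' \
        -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is a subfield of machine learning.", "Cheese is made from milk."]}'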

nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:5E:00.0 Off |                  N/A |
| 42%   22C    P8             17W /  350W |   24237MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000
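
Since the container is started with -d, the startup failure only shows up in the container logs; capturing it is just a standard docker command (nothing TEI-specific):

docker logs reranker 2>&1 | tail -n 50
# the log ends with the error from the title:
# Could not start backend: cannot find tensor embeddings.word_embeddings.weight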

Expected behavior

It was running normally until I hit a context-too-long error; after that I couldn't successfully restart the model.

@alvarobartt
Member

alvarobartt commented Apr 4, 2025

Hey @momomobinx, I just tried to run it with the latest TEI release, 1.6.1 (in your case the URI would be ghcr.io/huggingface/text-embeddings-inference:86-1.6.1), and https://huggingface.co/BAAI/bge-reranker-base seems to load and work just fine. Could you check with that version instead? Thanks in advance 🤗

Release at https://github.com/huggingface/text-embeddings-inference/pkgs/container/text-embeddings-inference/383795099?tag=86-1.6.1
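
For reference, testing this should only require swapping the image tag in the original command, something like the sketch below (the remaining flags from the reproduction can be appended unchanged):

docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.6.1 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16"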

@alvarobartt
Member

Also note that the reply above is in relation to the issue title of "Could not start backend: cannot find tensor embeddings.word_embeddings.weight", but I also tried with the same parameters as you ran TEI, and it also worked fine! If you could try to reproduce with the latest version or share more details that would be great 🤗

Also @momomobinx, I'll cc @jetnet too as they reacted to the comment assuming they may have the same or a similar issue!

@alvarobartt alvarobartt self-assigned this Apr 4, 2025
@momomobinx
Author

> Also note that the reply above is in relation to the issue title of "Could not start backend: cannot find tensor embeddings.word_embeddings.weight", but I also tried with the same parameters as you ran TEI, and it also worked fine! If you could try to reproduce with the latest version or share more details that would be great 🤗
>
> Also @momomobinx, I'll cc @jetnet too as they reacted to the comment assuming they may have the same or a similar issue!

The situation in which I encountered this issue: I had already loaded an LLM with VLLM and then started TGI on the same graphics card, and this problem occurred. When I exit VLLM and start TGI first, then VLLM, everything works normally.

@alvarobartt
Member

Hmm, but when you say TGI there you mean TEI, right? Also, the issue you mentioned in the title seems to be related to an unsupported architecture: it tries to load the tensors into the backend, but the mechanism to unwrap the values for each key in the tensors dict fails. That is not happening in the latest TEI release as far as I could test, so could you try with TEI again, or provide more details on how to reproduce the original issue? Thanks a lot in advance @momomobinx 🤗

@momomobinx
Author

momomobinx commented Apr 7, 2025

vllm start -> tei error

tei start -> vllm start -> works

vllm 0.7.3

nvidia-smi

nvidia-smi 
Mon Apr  7 07:28:36 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:5A:00.0 Off |                  N/A |
| 41%   24C    P8             24W /  350W |   21695MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  |   00000000:5E:00.0 Off |                  N/A |
| 42%   23C    P8             17W /  350W |   21305MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  |   00000000:62:00.0 Off |                  N/A |
| 41%   23C    P8             18W /  350W |   20806MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  |   00000000:66:00.0 Off |                  N/A |
| 42%   24C    P8             20W /  350W |   20806MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    922333      C   text-embeddings-router                        880MiB |
|    0   N/A  N/A    924009      C   ...env/.conda/envs/new-vllm/bin/python      20798MiB |
|    1   N/A  N/A    924635      C   ...env/.conda/envs/new-vllm/bin/python      20792MiB |
|    1   N/A  N/A   1001862      C   text-embeddings-router                        496MiB |
|    2   N/A  N/A    924636      C   ...env/.conda/envs/new-vllm/bin/python      20792MiB |
|    3   N/A  N/A    924637      C   ...env/.conda/envs/new-vllm/bin/python      20792MiB |
+-----------------------------------------------------------------------------------------+

os-release

cat /etc/os-release 
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

docker -v

docker -v
Docker version 26.1.2, build 211e74b

vllm

nohup sh -c 'PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64 python -m vllm.entrypoints.openai.api_server \
        --port=8000 \
        --gpu-memory-utilization=0.9 \
        --tensor-parallel-size=4 \
        --pipeline-parallel-size=1 \
        --trust-remote-code \
	--enable-prefix-caching \
	--num-scheduler-steps 10 \
	--max-model-len=28032 \
        --served-model-name deepseek_r1 \
	--enable-reasoning \
	--reasoning-parser  deepseek_r1 \
        --model /data/ai/models/QwQ-32B-AWQ  ' >> vllm.log 2>>vllm.log &

TEI embedding

docker run \
	-d \
	--name embedding \
	--gpus '"device=1"' \
	--env CUDA_VISIBLE_DEVICES=0 \
	-p 7862:80 \
	-v /data/ai/models:/data \
	ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
	--model-id "/data/gte-base-zh" \
        --dtype "float16" \
        --pooling "mean" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000

TEI reranker

docker run \
	-d \
	--name reranker \
	--gpus '"device=0"' \
	--env CUDA_VISIBLE_DEVICES=0 \
	-p 7863:80 \
	-v /data/ai/models:/data \
	ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
	--model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000
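
To double-check that both containers are actually serving once they start, a simple request against TEI's /embed route on the embedding container works (the /rerank check shown earlier covers the reranker); the input string is only an example:

curl 127.0.0.1:7862/embed \
        -X POST \
        -H 'Content-Type: application/json' \
        -d '{"inputs": "What is Deep Learning?"}'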

@momomobinx
Author

tei start -> nvidia-smi works inside the container -> vllm start -> docker exec -it embedding nvidia-smi now fails with:
Failed to initialize NVML: Unknown Error

@alvarobartt
Member

Ok, so what you want to achieve is to deploy the DeepSeek AWQ model sharded across all 4 x NVIDIA 3090 GPUs, and then attempt to re-use the remaining memory for TEI? For context, how are you measuring the available VRAM on those instances? Do you have more information on the failing stack trace? It's most likely due to missing memory, or to failing to allocate it since it's "shared" with another process running on the same device. Anyway, I'll try to investigate on my end, but it's most likely the aforementioned.
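
For instance, one way to check the per-GPU free memory as reported by the driver would be:

nvidia-smi --query-gpu=index,memory.total,memory.used,memory.free --format=csv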

@momomobinx
Author

Yes, it's just as you guessed: I am attempting to run TEI on the remaining resources. I originally wanted to collect some logs, but strangely, the issue doesn't seem to have occurred today.

@Narsil
Collaborator

Narsil commented Apr 8, 2025

I'm not super familiar with vLLM's recent work, but if it's anything like TGI (which is very similar), it will attempt to use ALL possible memory when loading up.

Therefore loading (TEI -> vLLM) works (because TEI takes resources first), while the other way around doesn't, because vLLM took all the memory, so TEI cannot work properly.
This is an educated guess, but it seems reasonable.

Now, the error message you're getting (cannot find tensor) seems quite misleading if my guess is true.

Now, to fix it: vLLM most likely has a flag to CAP its VRAM usage so you can spare some VRAM for TEI. In TGI it's called --cuda-memory-fraction (e.g. --cuda-memory-fraction 0.7, which would leave 30% of the VRAM for whatever else you want). I'm pretty sure vLLM has a similar flag.
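
For the record, judging by the launch command above, the vLLM equivalent appears to be --gpu-memory-utilization (already set to 0.9 there); lowering it would leave more headroom for TEI, e.g. a sketch along these lines (other flags from the original command unchanged):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64 python -m vllm.entrypoints.openai.api_server \
        --port=8000 \
        --gpu-memory-utilization=0.8 \
        --tensor-parallel-size=4 \
        --model /data/ai/models/QwQ-32B-AWQ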

@momomobinx
Author

That is possible, but I have already limited VLLM's GPU memory usage to 0.9.
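
Rough back-of-the-envelope numbers (my own, not from the logs) for why 0.9 should in principle leave enough room, assuming vLLM respects the cap exactly:

# per 24576 MiB RTX 3090 with --gpu-memory-utilization=0.9
echo $(( 24576 - 24576 * 9 / 10 ))   # -> 2458 MiB left per GPU
# the text-embeddings-router processes in nvidia-smi above use ~500-880 MiB, so TEI should fit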
