Could not start backend: cannot find tensor embeddings.word_embeddings.weight #533

Open
@momomobinx

Description

System Info

docker

docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000
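The error names a specific tensor that the backend cannot find in the model's weight file, which often points to a truncated or corrupted safetensors file on disk. As a hedged diagnostic sketch, the safetensors header is just an 8-byte little-endian length followed by a JSON table of tensor names, so the names a file actually contains can be listed with the standard library alone. The example below builds a tiny in-memory blob to demonstrate; in practice you would read the bytes of the model's `model.safetensors` (the path under `/data/bge-reranker-base` is an assumption based on the `--model-id` above):

```python
import json
import struct

def safetensors_keys(blob: bytes):
    """Return tensor names from a safetensors byte blob.

    Format: 8-byte little-endian header length, then a JSON header
    mapping tensor names to dtype/shape/data_offsets.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return [k for k in header if k != "__metadata__"]

# Build a minimal in-memory safetensors blob with one fp16 tensor
# (1x2 values = 4 bytes) named like the tensor the backend reports missing.
header = {
    "embeddings.word_embeddings.weight": {
        "dtype": "F16", "shape": [1, 2], "data_offsets": [0, 4],
    }
}
header_json = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_json)) + header_json + b"\x00" * 4

print(safetensors_keys(blob))
# prints ['embeddings.word_embeddings.weight']
```

If the real file's key list lacks `embeddings.word_embeddings.weight` (or reading the header fails outright), re-downloading the model would be the next step.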

nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:5E:00.0 Off |                  N/A |
| 42%   22C    P8             17W /  350W |   24237MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000

Expected behavior

The server was running normally until it hit a request whose context was too long; after that, I could not restart the model successfully.
