System Info
docker run \
-d \
--name reranker \
--gpus '"device=0"' \
--env CUDA_VISIBLE_DEVICES=0 \
-p 7863:80 \
-v /data/ai/models:/data \
ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
--model-id "/data/bge-reranker-base" \
--dtype "float16" \
--max-concurrent-requests 2048 \
--max-batch-tokens 32768000 \
--max-batch-requests 128 \
--max-client-batch-size 4096 \
--auto-truncate \
--tokenization-workers 64 \
--payload-limit 16000000
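For reference, this is how I verify the container after (re)starting it — a minimal sketch assuming TEI's `/health` and `/rerank` endpoints on the mapped port 7863; the query/texts payload is just a made-up example:

```shell
# Check that the server reports healthy (assumes TEI's /health endpoint)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7863/health

# Send a small rerank request to confirm the model actually loaded
curl -s http://localhost:7863/rerank \
  -H 'Content-Type: application/json' \
  -d '{"query": "what is a panda?", "texts": ["The giant panda is a bear native to China.", "Paris is the capital of France."]}'
```

After the failure described below, the health check never returns 200.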
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142 Driver Version: 550.142 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:5E:00.0 Off | N/A |
| 42% 22C P8 17W / 350W | 24237MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
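Note that nvidia-smi above shows 24237MiB / 24576MiB in use while the process list is empty, so something may still be holding GPU memory when the restart fails. A hedged sketch of the checks I ran (standard Linux tools; `fuser` typically needs root):

```shell
# List any compute processes nvidia-smi knows about, as CSV
nvidia-smi --query-compute-apps=pid,used_memory --format=csv

# Find processes holding the NVIDIA device nodes even if nvidia-smi shows none
sudo fuser -v /dev/nvidia*

# Confirm whether the old container is actually gone before restarting
docker ps -a --filter name=reranker
```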
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
docker run \
  -d \
  --name reranker \
  --gpus '"device=0"' \
  --env CUDA_VISIBLE_DEVICES=0 \
  -p 7863:80 \
  -v /data/ai/models:/data \
  ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
  --model-id "/data/bge-reranker-base" \
  --dtype "float16" \
  --max-concurrent-requests 2048 \
  --max-batch-tokens 32768000 \
  --max-batch-requests 128 \
  --max-client-batch-size 4096 \
  --auto-truncate \
  --tokenization-workers 64 \
  --payload-limit 16000000
Expected behavior
The server was running normally until it received a request whose context was too long; after that, I could not restart the model successfully. I expected the container to restart and serve the model as before.