Description
Hi team, I followed this guide: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client_guide/openai_readme.html#embedding-models to serve the "bge-large-zh-v1.5" model (https://huggingface.co/BAAI/bge-large-zh-v1.5). Sending a single request works, but when I use aiperf to run a benchmark, the requests fail with "Input should be a valid string", so it seems the frontend does not support list[str] input.
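To make the failing case concrete, here is a minimal sketch of the two request bodies involved. The model name and endpoint URL are illustrative placeholders; the point is only the shape of the `input` field, where the single-string form works but the list-of-strings form (which batching benchmark tools like aiperf send) is rejected.

```python
import json

# Hypothetical endpoint for illustration only.
URL = "http://localhost:8000/v1/embeddings"

# Single-string input: reported to work.
single = {"model": "bge-large-zh-v1.5", "input": "hello world"}

# Batched input as a list of strings: allowed by the OpenAI
# embeddings API, but reportedly rejected by the frontend with
# "Input should be a valid string".
batched = {"model": "bge-large-zh-v1.5", "input": ["first sentence", "second sentence"]}

for payload in (single, batched):
    # Serialize exactly as an HTTP client would before POSTing to URL.
    print(json.dumps(payload, ensure_ascii=False))
```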
Triton Information
What version of Triton are you using?
nvcr.io/nvidia/tritonserver:26.01-vllm-python-py3
Are you using the Triton container or did you build it yourself?
nvcr.io/nvidia/tritonserver:26.01-vllm-python-py3
To Reproduce
Steps to reproduce the behavior.
I use the above image and create a deployment with the following commands:
cd /opt/tritonserver/python/openai
python3 openai_frontend/main.py --model-repository xxx --openai-port 8000
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
model: BAAI/bge-large-zh-v1.5
config.pbtxt:
backend: "vllm"
instance_group [{kind: KIND_MODEL}]
model.json:
{"model": "xxx","gpu_memory_utilization": 0.9}
Expected behavior
It should accept list[str] input, since the OpenAI API specification allows it.
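For reference, the OpenAI embeddings API defines `input` as either a single string or an array of strings (token arrays are omitted here for brevity). A minimal sketch of the expected validation behavior, with a hypothetical helper name:

```python
def validate_input(value):
    """Accept str or list[str] and normalize to list[str],
    mirroring the `input` field of the OpenAI embeddings API."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value
    raise TypeError("input must be a string or a list of strings")

# Both forms should pass validation:
validate_input("hello")        # → ["hello"]
validate_input(["a", "b"])     # → ["a", "b"]
```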