triton+vllm embeddings serving does not support list[str] input #8655

@carloscao0928

Description

Hi team, I followed this guide, https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client_guide/openai_readme.html#embedding-models, to serve the BAAI/bge-large-zh-v1.5 model (https://huggingface.co/BAAI/bge-large-zh-v1.5). Sending a single request works, but when I run a benchmark with aiperf it fails with "Input should be a valid string", so the frontend appears not to support list[str] inputs.
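For context, a minimal sketch of the two request shapes involved (the endpoint URL and the example texts are assumptions based on the setup below, not taken from aiperf's actual traffic):

```python
import json

# Assumed endpoint for the OpenAI-compatible frontend started below.
URL = "http://localhost:8000/v1/embeddings"

# A single-string `input`, which works per this report:
single = {"model": "bge-large-zh-v1.5", "input": "你好世界"}

# A batched list[str] `input`, which aiperf sends and the frontend
# rejects with "Input should be a valid string":
batched = {"model": "bge-large-zh-v1.5", "input": ["你好世界", "hello world"]}

# Posting them requires a running server, e.g.:
#   import requests
#   requests.post(URL, json=single)   # succeeds
#   requests.post(URL, json=batched)  # fails per this report
print(json.dumps(batched, ensure_ascii=False))
```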

Triton Information
What version of Triton are you using?
nvcr.io/nvidia/tritonserver:26.01-vllm-python-py3

Are you using the Triton container or did you build it yourself?
nvcr.io/nvidia/tritonserver:26.01-vllm-python-py3

To Reproduce
Steps to reproduce the behavior.
I use the above image and create a deployment with the following commands:

```shell
cd /opt/tritonserver/python/openai
python3 openai_frontend/main.py --model-repository xxx --openai-port 8000
```

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

model: BAAI/bge-large-zh-v1.5

config.pbtxt:

```
backend: "vllm"
instance_group [{kind: KIND_MODEL}]
```

model.json:

```json
{"model": "xxx", "gpu_memory_utilization": 0.9}
```

Expected behavior
It should accept list[str] for `input`; the OpenAI embeddings spec allows it.
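For reference, a hedged sketch of a validator that accepts every `input` shape the OpenAI embeddings spec permits (a string, a list of strings, a list of token ids, or a list of token-id lists); the function name is hypothetical and this is not Triton's actual code:

```python
from typing import List, Union

# All `input` shapes allowed by the OpenAI embeddings spec.
EmbeddingInput = Union[str, List[str], List[int], List[List[int]]]

def normalize_embedding_input(raw: EmbeddingInput) -> list:
    """Coerce any spec-allowed `input` shape into a batch (list) of prompts."""
    if isinstance(raw, str):
        return [raw]              # single string -> batch of one
    if isinstance(raw, list):
        if all(isinstance(x, str) for x in raw):
            return list(raw)      # list[str] -> already a batch (aiperf's case)
        if all(isinstance(x, int) for x in raw):
            return [raw]          # one prompt given as token ids
        if all(isinstance(x, list) for x in raw):
            return list(raw)      # batch of token-id prompts
    raise TypeError("input must be str, list[str], list[int], or list[list[int]]")
```

A frontend that validates `input` as `str` only, as the error message suggests, would reject the second shape even though the spec allows it.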

Metadata

Labels

Enhancement (New feature or request), openai (OpenAI related)
