Hi team, thanks for the great work on llmperf — it's been super helpful for benchmarking LLM APIs.
I’m wondering if there are any plans to extend the correctness test framework to support multimodal models, i.e., models that accept both text and image inputs (e.g., Qwen2-VL-7B-Instruct, glm-4v-9b).
This would be especially useful for evaluating models on tasks like OCR, image-to-text, or visual question answering.
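For context, here's a rough sketch of the request shape such a test would need to generate, using the OpenAI-compatible multimodal chat format that these models are typically served with. The endpoint URL, model name, and image URL below are placeholders, and this is not llmperf's API, just an illustration of the kind of probe I have in mind:

```python
# Sketch of a multimodal correctness probe against an OpenAI-compatible
# endpoint. base_url, api_key, model, and the image URL are placeholders;
# this shows the request shape, not an existing llmperf interface.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What number is written in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/digits.png"},
                },
            ],
        }
    ],
)

# A correctness check could then compare the reply against a known ground
# truth for the image, analogous to the ground-truth comparison the
# existing text-only correctness test performs.
print(response.choices[0].message.content)
```

The main design question would be how to supply image/ground-truth pairs (e.g., a small bundled set of images with known contents for OCR-style checks), since the current test can generate its own prompts and expected answers on the fly.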