Feature Request: Support multimodal models in correctness test #88

@liu-shaojun

Description

Hi team, thanks for the great work on llmperf — it's been super helpful for benchmarking LLM APIs.

I’m wondering if there are any plans to extend the correctness test framework to support multimodal models, i.e., models that accept both text and image inputs (e.g., Qwen2-VL-7B-Instruct, glm-4v-9b).

This would be especially useful for evaluating models on tasks like OCR, image-to-text, or visual question answering.
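To make the ask concrete, here's a minimal sketch of what a single multimodal correctness probe could look like against an OpenAI-compatible endpoint (e.g., vLLM serving Qwen2-VL). This is not llmperf code; the `base_url`, `api_key`, model name, image file, and `expected_answer` are all placeholder assumptions:

```python
# Minimal sketch, not llmperf code: one multimodal correctness probe
# against an OpenAI-compatible chat completions endpoint.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local test image whose ground-truth text is known in advance.
with open("ocr_sample.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Read the text in this image and reply with it verbatim.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    temperature=0.0,
)

# Correctness check: compare the model's answer against the known label.
expected_answer = "HELLO WORLD"  # hypothetical ground truth for ocr_sample.png
actual = response.choices[0].message.content.strip()
print("correct" if expected_answer in actual else f"mismatch: {actual!r}")
```

Aggregating the mismatch rate over a batch of such labeled images could then serve the same role the existing text-only correctness metric plays today.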

Would love to hear your thoughts!

Thanks 🙏
