Skip to content

Implement MiniCPM-V-2 #637

Open
Open
@abrichr

Description

@abrichr

Feature request

https://github.com/OpenBMB/MiniCPM is the smallest multimodal model available. The latest version, https://huggingface.co/openbmb/MiniCPM-V-2, appears to be able to understand GUI images:

Describe the given image with as much detail as possible.

image

The image features a calculator screen displaying the number 45. The interface includes various mathematical symbols and operators, such as plus (+), minus (-) signs with corresponding buttons for each operation: addition ("="), subtraction ('÷'), multiplication (*", division("/"),"and "%) percentage sign(s)"%"...
There are also indicators of different types in green colors like 'O', indicating an operator or function button on one side while there're numbers from zero to nine represented by numerical keys elsewhere along this section within its boundary which suggests that it is designed primarily for performing arithmetic calculations rather than any complex math operations beyond simple algebra rules due to absence visible functions related directly connected through keyboard shortcuts via icons similar those seen at top left corner

This model has been integrated into https://github.com/vllm-project/vllm, which is likely how we want to implement it here.

This involves creating an openadapt.adapters.vllm with parameterizable model name.

Motivation

Fully offline support.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions