FR: allow multimodal input / vision / images #429

Closed
@thiswillbeyourgithub

It would be simple to detect paths or URLs to images in the prompt text and replace them with image inputs to the model.

I could then, for example, add a shortcut that saves an image from my clipboard to /tmp and inserts its path into the prompt automatically.

See the kind of workflow implemented in ollama:

What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
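
To make the idea concrete, here is a minimal sketch of the path/URL replacement step. It is not this project's code: the names `IMAGE_TOKEN`, `to_image_url` and `build_content` are hypothetical, and it assumes the backend accepts OpenAI-style chat content parts (`text` and `image_url`), with local files sent as base64 data URLs.

```python
import base64
import mimetypes
import re
from pathlib import Path
from typing import Dict, List, Optional

# Assumption: an image reference is a whitespace-delimited token ending in a
# common image extension (either an http(s) URL or a filesystem path).
IMAGE_TOKEN = re.compile(r"(?<!\S)(\S+\.(?:png|jpe?g|gif|webp))(?!\S)", re.IGNORECASE)


def to_image_url(token: str) -> Optional[str]:
    """Return a URL usable in an image_url part, or None if the token is not usable."""
    if token.startswith(("http://", "https://")):
        return token
    path = Path(token).expanduser()
    if not path.is_file():
        return None  # not an existing local file: keep the token as plain text
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"


def build_content(prompt: str) -> List[Dict]:
    """Split a prompt into one text part followed by one part per image found."""
    text_chunks = []
    image_parts = []
    pos = 0
    for match in IMAGE_TOKEN.finditer(prompt):
        url = to_image_url(match.group(1))
        if url is None:
            continue  # token stays in the text because pos is not advanced
        text_chunks.append(prompt[pos:match.start()])
        image_parts.append({"type": "image_url", "image_url": {"url": url}})
        pos = match.end()
    text_chunks.append(prompt[pos:])
    # Collapse the whitespace left behind by the removed tokens.
    text = " ".join("".join(text_chunks).split())
    return [{"type": "text", "text": text}] + image_parts
```

With this sketch, `build_content("What's in this image? /Users/jmorgan/Desktop/smile.png")` would yield a text part plus one image part if the file exists, and the prompt unchanged as a single text part if it does not.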

Somewhat related to:

Edit:
Oh, I see that there's already partial support: #332

It should be:

  • enabled for the other GPT-4 models that support it
  • mentioned in the docs
  • extended to support local files

I'll see about making a PR.
