FR: allow multimodal input / vision / images #429

It would be simple to detect paths/URLs to images in the prompt text and replace them with an [image call](https://platform.openai.com/docs/guides/vision/introduction) in the API request.
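
A rough sketch of what I mean, in Python just for illustration (the plugin itself is Lua). The names `IMAGE_EXTENSIONS`, `resolve_image` and `build_content` are made up for this example and are not part of ChatGPT.nvim. The output shape follows the `text` / `image_url` content parts described in the OpenAI vision guide, with local files sent as base64 data URLs. It assumes image references are whitespace-separated tokens, so paths containing spaces are not handled and the original whitespace of the prompt is collapsed.

```python
import base64
import mimetypes
from pathlib import Path

# Hypothetical list of extensions treated as images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def resolve_image(token):
    """Return a URL usable in an image_url part, or None if token is not an image."""
    # Ignore a query string when looking at the extension.
    if Path(token.split("?")[0]).suffix.lower() not in IMAGE_EXTENSIONS:
        return None
    if token.startswith(("http://", "https://")):
        return token  # remote image: pass the URL through unchanged
    path = Path(token).expanduser()
    if not path.is_file():
        return None  # looks like an image path but does not exist: keep as text
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_content(prompt):
    """Split a prompt into one text part followed by one part per image found."""
    words, images = [], []
    for token in prompt.split():
        url = resolve_image(token)
        if url is None:
            words.append(token)
        else:
            images.append({"type": "image_url", "image_url": {"url": url}})
    return [{"type": "text", "text": " ".join(words)}] + images
```

The returned list would go in the `content` field of the user message instead of a plain string.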

I could then, for example, add a shortcut that saves an image from my clipboard to /tmp and inserts its path into the prompt automatically.
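
The clipboard part could look something like this, again as a Python sketch rather than plugin code. It assumes an X11 session with `xclip` installed (on Wayland, `wl-paste --type image/png` would play the same role). `read_clipboard_png` and `save_clipboard_image` are hypothetical names, and the file name pattern is arbitrary.

```python
import subprocess
import tempfile
import time
from pathlib import Path

# Assumption: X11 with xclip available; this prints the clipboard image as PNG bytes.
XCLIP_CMD = ["xclip", "-selection", "clipboard", "-t", "image/png", "-o"]
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_clipboard_png(cmd=XCLIP_CMD):
    """Run the clipboard command and return its raw output."""
    return subprocess.run(cmd, check=True, capture_output=True).stdout


def save_clipboard_image(read=read_clipboard_png, directory=None):
    """Write the clipboard image to a temp directory and return its path."""
    data = read()
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("clipboard does not contain a PNG image")
    directory = Path(directory or tempfile.gettempdir())
    path = directory / f"clipboard-{time.time_ns()}.png"
    path.write_bytes(data)
    return path
```

The shortcut would call `save_clipboard_image()` and insert the returned path at the cursor, so the prompt ends up containing something like `/tmp/clipboard-….png`.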

See the kind of workflow [implemented in ollama](https://github.com/ollama/ollama):
> ```
> What's in this image? /Users/jmorgan/Desktop/smile.png
> The image features a yellow smiley face, which is likely the central focus of the picture.
> ```


Somewhat related to:
* https://github.com/jackMort/ChatGPT.nvim/issues/386

Edit:
Oh, I see that there's already partial support: https://github.com/jackMort/ChatGPT.nvim/pull/332

It should also be:
* enabled for the other GPT-4 models that support it
* mentioned in the docs
* extended to support local files

I'll see about making a PR.
