Commit 5d5c066

mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326)
Mistral Small 2506 models using the Pixtral vision encoder were running out of GPU memory when processing images larger than 1024x1024 pixels, because the encoder's memory use grows rapidly (roughly quadratically in the number of patch tokens) when the image size is left uncapped. This fix applies the same 1024x1024 limit already used for Qwen2VL models to prevent OOM while maintaining compatibility with existing models.
1 parent: 40bfa04
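For intuition on why an uncapped image size leads to OOM, here is a rough, standalone back-of-the-envelope sketch (not part of the commit). It assumes a 16-pixel patch size and 2-byte attention scores purely for illustration; the real footprint depends on the model's hyperparameters and the attention implementation.

// Illustrative only: estimates how a naive attention score matrix grows with
// image size for a ViT-style encoder. The 16-pixel patch size and 2 bytes per
// score are illustrative assumptions, not values taken from clip.cpp.
#include <cstdio>

int main() {
    const int patch_size = 16;
    const int sides[] = {1024, 2048, 4096}; // image side length in pixels

    for (int side : sides) {
        const long long tokens = (long long)(side / patch_size) * (side / patch_size);
        // naive score matrix: tokens x tokens entries, 2 bytes each
        const double attn_mib = (double)tokens * (double)tokens * 2.0 / (1024.0 * 1024.0);
        printf("%4d x %-4d -> %6lld patch tokens, ~%8.0f MiB per naive score matrix\n",
               side, side, tokens, attn_mib);
    }
    return 0;
}

Doubling the image side quadruples the token count and grows the naive score matrix sixteenfold, which is why a hard cap is an effective guard.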

1 file changed (+3, -0 lines)

tools/mtmd/clip.cpp

Lines changed: 3 additions & 0 deletions
@@ -2211,6 +2211,9 @@ struct clip_model_loader {
             {
                 hparams.rope_theta = 10000.0f;
                 hparams.warmup_image_size = hparams.patch_size * 8;
+                // Mistral Small 2506 needs 1024x1024 image size cap to prevent OOM
+                // ref: https://github.com/ggml-org/llama.cpp/issues/14310
+                hparams.image_size = 1024;
                 get_u32(KEY_SPATIAL_MERGE_SIZE, hparams.spatial_merge_size, false);
             } break;
         case PROJECTOR_TYPE_GEMMA3:
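Downstream of this hparam, the practical effect of a 1024-pixel cap is that larger inputs get scaled down before being split into patches. Below is a minimal, hypothetical helper (not code from clip.cpp) sketching that behaviour, assuming aspect ratio is preserved and the longer side is clamped to the cap.

#include <algorithm>
#include <cstdio>

// Hypothetical helper, not from clip.cpp: shrink (w, h) so the longer side
// does not exceed max_size, preserving aspect ratio. Integer math keeps the
// longer side exactly at max_size.
static void cap_image_size(int & w, int & h, int max_size) {
    const int longer = std::max(w, h);
    if (longer <= max_size) {
        return; // already within the cap
    }
    w = std::max(1, (int)((long long)w * max_size / longer));
    h = std::max(1, (int)((long long)h * max_size / longer));
}

int main() {
    int w = 3000, h = 2000;
    cap_image_size(w, h, 1024);
    printf("capped to %d x %d\n", w, h); // prints "capped to 1024 x 682"
    return 0;
}

A real preprocessor would additionally round the dimensions to multiples of the patch size, which this sketch ignores.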

0 commit comments
