This folder contains the implementation of InternVL 1.0 for stage-2 pre-training and retrieval fine-tuning, corresponding to Section 4.3 of our [InternVL 1.0 paper](https://arxiv.org/pdf/2312.14238).
## 🔥 Retrieval Fine-tuning (Fully)
> Note: In our experiments, full-parameter fine-tuning achieves the best results on image-text retrieval tasks on Flickr30K and COCO. By following the hyperparameters in this section, you can reproduce the model performance reported in the [Evaluation section](#evaluation).

To fine-tune InternVL on Flickr30K with 32 GPUs on a SLURM cluster, run:
```bash
PARTITION='your partition' GPUS=32 sh shell/finetune/internvl_stage2_finetune_flickr_364_bs1024_ep10.sh
```
To fine-tune InternVL on Flickr30K-CN with 32 GPUs on a SLURM cluster, run:
```shell
PARTITION='your partition' GPUS=32 sh shell/finetune/internvl_stage2_finetune_flickrcn_364_bs1024_ep10.sh
```
To fine-tune InternVL on COCO with 32 GPUs on a SLURM cluster, run:
```shell
PARTITION='your partition' GPUS=32 sh shell/finetune/internvl_stage2_finetune_coco_364_bs1024_ep5.sh
```

| config              |   Flickr30K   | Flickr30K-CN  |     COCO      |
| :------------------ | :-----------: | :-----------: | :-----------: |
| GPUs for training   | 32×A100 (80G) | 32×A100 (80G) | 32×A100 (80G) |
| Required GPU memory |      80G      |      80G      |      80G      |
## 🔥 Retrieval Fine-tuning (Head)
> Note: This section demonstrates how to perform a cost-effective fine-tuning of our model by training only the head. The hyperparameters shown here are not optimized for any specific task; for practical applications, further adjustment may be necessary to achieve optimal performance.
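As a rough illustration of what head-only tuning means (this is not the repository's training code, and the `head` submodule name is hypothetical), the usual PyTorch pattern is to freeze every parameter outside the head before building the optimizer:

```python
import torch

def freeze_all_but_head(model: torch.nn.Module, head_name: str = "head"):
    """Minimal sketch: train only the parameters under the `head_name` submodule."""
    for name, param in model.named_parameters():
        # Only parameters belonging to the head submodule stay trainable.
        param.requires_grad = name.startswith(head_name)
    # Hand the optimizer just the trainable (head) parameters.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-4, weight_decay=0.05)  # assumed hyperparameters
```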
To fine-tune the head of InternVL on Flickr30K with 4 GPUs, run:
```bash
GPUS=4 BATCH_SIZE=32 sh shell/head_finetune/internvl_stage2_finetune_flickr_224_bs1024_ep10_head_4gpu.sh
```
To fine-tune the head of InternVL on Flickr30K-CN with 4 GPUs, run:
```shell
GPUS=4 BATCH_SIZE=32 sh shell/head_finetune/internvl_stage2_finetune_flickrcn_224_bs1024_ep10_head_4gpu.sh
```
To fine-tune the head of InternVL on COCO with 4 GPUs, run a command of the following form (the script name below is inferred from the naming pattern of the scripts above; verify it against `shell/head_finetune/`):
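```shell
# Inferred script name (COCO full fine-tuning uses 5 epochs); confirm the
# file exists in shell/head_finetune/ before running.
GPUS=4 BATCH_SIZE=32 sh shell/head_finetune/internvl_stage2_finetune_coco_224_bs1024_ep5_head_4gpu.sh
```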
| config              |   Flickr30K   | Flickr30K-CN  |     COCO      |
| :------------------ | :-----------: | :-----------: | :-----------: |
| GPUs for training   | 4×GPU (>=32G) | 4×GPU (>=32G) | 4×GPU (>=32G) |
| Required GPU memory |      24G      |      24G      |      24G      |
## 🔥 Retrieval Fine-tuning (LoRA)
> Note: This section demonstrates how to perform a cost-effective fine-tuning of our model with LoRA adapters. The hyperparameters shown here are not optimized for any specific task; for practical applications, further adjustment may be necessary to achieve optimal performance.
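For intuition, LoRA injects small trainable low-rank adapters into selected linear layers while the base weights stay frozen. Below is a minimal, self-contained sketch using the HuggingFace `peft` library on a toy module; rank 16 matches the `lora16` tag in the script names, but the module names and other hyperparameters are illustrative, not what the training scripts actually use:

```python
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy stand-in for one attention block; the real InternVL module names differ.
class ToyAttention(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return self.proj(attn @ v)

config = LoraConfig(
    r=16,                            # low-rank dimension, matching the `lora16` scripts
    lora_alpha=32,                   # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["qkv", "proj"],  # which linear layers receive adapters (hypothetical)
)
model = get_peft_model(ToyAttention(), config)
model.print_trainable_parameters()   # only the LoRA adapter weights are trainable
```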
To fine-tune InternVL using LoRA on Flickr30K with 4 GPUs, run:
```bash
GPUS=4 BATCH_SIZE=32 sh shell/lora_finetune/internvl_stage2_finetune_flickr_224_bs1024_ep10_lora16_4gpu.sh
```
To fine-tune InternVL using LoRA on Flickr30K-CN with 4 GPUs, run:
```shell
GPUS=4 BATCH_SIZE=32 sh shell/lora_finetune/internvl_stage2_finetune_flickrcn_224_bs1024_ep10_lora16_4gpu.sh
```
To fine-tune InternVL using LoRA on COCO with 4 GPUs, run a command of the following form (the script name below is inferred from the naming pattern of the scripts above; verify it against `shell/lora_finetune/`):
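```shell
# Inferred script name (COCO full fine-tuning uses 5 epochs); confirm the
# file exists in shell/lora_finetune/ before running.
GPUS=4 BATCH_SIZE=32 sh shell/lora_finetune/internvl_stage2_finetune_coco_224_bs1024_ep5_lora16_4gpu.sh
```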
| config              |   Flickr30K   | Flickr30K-CN  |     COCO      |
| :------------------ | :-----------: | :-----------: | :-----------: |
| GPUs for training   | 4×GPU (>=40G) | 4×GPU (>=40G) | 4×GPU (>=40G) |
| Required GPU memory |      37G      |      37G      |      37G      |
## Fine-Tuning a Custom Dataset
1. **Organize Your Data**: Format your dataset similarly to COCO or Flickr30K.
2. **Update Meta Information**: Add your dataset's meta information to the `ds_collections` dictionary in `internvl_g/internvl/train/internvl_stage2_finetune.py`, for example:
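   A sketch of what an entry might look like (the field names here are illustrative; mirror the keys used by the existing COCO/Flickr30K entries):

   ```python
   ds_collections = {
       # Hypothetical entry; copy the field names from the existing
       # COCO/Flickr30K entries in internvl_stage2_finetune.py.
       'my_dataset_flickr_format': {
           'root': 'data/my_dataset/images/',
           'annotation': 'data/my_dataset/annotations.txt',
       },
   }
   ```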
3. **Name Your Dataset**: Include `flickr_format` or `coco_format` in your dataset's `dataset_name`. This will allow the script to reuse the Flickr30K or COCO dataloader accordingly.
By following these steps, you can easily fine-tune the InternVL model on your custom dataset using the existing COCO or Flickr30K data loading mechanisms.