paddleocr-vl finetuning dataset format #1348

@Theophylline

Hi, I've been reading the instructions on how to fine-tune the PaddleOCR-VL model, and I have some questions about how to prepare the fine-tuning dataset: https://github.com/PaddlePaddle/ERNIE/blob/release/v1.4/docs/paddleocr_vl_sft.md

Image
  1. Let's say I have a single-page PDF image containing text, tables, and images (see above). How should I generate the fine-tuning data in this case? Do I have to separate the text, tables, and images and create a fine-tuning dataset for each task?
  2. Is it possible to train the PaddleOCR-VL model from scratch using ERNIE?
  3. Assume that my fine-tuning dataset contains only one task (say, Table Recognition). How do you think this will impact overall model performance?

Thank you so much!
