paddleocr-vl finetuning dataset format #1348

Hi, I've been reading the instructions on how to finetune the PaddleOCR-VL model (https://github.com/PaddlePaddle/ERNIE/blob/release/v1.4/docs/paddleocr_vl_sft.md) and I have some questions about how to prepare the finetuning dataset.

<img width="748" height="930" alt="Image" src="https://github.com/user-attachments/assets/e72a54cd-722b-4770-8643-a6404e7aa650" />

1. Let's say I have a single-page PDF image containing text, a table, and images (see above). How should I generate the finetuning data in this case? Do I have to separate the text, tables, and images and create a finetuning dataset for each task?
2. Is it possible to train the PaddleOCR-VL model from scratch using ERNIE?
3. Suppose my finetuning dataset contains only one task (say, Table Recognition). How do you think this would affect the overall model performance?

Thank you so much!

