Thanks for this fantastic work.
The PerceptionLM paper mentions that the synthetic data engine (described in Appendix K) is designed to use only open-source models (rather than proprietary models) and to scale synthetic data generation up to around 64.7M image and video samples. This can be considered distillation from a large open-source model (in this case, Llama 3V) to label the training data. However, I could not see Llama 3V included as a baseline in the evaluation.
Could the team please share the performance gain of PerceptionLM over Llama 3V? If there is a gain, are there any observations or findings that explain it?