Thanks for this fantastic work.
The PerceptionLM paper mentions that the synthetic data engine (described in Appendix K) is designed to use only open-source models (rather than proprietary models) and to scale synthetic data generation up to around 64.7M image and video samples. This can be considered distillation from a large open-source model (in this case, Llama 3V) to label the training data. However, I could not see Llama 3V included as a baseline in the evaluation.
Could the team please share the performance gain of PerceptionLM over Llama 3V? If there is a gain, are there any observations or findings that explain it?