Training Details #28

@zc-zhao

Description

@zc-zhao

Thanks for your great work! I am confused about the training data described in the main paper.

In the first paragraph, you state that you use the same data as LLaVA. Does this mean that this model (using Vicuna as the LLM) only supports image understanding, not video?

In the second paragraph, you mention using video data. Is it used only with LLaMA-3.1-8B, or also with Vicuna-7B? Does this mean that all of the video evaluation experiments in the paper were conducted with LLaMA-3.1-8B?

Could you release more training details and code?
