This repository demonstrates fine-tuning the Llama 3.2 11B Vision model on the Amazon product description dataset philschmid/amazon-product-descriptions-vlm. The goal is to improve the model's ability to combine visual features with textual information to produce coherent product descriptions.
- Training framework: training uses the `SFTTrainer` from the `trl` (Transformer Reinforcement Learning) library; a minimal setup sketch follows this list.
- Parameter optimization: QLoRA (Quantized Low-Rank Adaptation) is applied to reduce memory use and the number of trainable parameters during fine-tuning.
- Training duration: around 2 hours for 3 epochs on a single A6000 (48 GB) GPU.
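
For orientation, here is a minimal sketch of how the QLoRA + `SFTTrainer` setup above might look; it is not the repository's exact training script. The LoRA rank, target modules, output directory, and prompt wording are illustrative assumptions, as are the dataset column names (`Product Name`, `description`, `image`):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# The "Q" in QLoRA: load the frozen base weights in 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# The LoRA part: small trainable adapter matrices on top of the quantized base.
peft_config = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative; the repo may target more modules
    task_type="CAUSAL_LM",
)

dataset = load_dataset("philschmid/amazon-product-descriptions-vlm", split="train")

def collate_fn(examples):
    # Turn each row into a chat exchange: image + product name in, description out.
    texts, images = [], []
    for ex in examples:
        messages = [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": f"Write a product description for {ex['Product Name']}."},
            ]},
            {"role": "assistant", "content": [{"type": "text", "text": ex["description"]}]},
        ]
        texts.append(processor.apply_chat_template(messages, tokenize=False))
        images.append(ex["image"])
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    batch["labels"] = labels
    return batch

args = SFTConfig(
    output_dir="llama-3.2-11b-vision-amazon",    # illustrative name
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    bf16=True,
    remove_unused_columns=False,                 # keep the image column for the collator
    dataset_kwargs={"skip_prepare_dataset": True},
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collate_fn,
    peft_config=peft_config,
)
trainer.train()
```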
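
And a sketch of how the base-versus-fine-tuned comparison can be produced, assuming the LoRA adapter was saved to the directory above; the image path and product name are placeholders:

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, MllamaForConditionalGeneration

base_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
adapter_dir = "llama-3.2-11b-vision-amazon"  # hypothetical adapter path

processor = AutoProcessor.from_pretrained(base_id)
model = MllamaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def describe(m, image, product_name):
    # Build a chat-format prompt with one image and a short instruction.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": f"Write a product description for {product_name}."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(m.device)
    output = m.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

image = Image.open("product.jpg")  # placeholder example image
baseline = describe(model, image, "wireless earbuds")

# Load the trained LoRA adapter on top of the same base weights.
tuned = PeftModel.from_pretrained(model, adapter_dir)
finetuned = describe(tuned, image, "wireless earbuds")

print("Base model:\n", baseline)
print("Fine-tuned:\n", finetuned)
```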
Below is an example comparing a description generated by the fine-tuned model with one generated by the base model.
