Does ipex support LORA training with fluxgym? #833

Open
@carlosrojasg

Describe the issue

I'm struggling to get fluxgym working to train a LoRA. I'm using the default flux-dev model, and training keeps reporting avr_loss=nan.
This is the most recent command I tried. I have tested several combinations of options to fix the issue and always get the same result:
accelerate launch ^
--num_cpu_threads_per_process 1 ^
sd-scripts/flux_train_network.py ^
--pretrained_model_name_or_path "c:\llm\fluxgym\models\unet\flux1-dev.sft" ^
--clip_l "c:\llm\fluxgym\models\clip\clip_l.safetensors" ^
--t5xxl "c:\llm\fluxgym\models\clip\t5xxl_fp16.safetensors" ^
--ae "c:\llm\fluxgym\models\vae\ae.sft" ^
--cache_latents_to_disk ^
--save_model_as safetensors ^
--sdpa --persistent_data_loader_workers ^
--max_data_loader_n_workers 2 ^
--seed 42 ^
--gradient_checkpointing ^
--mixed_precision no ^
--save_precision float ^
--network_module networks.lora_flux ^
--network_dim 2 ^
--optimizer_type adamw ^
--optimizer_args "betas=(0.9,0.999)" "eps=1e-8" "weight_decay=0.01" ^
--lr_scheduler cosine_with_restarts ^
--learning_rate 5e-5 ^
--cache_text_encoder_outputs ^
--cache_text_encoder_outputs_to_disk ^
--highvram ^
--max_train_epochs 16 ^
--save_every_n_epochs 4 ^
--dataset_config "c:\llm\fluxgym\outputs\test\dataset.toml" ^
--output_dir "c:\llm\fluxgym\outputs\test" ^
--output_name test ^
--timestep_sampling shift ^
--discrete_flow_shift 3.1582 ^
--model_prediction_type raw ^
--guidance_scale 1 ^
--loss_type l2 ^
--max_grad_norm 1.0

Example output:
[2025-05-27 20:19:19] [INFO] current_epoch: 0, epoch: 1
[2025-05-27 20:20:34] [INFO] steps: 0%| | 1/320 [00:37<3:21:00, 37.81s/it]
steps: 0%| | 1/320 [00:37<3:21:00, 37.81s/it, avr_loss=nan]
steps: 1%| | 2/320 [00:43<1:54:08, 21.54s/it, avr_loss=nan]
steps: 1%| | 2/320 [00:43<1:54:08, 21.54s/it, avr_loss=nan]
steps: 1%| | 3/320 [00:48<1:24:51, 16.06s/it, avr_loss=nan]
steps: 1%| | 3/320 [00:48<1:24:51, 16.06s/it, avr_loss=nan]
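To confirm whether the loss is NaN from the very first step or only diverges later, a saved copy of the log can be scanned. This is a minimal sketch, assuming the progress lines keep the tqdm-style format shown above. The helper name `first_nan_step` is hypothetical and is not part of fluxgym, sd-scripts, or ipex.

```python
import math
import re

# Matches tqdm-style progress lines such as:
#   steps: 1%| | 2/320 [00:43<1:54:08, 21.54s/it, avr_loss=nan]
# Assumption: avr_loss is the last field before the closing bracket.
PROGRESS = re.compile(r"(\d+)/(\d+) \[.*?avr_loss=([^\]\s,]+)\]")


def first_nan_step(log_lines):
    """Return the first step number whose avr_loss is NaN, or None."""
    for line in log_lines:
        match = PROGRESS.search(line)
        if match is None:
            continue  # line carries no avr_loss value
        step = int(match.group(1))
        loss = float(match.group(3))  # float("nan") parses to NaN
        if math.isnan(loss):
            return step
    return None


if __name__ == "__main__":
    sample = [
        "[2025-05-27 20:20:34] [INFO] steps: 0%| | 1/320 [00:37<3:21:00, 37.81s/it]",
        "steps: 0%| | 1/320 [00:37<3:21:00, 37.81s/it, avr_loss=nan]",
        "steps: 1%| | 2/320 [00:43<1:54:08, 21.54s/it, avr_loss=nan]",
    ]
    print(first_nan_step(sample))
```

For the log above, the first reported loss (step 1) is already NaN, so the loss never held a finite value at any point during the run.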
