Description
Thanks for the great work. When I was training the LLM with run.sh, I ran into the following problem. The LLM was being trained in FP16:
  File "/data/InspireMusic/inspiremusic/utils/train_utils.py", line 252, in update_parameter_and_lr
    scaler.unscale_(optimizer)  # Unscale gradients before clipping
  File "/data/InspireMusic/anaconda3/envs/inspiremusic/lib/python3.8/site-packages/torch/amp/grad_scaler.py", line 337, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
  File "/data/InspireMusic/anaconda3/envs/inspiremusic/lib/python3.8/site-packages/torch/amp/grad_scaler.py", line 259, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.