How to configure custom AMP operations in training_step function with manual optimization setting? #17821
Closed
Unanswered
JimLiuAtSJTU
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
PyTorch Lightning is a very useful framework for implementing ideas.
However, in my experiments (with fp16 AMP enabled), the parameters of the network sometimes go to inf.
So I want to clamp the parameters of my module while running in the 16-bit AMP setting. I believe this trick may also be useful to other people running into numerical issues.
But I could not find any related explanation or examples.
I tried to implement a custom optimization loop using the PyTorch AMP semantics (gradient scaler and LR scheduler), but I found it hard and confusing, especially where it gets tangled with the Lightning API. For now I can only run the Trainer in 32-bit float, which leads to a drastic performance downgrade.
Due to performance and quality requirements, I am not considering modifying the model architecture in the first place, so I need to figure out how to enable AMP with a manual optimization loop; a sketch of what I have in mind is shown below.
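For concreteness, here is a minimal sketch of what I am trying to do (the module name, the toy layer, and the clamp range are just placeholders; I am assuming that with `precision="16-mixed"` Lightning still runs `training_step` under autocast and applies the GradScaler through `self.manual_backward()` and the wrapped `optimizer.step()` even in manual optimization mode):

```python
import torch
import lightning.pytorch as pl


class ClampedModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # take over the optimization loop
        self.automatic_optimization = False
        self.net = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch

        # the forward pass should run under the autocast context that the
        # Trainer sets up when precision="16-mixed" is used
        loss = torch.nn.functional.mse_loss(self.net(x), y)

        opt.zero_grad()
        # manual_backward lets the precision plugin scale the loss
        # (GradScaler) instead of calling loss.backward() directly
        self.manual_backward(loss)
        # the wrapped optimizer step should unscale the gradients and skip
        # the update when inf/NaN gradients are detected
        opt.step()

        # the trick I want: clamp the parameters right after the step
        with torch.no_grad():
            for p in self.net.parameters():
                p.clamp_(-1e4, 1e4)

        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.net.parameters(), lr=1e-3)


# hypothetical usage:
# trainer = pl.Trainer(precision="16-mixed", max_epochs=1)
# trainer.fit(ClampedModule(), train_dataloader)
```

If I also return an LR scheduler from `configure_optimizers`, I understand that with manual optimization it has to be stepped manually via `self.lr_schedulers().step()`; is that the intended pattern?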
Could anyone help me? Are there references I may have overlooked?
My references so far:

- https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html
- https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html?highlight=manual_backward#lightning.pytorch.core.LightningModule.manual_backward
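For reference, the plain PyTorch loop from the AMP recipe above looks roughly like this (a simplified, self-contained sketch; the tiny model, data, and clamp range are placeholders). My difficulty is mapping the scaler calls onto the Lightning hooks:

```python
import torch

# standalone AMP loop in the style of the PyTorch recipe
model = torch.nn.Linear(32, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(8, 32, device="cuda")
    y = torch.randn(8, 1, device="cuda")

    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), y)

    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/NaN
    scaler.update()

    # the extra step I want: keep parameters bounded
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-1e4, 1e4)
```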
Thank you very much! 👯