How to configure custom AMP operations in training_step function with manual optimization setting? #17821
Closed
Unanswered
JimLiuAtSJTU
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
PyTorch Lightning is a very useful framework for implementing ideas.
However, in my experiments (with fp16 AMP enabled), the parameters of the network sometimes go to inf.
So I want to clamp the parameters of my module while running in the 16-bit AMP setting. I believe this trick may also be useful to other people running into numerical issues.
But I could not find any related explanation or examples.
I tried to implement a custom optimization loop using the PyTorch AMP semantics (gradient scaler and LR scheduler), but I found it hard and confusing, especially where it gets tangled with the Lightning API. For now I can only run the Trainer in 32-bit float, which leads to a drastic performance downgrade.
Due to performance and quality requirements, I am not considering modifying the model architecture in the first place, so I need to figure out how to enable AMP with a manual optimization loop; a sketch of what I have in mind is shown below.
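For concreteness, here is a minimal sketch of what I am trying to do (the module name, the toy layer, and the clamp range are just placeholders; I am assuming that with `precision="16-mixed"` Lightning still runs `training_step` under autocast and applies the GradScaler through `self.manual_backward()` and the wrapped `optimizer.step()` even in manual optimization mode):

```python
import torch
import lightning.pytorch as pl


class ClampedModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # take over the optimization loop
        self.automatic_optimization = False
        self.net = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch

        # the forward pass should run under the autocast context that the
        # Trainer sets up when precision="16-mixed" is used
        loss = torch.nn.functional.mse_loss(self.net(x), y)

        opt.zero_grad()
        # manual_backward lets the precision plugin scale the loss
        # (GradScaler) instead of calling loss.backward() directly
        self.manual_backward(loss)
        # the wrapped optimizer step should unscale the gradients and skip
        # the update when inf/NaN gradients are detected
        opt.step()

        # the trick I want: clamp the parameters right after the step
        with torch.no_grad():
            for p in self.net.parameters():
                p.clamp_(-1e4, 1e4)

        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.net.parameters(), lr=1e-3)


# hypothetical usage:
# trainer = pl.Trainer(precision="16-mixed", max_epochs=1)
# trainer.fit(ClampedModule(), train_dataloader)
```

If I also return an LR scheduler from `configure_optimizers`, I understand that with manual optimization it has to be stepped manually via `self.lr_schedulers().step()`; is that the intended pattern?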
Could anyone help me? Are there references I may have overlooked?
My references so far:

- https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html
- https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html?highlight=manual_backward#lightning.pytorch.core.LightningModule.manual_backward
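For reference, the plain PyTorch loop from the AMP recipe above looks roughly like this (a simplified, self-contained sketch; the tiny model, data, and clamp range are placeholders). My difficulty is mapping the scaler calls onto the Lightning hooks:

```python
import torch

# standalone AMP loop in the style of the PyTorch recipe
model = torch.nn.Linear(32, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(8, 32, device="cuda")
    y = torch.randn(8, 1, device="cuda")

    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), y)

    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/NaN
    scaler.update()

    # the extra step I want: keep parameters bounded
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-1e4, 1e4)
```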
Thank you very much! 👯