About the parameter setting in the process of training the model from scratch #1290
Unanswered · leiqing110 asked this question in Q&A · 1 comment
- @leiqing110 NFNets require gradient clipping; there are comments about that in the issues/discussions. timm has AGC, and the parameters mentioned in the paper will work as long as you scale your LR appropriately with the global batch size.
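For reference, a minimal NumPy sketch of the two points in the reply: unit-wise adaptive gradient clipping (AGC), which rescales a gradient whose norm exceeds a fixed fraction of the parameter norm, and linear LR scaling with the global batch size. The function names here are illustrative, not timm's own API, and the 0.1-per-256 base LR is an assumption about the paper's recipe:

```python
import numpy as np

def adaptive_grad_clip(param, grad, clip_factor=0.01, eps=1e-3):
    """Unit-wise AGC sketch: rescale grad so that
    ||g|| <= clip_factor * max(||W||, eps)."""
    p_norm = max(np.linalg.norm(param), eps)
    g_norm = np.linalg.norm(grad)
    max_norm = clip_factor * p_norm
    if g_norm > max_norm:
        # Shrink the gradient onto the allowed norm ball.
        grad = grad * (max_norm / g_norm)
    return grad

def scaled_lr(global_batch_size, base_lr=0.1, base_batch=256):
    """Linear LR scaling: lr grows proportionally with the
    global batch size relative to a reference batch of 256."""
    return base_lr * global_batch_size / base_batch
```

In timm itself, recent versions of `train.py` expose gradient-clipping options (e.g. a clip value plus an AGC clipping mode), so in practice you would enable AGC through the training script rather than hand-rolling it as above.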
- Hi author, when I set the LR to 0.1 while training dm_nfnet_f3 from scratch on the ImageNet dataset, the loss becomes extremely large and the model does not converge. How did you set the hyperparameters during training?