About the parameter setting in the process of training the model from scratch #1290
Unanswered · leiqing110 asked this question in Q&A · 1 comment
- @leiqing110 NFNets require gradient clipping; there are comments about that in the issues/discussions. timm has AGC, and the parameters mentioned in the paper will work as long as you scale your LR appropriately with the global batch size.
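For reference, a minimal NumPy sketch of the two points in the reply: unit-wise adaptive gradient clipping (AGC), which rescales a gradient whose norm exceeds a fixed fraction of the parameter norm, and linear LR scaling with the global batch size. The function names here are illustrative, not timm's own API, and the 0.1-per-256 base LR is an assumption about the paper's recipe:

```python
import numpy as np

def adaptive_grad_clip(param, grad, clip_factor=0.01, eps=1e-3):
    """Unit-wise AGC sketch: rescale grad so that
    ||g|| <= clip_factor * max(||W||, eps)."""
    p_norm = max(np.linalg.norm(param), eps)
    g_norm = np.linalg.norm(grad)
    max_norm = clip_factor * p_norm
    if g_norm > max_norm:
        # Shrink the gradient onto the allowed norm ball.
        grad = grad * (max_norm / g_norm)
    return grad

def scaled_lr(global_batch_size, base_lr=0.1, base_batch=256):
    """Linear LR scaling: lr grows proportionally with the
    global batch size relative to a reference batch of 256."""
    return base_lr * global_batch_size / base_batch
```

In timm itself, recent versions of `train.py` expose gradient-clipping options (e.g. a clip value plus an AGC clipping mode), so in practice you would enable AGC through the training script rather than hand-rolling it as above.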
- Hi author, when I set the LR to 0.1 while training dm_nfnet_f3 from scratch on the ImageNet dataset, the loss becomes extremely large and the model does not converge. How did you set the hyperparameters during training?