
EMA accuracy problems #1092

Answered by rwightman
Doraemonzm asked this question in Q&A
Jan 18, 2022 · 1 comments · 1 reply

The averaging period of the EMA weights is measured in optimizer steps, so the decay needs to be set relative to your steps per epoch. You have a large global batch size (4096), so very few steps per epoch, and you need to change your decay factor so that the period makes sense: equivalent to maybe 30-100 epochs (I usually target 10-25% of training duration). Right now your EMA weights probably won't be 'good' until a few hundred epochs have passed... you can look up details on EMA periods, etc.
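To make the steps-vs-epochs relationship concrete, here is a rough sketch rather than anything taken from the thread or from timm. It uses the common rule of thumb that an EMA with decay `d` averages over roughly `1 / (1 - d)` optimizer steps. The helper names, the dataset size (ImageNet-1k's 1,281,167 training images), and the decay of 0.9999 are all assumptions for illustration, since the question's actual dataset and decay aren't shown here.

```python
import math


def steps_per_epoch(num_samples: int, global_batch_size: int) -> int:
    """Optimizer steps in one epoch (assumes one step per global batch)."""
    return math.ceil(num_samples / global_batch_size)


def ema_decay_for_horizon(horizon_epochs: float, steps_per_ep: int) -> float:
    """Decay whose approximate averaging window, 1 / (1 - decay) steps,
    spans `horizon_epochs` epochs. Rule of thumb, not an exact equivalence."""
    horizon_steps = horizon_epochs * steps_per_ep
    if horizon_steps <= 1:
        raise ValueError("horizon must cover more than one optimizer step")
    return 1.0 - 1.0 / horizon_steps


def horizon_epochs_for_decay(decay: float, steps_per_ep: int) -> float:
    """Inverse of the above: how many epochs a given decay averages over."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be strictly between 0 and 1")
    return 1.0 / (1.0 - decay) / steps_per_ep


# Assumed numbers for illustration only.
NUM_SAMPLES = 1_281_167   # ImageNet-1k train set size (assumption about the dataset)
GLOBAL_BATCH = 4096       # global batch size mentioned in the answer

spe = steps_per_epoch(NUM_SAMPLES, GLOBAL_BATCH)
print(f"steps per epoch: {spe}")
for epochs in (30, 100):  # the 30-100 epoch range suggested in the answer
    print(f"~{epochs} epoch window -> decay {ema_decay_for_horizon(epochs, spe):.6f}")

# Sanity-check a decay you already use (0.9999 is a hypothetical value here).
print(f"decay 0.9999 -> ~{horizon_epochs_for_decay(0.9999, spe):.1f} epochs")
```

Plug in your own dataset size and batch size; the point is only that the same decay corresponds to a very different number of epochs once the batch size changes.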

As an aside, it's unlikely that a 4096 global batch with RMSProp will be 'great'; my best results with that optimizer have been in the 256-768 batch size range. Maybe a tweaked version of the LAMB hparams from ResNet strikes back could be more compet…

Answer selected by Doraemonzm