Hello!
Are there any plans to implement RetroMAE/DupMAE pre-training for ModernBERT? By changing a couple of arguments I was able to start training ModernBERT-base, but the grad_norm and loss values are stuck at 0/NaN, so it looks harder to implement than just swapping arguments.
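To narrow down a 0/NaN symptom like this, a per-step check can help separate two cases: a loss that is already non-finite, and parameters that receive no gradient at all. The global gradient norm will read as 0 if nothing receives a gradient. The sketch below assumes a plain PyTorch training loop; `check_step` is a hypothetical helper, not part of ModernBERT or any pre-training library.

```python
import torch
import torch.nn as nn


def check_step(model: nn.Module, loss: torch.Tensor) -> dict:
    """Hypothetical diagnostic: call after loss.backward(), before optimizer.step()."""
    # Collect only the parameters that actually received a gradient.
    grads = [p.grad.detach() for p in model.parameters() if p.grad is not None]
    if grads:
        # Global L2 norm over all parameter gradients (the quantity grad clipping uses).
        total_norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
    else:
        # No parameter got a gradient (e.g. frozen or detached graph), so report 0.
        total_norm = 0.0
    return {
        "loss_finite": bool(torch.isfinite(loss).item()),
        "grad_norm": total_norm,
        "num_params_with_grad": len(grads),
    }


# Usage on a toy model (stand-in for the real encoder).
model = nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).pow(2).mean()
loss.backward()
print(check_step(model, loss))
```

If `num_params_with_grad` is 0, the loss is probably not connected to the trainable weights. If `loss_finite` is already False on the first step, the NaN comes from the forward pass rather than from the optimizer.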
Any advice would be appreciated.