Hello, first of all, thank you for your work and for making it open source.
Looking at the linear_probe.py file, I see that you have implemented a hyperparameter sweep to find the best weight-decay value for linear probing. I was wondering whether you have also used this sweep for image-classification tasks. If so, could you share which weight-decay value gave the best accuracy?