How to fine-tune a distilled model (DeiT)? #1057
Unanswered
manuel-rdz asked this question in Q&A
Replies: 0 comments
Hello,
I'm new to distilled models and I was trying to use a DeiT model with 20 outputs.
Usually I just replace the head of a transformer model with a new linear layer.
But I noticed that distilled models (at least DeiT) have two heads, 'head' and 'head_dist', and return a tuple of two sets of logits during prediction (I assume one from each head), as in the sketch below.
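For reference, here is a minimal sketch of the head replacement being described, assuming the timm implementation of distilled DeiT (`deit_base_distilled_patch16_224` is just one of the distilled variants):

```python
import timm
import torch.nn as nn

# Option A: ask timm to build the model with fresh 20-class heads directly;
# this re-initializes both 'head' and 'head_dist'.
model = timm.create_model('deit_base_distilled_patch16_224',
                          pretrained=True, num_classes=20)

# Option B: replace both heads manually on an already-created model.
model.head = nn.Linear(model.head.in_features, 20)
model.head_dist = nn.Linear(model.head_dist.in_features, 20)
```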
So my questions are:
1. What is the correct way to replace the heads for fine-tuning? (replace both, only one, etc.)
2. How should the two returned sets of logits be used to compute the loss? (only one set, a combination of both, etc.)
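To make question 2 concrete, here is a hedged sketch of two loss options when the forward pass returns a `(logits, logits_dist)` tuple in training mode; `teacher_logits` is a hypothetical placeholder for a separate teacher network's output, which only applies if actual distillation is being done:

```python
import torch
import torch.nn.functional as F

def finetune_loss(outputs, targets):
    # Simplest option without a teacher: supervise both heads with the same
    # labels and average the two cross-entropy terms (averaging the logits
    # before a single cross-entropy is a close variant).
    logits, logits_dist = outputs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits_dist, targets))

def hard_distillation_loss(outputs, targets, teacher_logits):
    # DeiT-style hard distillation: the class head learns from the
    # ground-truth labels, the distillation head from the teacher's
    # hard predictions (argmax over the teacher's logits).
    logits, logits_dist = outputs
    teacher_labels = teacher_logits.argmax(dim=1)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits_dist, teacher_labels))
```

Note that in the timm implementation, eval mode averages the two heads' logits into a single prediction, so training both heads keeps train-time and inference-time behavior consistent.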
Thank you :)