Kornia ViT encoder problem in decoding phase #445
Unanswered
carlodenardin
asked this question in
Q&A
Replies: 1 comment
Hi @carlodenardin, your decoder has to be able to work with the output shape of the ViT. I'd look in the Kornia documentation (I'm not 100% familiar with it) to find the output shape of their ViT implementation. Once you know what the dimensions are (e.g. |
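As a rough illustration of matching a decoder to the encoder's output shape, here is a minimal PyTorch sketch. It is not taken from Kornia or from this thread. It assumes the encoder output has the shape reported in the question, [1, 257, 64], that the first token is a class token which can be dropped, and that the remaining 256 tokens are ordered row by row over a 16x16 patch grid. The class name `PatchTokenDecoder` and its layer widths are hypothetical choices.

```python
import torch
from torch import nn


class PatchTokenDecoder(nn.Module):
    """Hypothetical decoder: maps ViT tokens [B, 1 + N, D] back to an image."""

    def __init__(self, embed_dim=64, grid_size=16, out_channels=3):
        super().__init__()
        self.grid_size = grid_size
        # Four stride-2 transposed convolutions: 16 -> 32 -> 64 -> 128 -> 256.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 64, kernel_size=2, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(16, out_channels, kernel_size=2, stride=2),
        )

    def forward(self, tokens):
        batch, _, dim = tokens.shape
        # Drop the first token (assumed here to be the class token).
        patches = tokens[:, 1:, :]
        # [B, N, D] -> [B, D, N] -> [B, D, H, W], assuming row-major patch order.
        grid = patches.transpose(1, 2).reshape(
            batch, dim, self.grid_size, self.grid_size
        )
        return self.net(grid)


# Random stand-in for the encoder output described in the question.
tokens = torch.randn(1, 257, 64)
recon = PatchTokenDecoder()(tokens)
print(recon.shape)
```

With these assumptions the reconstruction has shape [1, 3, 256, 256], so it can be compared directly with the input image in a reconstruction loss. A per-token linear layer that projects each 64-dimensional embedding back to its 16x16x3 patch would be another option.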
Hi, I am currently working on a neural network for anomaly detection. I want to build an autoencoder, and for the encoding phase I'm using the Vision Transformer provided by kornia. The problem is that I don't understand how the output of the Vision Transformer can be decoded. In my case the ViT outputs a tensor of shape [1, 257, 64], with a patch size of 16, an embedding dimension of 64, and a 256x256 input image with 3 channels. How can I pass this output to my decoder properly? I was thinking of using a simple ResNet for my decoder, but I'm struggling to understand this step (basically, I need to reconstruct the image). If you have any references or anything else that would be useful, I'd appreciate it! Thanks in advance.
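For reference, the 257 in that shape can be reproduced from the stated settings. The short sketch below assumes the extra token is a single class token added on top of the patch tokens, which is the usual ViT layout; this should be checked against the Kornia documentation.

```python
image_size = 256  # input height and width, from the question
patch_size = 16   # from the question
embed_dim = 64    # from the question

patches_per_side = image_size // patch_size  # 16 patches along each axis
num_patches = patches_per_side ** 2          # 256 patch tokens
num_tokens = num_patches + 1                 # plus one assumed class token

print((1, num_tokens, embed_dim))  # (1, 257, 64)
```

So each of the 256 patch tokens is a 64-dimensional summary of one 16x16x3 patch, and a decoder has to map that 16x16 grid of tokens back to 256x256 pixels.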