Kornia ViT encoder problem in decoding phase #445

carlodenardin · 2023-05-11T14:31:53Z

carlodenardin
May 11, 2023

Hi, I am currently working on a neural network for anomaly detection. I want to build an autoencoder, and for the encoding phase I'm using the Vision Transformer provided by Kornia. The problem is that I don't understand how the output of the Vision Transformer can be decoded. In my case the ViT produces an output of shape [1, 257, 64], with a patch size of 16, an embedding dimension of 64, and an input image of 256x256 with 3 channels. How can I pass this output to my decoder properly? I was thinking of using a simple ResNet for my decoder, but I'm struggling to understand this step (basically I need to reconstruct the image). If you have any references or anything else that would be useful, I'd appreciate it! Thanks in advance.

mrdbourke · 2023-05-15T10:20:05Z

mrdbourke
May 15, 2023
Maintainer

Hi @carlodenardin,

Your decoder will have to be able to work with the output shape of the ViT.

I'd perhaps look in the Kornia documentation (I'm not 100% familiar with it) for what the output shape of their ViT implementation is.

Once you know what the dimensions are (e.g. [batch_size, patches, embedding_dim], looks like your embedding dim is 64), you can feed this into a decoder that's compatible with the input shape.
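
As a rough illustration of "a decoder that's compatible with the input shape", here is a minimal PyTorch sketch. It rests on several assumptions that are not confirmed in this thread: the 257 tokens are taken to be 16 x 16 = 256 patch tokens plus one class token at index 0 (which matches 256 / 16 = 16 patches per side), the patch tokens are assumed to be in row-major order, and `PatchTokenDecoder` is a hypothetical name, not part of Kornia. A random tensor stands in for the actual encoder output.

```python
import math

import torch
import torch.nn as nn


class PatchTokenDecoder(nn.Module):
    """Hypothetical decoder: ViT tokens [B, 1 + N, D] -> image [B, C, H, W]."""

    def __init__(self, embed_dim=64, patch_size=16, out_channels=3):
        super().__init__()
        # Light refinement on the token grid; keeps the spatial size unchanged.
        self.refine = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # kernel_size == stride == patch_size, so each grid cell expands
        # into one patch_size x patch_size block of output pixels.
        self.to_pixels = nn.ConvTranspose2d(
            embed_dim, out_channels, kernel_size=patch_size, stride=patch_size
        )

    def forward(self, tokens):
        b, n, d = tokens.shape
        # Assumption: the first token is the class token, so drop it.
        patches = tokens[:, 1:, :]
        side = math.isqrt(n - 1)
        if side * side != n - 1:
            raise ValueError("expected a square grid of patch tokens")
        # [B, N, D] -> [B, D, side, side], assuming row-major patch order.
        grid = patches.transpose(1, 2).reshape(b, d, side, side)
        # Sigmoid keeps the reconstruction in [0, 1], like a normalized image.
        return torch.sigmoid(self.to_pixels(self.refine(grid)))


# Stand-in for the encoder output described in the question.
tokens = torch.rand(1, 257, 64)
decoder = PatchTokenDecoder(embed_dim=64, patch_size=16, out_channels=3)
recon = decoder(tokens)
print(recon.shape)
```

The reconstruction has the same shape as the input image, so it can be compared against it with a pixel-wise loss such as MSE. A deeper decoder (for example, several smaller upsampling stages) would follow the same pattern: drop the class token, reshape the tokens into a grid, then upsample by a total factor equal to the patch size.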
