Hey! I'm trying to understand how to minimize GPU memory during training and tried using set_grad_checkpointing with the ViT model. It looks like checkpointing is applied only to self.blocks, but not to self.patch_embed. Why is that? Thanks!
Replies: 1 comment
@DaniNem only recent versions of PyTorch (I think 1.11+) allow safely checkpointing the first block (once the use_reentrant flag was added, which can be set to False). However, the additional gains are minimal, so I opted to keep it simple and just checkpoint the blocks for all models where it made sense to do so.
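To illustrate the pattern being discussed, here is a minimal sketch (not timm's actual implementation) of a ViT-style model where the transformer blocks are wrapped in `torch.utils.checkpoint.checkpoint` while the patch embedding is left alone; the `TinyViT` class and its dimensions are made up for the example, but the `use_reentrant=False` flag is the real PyTorch 1.11+ option mentioned above:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyViT(nn.Module):
    """Toy ViT-like model showing blocks-only gradient checkpointing."""

    def __init__(self, dim=16, depth=4):
        super().__init__()
        # The patch embed is a single cheap layer relative to the blocks,
        # so it is not checkpointed (little memory to save there).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
            for _ in range(depth)
        )
        self.grad_checkpointing = False

    def set_grad_checkpointing(self, enable=True):
        self.grad_checkpointing = enable

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        for blk in self.blocks:
            if self.grad_checkpointing and self.training:
                # use_reentrant=False (PyTorch 1.11+) is the safer variant
                # and makes it valid to checkpoint even the first block,
                # whose input may not require grad.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x


model = TinyViT()
model.set_grad_checkpointing(True)
model.train()
out = model(torch.randn(2, 3, 32, 32))
out.sum().backward()  # block activations are recomputed here, saving memory
print(out.shape)  # torch.Size([2, 64, 16])
```

During the forward pass only the checkpoint boundaries store activations; the intermediate activations inside each block are recomputed during backward, trading compute for memory.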