Custom checkpoint: TorchScript and ONNX export #17343
Unanswered
pfeatherstone asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment · 2 replies
-
I would like a custom checkpoint callback which exports the model to TorchScript and ONNX. So far I have:

- How do I convert `pl_module` to a plain PyTorch `torch.nn.Module`? The Hugging Face Accelerate library has `accelerator.unwrap_model()`. Does Lightning have something similar?
- How do I synchronise all the nodes? Hugging Face Accelerate has `accelerator.wait_for_everyone()`, which I would presumably need, since checkpointing only happens on one node.
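A minimal sketch of what such a callback could look like; this is an assumption, not the poster's actual code. It leans on Lightning's built-in `LightningModule.to_torchscript()` and `to_onnx()` helpers, assumes the module sets `example_input_array` (needed for ONNX export when no `input_sample` is passed), and the class name `ExportCheckpoint` and output paths are made up for illustration:

```python
import lightning.pytorch as pl


class ExportCheckpoint(pl.Callback):
    """Illustrative only: export TorchScript + ONNX whenever validation ends."""

    def __init__(self, dirpath: str):
        self.dirpath = dirpath

    def on_validation_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
        # No unwrapping needed: callback hooks receive the LightningModule
        # itself (a torch.nn.Module subclass), not the DDP-wrapped model.
        if trainer.is_global_zero:
            pl_module.to_torchscript(file_path=f"{self.dirpath}/model.pt")
            # to_onnx() falls back to pl_module.example_input_array when no
            # input_sample is passed; one of the two must be available.
            pl_module.to_onnx(f"{self.dirpath}/model.onnx")
        # Lightning's analogue of accelerator.wait_for_everyone(): every
        # rank blocks here until all ranks reach this point.
        trainer.strategy.barrier()
```

On the two sub-questions: a `LightningModule` already subclasses `torch.nn.Module`, and callback hooks receive it unwrapped, so there is usually nothing to unwrap; if you do hold the strategy-wrapped model, `trainer.strategy.lightning_module` returns the plain module. `trainer.strategy.barrier()` is Lightning's counterpart to `accelerator.wait_for_everyone()`. The callback would be attached as usual, e.g. `pl.Trainer(callbacks=[ExportCheckpoint("exports")])`.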
-
So far I have this:

But when I train on multi-GPU I get a deadlock the first time this checkpoint is called: it hangs forever, spinning both my GPUs at 100% while doing nothing. And that's despite calling …
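A guess at the hang, hedged rather than confirmed: NCCL ranks busy-wait at 100% GPU utilisation while blocked on a collective, so two GPUs "spinning doing nothing" typically means the ranks have diverged, e.g. only rank 0 reaches a barrier, or the export path issues a collective op on one rank only. The usual shape of the fix is to keep every synchronisation point on every rank:

```python
def on_validation_end(self, trainer, pl_module):
    # Guard only the export with the rank check; keep the barrier on ALL
    # ranks. If the barrier sits inside the `if`, ranks 1..N never reach
    # it and every GPU busy-waits inside NCCL at 100% forever.
    if trainer.is_global_zero:
        pl_module.to_torchscript(file_path=f"{self.dirpath}/model.pt")
    trainer.strategy.barrier()
```

If that pattern is already in place, running with `NCCL_DEBUG=INFO` and `TORCH_DISTRIBUTED_DEBUG=DETAIL` usually reveals which collective each rank is stuck in.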