How to predict with 1000s of dataloaders? #19388
Unanswered
HadiSDev
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
I have more than 2,000 dataloaders, each wrapping a dataset backed by a dataframe.
The model is a pretrained BERT model, and I want to create a trainer that can take all these dataloaders and call predict on them.
Each output must stay separate: I cannot mix predictions across dataloaders, and I need to save the predictions individually for each dataset.
The problem is that I get CUDA OOM errors at seemingly random points, after the 20th, 50th, 100th, or 200th dataloader, so the predict step never finishes. It seems the predict function was never designed to handle this many dataloaders. I tried with 1 and with 2 GPUs (L4), but still no success :(
The only solution I am left with is to run on 1 GPU and feed the dataloaders in batches: predict on one batch of dataloaders, save the predictions, then send the next batch. This does not really work with a multi-GPU setup, though.
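For reference, the batched workaround looks roughly like the sketch below. It is not Lightning's API: `predict_fn` is a hypothetical stand-in for whatever runs the model over a group of dataloaders (for example a wrapper around `trainer.predict`) and is assumed to return one list of predictions per dataloader, in order. The `rank` and `world_size` arguments are also an assumption, showing one way to split the dataloaders across independent per-GPU processes instead of relying on the trainer's multi-GPU strategy.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def shard_for_rank(names: List[str], rank: int, world_size: int) -> List[str]:
    """Give each process a disjoint, round-robin slice of the dataset names."""
    return names[rank::world_size]


def predict_in_chunks(
    loaders: Dict[str, Iterable],
    predict_fn: Callable[[List[Iterable]], List[list]],  # hypothetical callback
    out_dir: Path,
    chunk_size: int = 16,
    rank: int = 0,
    world_size: int = 1,
) -> List[Path]:
    """Predict a few dataloaders at a time and save one file per dataset."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Sort the names so every process computes the same split.
    my_names = shard_for_rank(sorted(loaders), rank, world_size)
    for group in chunked(my_names, chunk_size):
        # Only this group's predictions are held in memory at once.
        results = predict_fn([loaders[name] for name in group])
        for name, preds in zip(group, results):
            path = out_dir / f"{name}.json"
            path.write_text(json.dumps(preds))
            written.append(path)
        # Drop the references before the next group is processed.
        del results
    return written
```

With one process per GPU, each process would call `predict_in_chunks` with its own `rank`, so every dataset is predicted exactly once and still gets its own output file. The predictions must be JSON-serializable here, so tensors would need converting (for example to plain lists) inside `predict_fn`.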
Any ideas what I am doing wrong?