Dataset too big #17785
Unanswered
rosscleung asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
-
You can try |
-
I'm having the same problem and (very surprisingly) cannot find the right answer on Google, ChatGPT, or in the discussions here. There was one earlier discussion, but it died off without an answer.
From my understanding, this is roughly what such a dataset looks like:
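(A minimal sketch of what I mean; `read_data()`, the file path, and `torch.load()` here are just placeholders for however the real data gets parsed.)

```python
import torch
from torch.utils.data import Dataset

def read_data(path):
    # Hypothetical helper: parses the ENTIRE file into memory at once.
    return torch.load(path)

class CustomDataSet(Dataset):
    def __init__(self, path):
        # Everything is materialized in RAM right here, at construction time.
        self.data = read_data(path)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Returns one sample from the in-memory store.
        return self.data[idx]
```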
The problem:
To instantiate this `CustomDataSet()`, the `read_data()` function runs and reads everything into memory so that `__len__()` and `__getitem__()` work. However, what if the file is too big for memory? The DataLoader doesn't address the problem: a `DataLoader()` (at least from my understanding) just asks the `CustomDataSet()` for a batch of items by index, so it still relies on the `CustomDataSet()` already being instantiated without OOM issues. If the file is too big, my code won't even get to the `DataLoader()` part.

So, one could split the one big dataset into several smaller files, so that each `CustomDataSet()` can run its own `read_data()` and load everything into memory, but that means I need to keep calling the `fit()` function on multiple datasets in a loop (sketched below)?
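(For concreteness, this is roughly the loop I mean; the shard file names and `LitModel` are placeholders, not real code from my project.)

```python
import lightning.pytorch as pl
from torch.utils.data import DataLoader

# Hypothetical shards produced by splitting the one big file.
shard_paths = ["data_part_0.pt", "data_part_1.pt", "data_part_2.pt"]

model = LitModel()  # placeholder LightningModule

for path in shard_paths:
    # Each shard is small enough that read_data() fits in memory.
    dataset = CustomDataSet(path)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Calling fit() over and over like this is the awkward part:
    # a fresh Trainer loop per shard.
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(model, train_dataloaders=loader)
```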