PyTorch datasets don't support multiprocessing

PyTorch's [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) takes a `num_workers` argument that fetches items from your dataset in parallel using multiprocessing, but this requires the dataset to be picklable so Python can send it to the worker processes.

Currently, if you try to use multiple workers with a `muspy` dataset, you get the following error:

```
AttributeError: Can't pickle local object 'Dataset.to_pytorch_dataset.<locals>.TorchRepresentationDataset'
```

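There's more context on the `pickle` issue in [this Stack Overflow thread](https://stackoverflow.com/questions/56533827/pool-apply-async-nested-function-is-not-executed/56534386#56534386). In short, `pickle` stores a class by its importable name, and a class defined inside a function has no such name.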
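To see the underlying `pickle` behaviour without `muspy` or `torch`, here's a small standard-library sketch. `make_local_dataset`, `LocalDataset` and `try_pickle` are names I made up for illustration; they only mimic the pattern of defining a dataset class inside a method.

```python
import pickle


def make_local_dataset():
    # The class is defined inside the function, just like the dataset
    # classes inside to_pytorch_dataset, so pickle cannot look it up by name.
    class LocalDataset:
        def __getitem__(self, index):
            return index

        def __len__(self):
            return 1

    return LocalDataset()


def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Older Python versions raise AttributeError here,
        # newer ones may raise PicklingError instead.
        return False


local_ok = try_pickle(make_local_dataset())
print("local class instance picklable:", local_ok)
```

As far as I can tell, this is the same failure the `DataLoader` runs into when it has to pickle the dataset to hand it to its worker processes.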

Here's a minimal reproducible example to test it out yourself:

```
import muspy
import torch

# Download the dataset and convert it to MusPy's format
haydn = muspy.HaydnOp20Dataset("data/", download_and_extract=True).convert()
dataset = haydn.to_pytorch_dataset(representation="pianoroll")

# num_workers > 0 means the dataset has to be sent to worker processes
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=2)
batch = next(iter(dataloader))  # raises the AttributeError shown above
```

I'm happy to open a PR with a fix for this. It mostly involves moving `TorchRepresentationDataset` and `TorchMusicFactoryDataset` out of `to_pytorch_dataset` so they are defined at module level.
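Here's a rough sketch of the shape such a fix could take, using only the standard library so it runs anywhere. This is not the actual `muspy` code: `RepresentationDataset`, `to_pianoroll` and `to_dataset` are hypothetical stand-ins, and in the real fix the class would presumably subclass `torch.utils.data.Dataset`.

```python
import pickle


class RepresentationDataset:
    """Hypothetical module-level stand-in for TorchRepresentationDataset."""

    def __init__(self, items, factory):
        self.items = items
        # The factory must be picklable too, e.g. a module-level function
        # rather than a lambda or a nested function.
        self.factory = factory

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.factory(self.items[index])


def to_pianoroll(item):
    # Placeholder conversion; the real code would build a representation
    # from a Music object.
    return [item, item]


def to_dataset(items, factory):
    # Analogue of to_pytorch_dataset: it only instantiates the class,
    # it no longer defines it.
    return RepresentationDataset(items, factory)


dataset = to_dataset([1, 2, 3], to_pianoroll)
restored = pickle.loads(pickle.dumps(dataset))  # round trip now succeeds
print(len(restored), restored[1])
```

Because the class now lives at module level, `pickle` can find it by name, which is what the worker processes need.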

PyTorch datasets don't support multiprocessing #74

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

PyTorch datasets don't support multiprocessing #74

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions