It seems PyTorch offers two (maybe three, if you count torchx) different methods to handle multi-GPU training: spawning (`torch.multiprocessing.spawn`) and TorchElastic (newer, launched with `torchrun`). The goal here is to compare the speed of these methods.
PyTorch version: >= 1.10 (`torchrun` was added in 1.10)
- Spawning (see the first sketch below):

  ```bash
  python main_spawn.py --dist-url 'tcp://localhost:23456' --multiprocessing-distributed --world-size 1 --rank 0
  ```

- TorchElastic (see the second sketch below):

  ```bash
  torchrun --standalone --nnodes=1 --nproc_per_node=$NUM_GPU main_launch.py
  ```
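For context, a minimal sketch of what a spawn-style entry point like `main_spawn.py` might look like (the structure, the `worker` function, and the placeholder model are assumptions for illustration, not the actual script):

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(local_rank, world_size, dist_url):
    # Each spawned process joins the process group itself; on a single node,
    # rank == local GPU index.
    dist.init_process_group(
        backend="nccl", init_method=dist_url,
        world_size=world_size, rank=local_rank,
    )
    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
    # ... training / evaluation loop ...
    dist.destroy_process_group()


if __name__ == "__main__":
    num_gpus = torch.cuda.device_count()
    # The parent process forks one worker per GPU (single-node assumption).
    mp.spawn(worker, args=(num_gpus, "tcp://localhost:23456"), nprocs=num_gpus)
```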
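And a corresponding sketch of a `torchrun`-style entry point like `main_launch.py` (again an assumed structure, not the actual script): `torchrun` spawns the workers itself and passes rank information through environment variables, so the script no longer calls `mp.spawn`:

```python
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT;
    # the default env:// rendezvous picks them up automatically.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
    # ... training / evaluation loop ...
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```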
On CIFAR-10, with 8× TITAN V GPUs:
| Time per epoch   | Spawning | TorchElastic |
|------------------|----------|--------------|
| Training (sec)   | 8.52     | 5.73         |
| Evaluation (sec) | 2.62     | 1.64         |
=> In my setting, TorchElastic is about 50% faster than spawning (8.52/5.73 ≈ 1.49× for training, 2.62/1.64 ≈ 1.60× for evaluation).
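For reference, a minimal sketch of one way to measure per-epoch wall-clock time on GPU; `run_epoch` is a hypothetical callable standing in for one training or evaluation pass, and this is not necessarily how the numbers above were collected:

```python
import time

import torch


def timed_epoch(run_epoch):
    torch.cuda.synchronize()  # don't let pending GPU work leak into the timer
    start = time.time()
    run_epoch()
    torch.cuda.synchronize()  # wait for all kernels launched during the epoch
    return time.time() - start
```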