```python
if not verify_min_gpu_count(min_gpus=_min_gpu_count):
```
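The line above checks only the GPU count visible to the current process, that is, the GPUs on the local node. As a result, a job running on 2 nodes with 1 GPU per node fails the check even though the job as a whole has 2 GPUs, and the Slurm launcher script fails: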
https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/sbatch_run.sh#L19
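To make the difference concrete, here is a minimal sketch contrasting a per-process check with a job-wide one. Both function names are hypothetical and this is not the actual `verify_min_gpu_count` helper from pytorch/examples. The torch device query is replaced by a plain `local_gpu_count` argument so the logic can be run anywhere. The job-wide variant assumes the workers are started by `torchrun`, which sets `WORLD_SIZE` for each worker, and that each process drives exactly one GPU, so the world size equals the total GPU count.

```python
import os
from typing import Mapping, Optional


def verify_min_gpu_count_local(local_gpu_count: int, min_gpus: int = 2) -> bool:
    """Hypothetical per-process check: only GPUs visible on this node count.

    This models the behavior described above, where 2 nodes x 1 GPU fails.
    """
    return local_gpu_count >= min_gpus


def verify_min_gpu_count_world(
    local_gpu_count: int,
    min_gpus: int = 2,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical job-wide check.

    Assumes one process per GPU, so WORLD_SIZE (set by torchrun) equals the
    total number of GPUs in the job. Falls back to the local count when
    WORLD_SIZE is absent, e.g. when the script is run without a launcher.
    """
    env = os.environ if env is None else env
    if local_gpu_count < 1:
        return False  # this process has no GPU to run on
    world_size = int(env.get("WORLD_SIZE", local_gpu_count))
    return world_size >= min_gpus


if __name__ == "__main__":
    # 2 nodes x 1 GPU each: every worker sees 1 local GPU, WORLD_SIZE is 2.
    two_node_env = {"WORLD_SIZE": "2"}
    print(verify_min_gpu_count_local(1, min_gpus=2))                    # False
    print(verify_min_gpu_count_world(1, min_gpus=2, env=two_node_env))  # True
```

The job-wide variant still rejects a process that has no GPU at all, so it only relaxes the requirement that all GPUs live on the same node.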