Is there a recommended way of training multiple models in parallel on a single GPU? I tried using joblib's `Parallel` & `delayed`, but I got a CUDA OOM with two instances even though a single model uses barely a fourth of the total memory. And is a speedup compared to sequential calling expected?
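Presumably the attempt looked something like the sketch below (`train_one` and the learning rates are invented placeholders). One likely contributor to the OOM: with a process-based joblib backend, every worker initializes its own CUDA context and its own PyTorch caching-allocator pool on top of the model itself, so two workers can exhaust memory even when a single model fits several times over.

```python
# Hypothetical reconstruction of the pattern from the question;
# train_one stands in for one full training run on the shared GPU.
from joblib import Parallel, delayed

def train_one(lr):
    # In the real case: build the LightningModule + Trainer and call
    # trainer.fit(...) here (see the fuller sketch further down).
    ...

# Two workers on one GPU. With a process-based backend, each worker
# pays for its own CUDA context and caching-allocator pool, which can
# trigger the OOM described above.
Parallel(n_jobs=2)(delayed(train_one)(lr) for lr in (1e-3, 1e-2))
```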
-
Solution: https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html
-
Solution: #2807
-
Thanks, @Programmer-RD-AI. Sadly, your solutions point to multi-GPU training, but I am looking into training multiple models on a single GPU (not the other way around!). And the issue you reference is about training many models sequentially (if I understood correctly).
-
@grudloff I don't know how this could be done, or if it is even possible, but if you don't find the answer here, perhaps you could also ask in the PyTorch forum.
-
It is possible to run multiple trainings on a single GPU using joblib. But as far as I know, you need to instantiate your model and `Trainer` inside the delayed function.
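A minimal self-contained sketch of that pattern, assuming a recent PyTorch Lightning API (`accelerator="gpu", devices=1`); the tiny model, the random data, and the hyperparameters are all invented for illustration:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from joblib import Parallel, delayed
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    # Deliberately small placeholder model, just to make the sketch run.
    def __init__(self, lr):
        super().__init__()
        self.layer = nn.Linear(8, 1)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

def train_one(lr):
    # Model, data, and Trainer are all created inside the delayed
    # function, per the comment above: nothing CUDA-related exists
    # in the caller before the workers start.
    data = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
    model = TinyModel(lr)
    trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=1,
                         logger=False,  # avoid two runs racing on one log dir
                         enable_progress_bar=False)
    trainer.fit(model, DataLoader(data, batch_size=32))
    return float(trainer.callback_metrics["train_loss"])

# Two trainings share the single GPU; prefer="threads" keeps all
# workers in one process, so there is only one CUDA context.
losses = Parallel(n_jobs=2, prefer="threads")(
    delayed(train_one)(lr) for lr in (1e-3, 1e-2)
)
```

As for the speedup question: whether this beats sequential training depends on how well a single run already saturates the GPU, since the two runs merely interleave their kernels on the same device; the gain can range from nearly 2x for small models to none at all.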
-
Hi, I am doing testing in parallel on a single GPU using joblib, instantiating all modules inside the delayed function. However, I had a problem with exceptions in PyTorch (relevant issue), which was solved by specifying …
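The comment is cut off, but the fix presumably amounts to pinning joblib's backend explicitly. A minimal sketch assuming `backend="threading"`, which keeps every worker in the current process, so CUDA is initialized exactly once and fork-related CUDA exceptions cannot occur; `run_test` is a hypothetical stand-in for the real test step:

```python
from joblib import Parallel, delayed
import torch

def run_test(scale):
    # Stand-in for one test run; in the real case, build the model and
    # Trainer here and call trainer.test(...) instead.
    x = torch.randn(1024, 1024, device="cuda") * scale
    return float(x.relu().mean())

# backend="threading" (equivalently prefer="threads") runs workers as
# threads in the current process, so CUDA is initialized exactly once.
results = Parallel(n_jobs=2, backend="threading")(
    delayed(run_test)(s) for s in (0.5, 2.0)
)
print(results)
```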