How to combine multiple LightningModules and save hyperparameters #7249
-
Good day, I'm currently working on two models that train on the same data. I'd like to combine the two pre-trained models into one and use it for transfer learning. The combination is written as follows (you can copy-paste and run it):

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class MyModelA(pl.LightningModule):
    def __init__(self, hidden_dim=10):
        super(MyModelA, self).__init__()
        self.fc1 = torch.nn.Linear(hidden_dim, 2)
        self.save_hyperparameters()

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def forward(self, x):
        x = self.fc1(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.forward(x), y)


class MyModelB(pl.LightningModule):
    def __init__(self, hidden_dim=10):
        super(MyModelB, self).__init__()
        self.fc1 = torch.nn.Linear(hidden_dim, 2)
        self.save_hyperparameters()

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def forward(self, x):
        x = self.fc1(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.forward(x), y)


class MyEnsemble(pl.LightningModule):
    def __init__(self, modelA, modelB):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelA.freeze()
        self.modelB.freeze()
        self.classifier = torch.nn.Linear(4, 2)
        # self.save_hyperparameters()  # Uncomment to show error

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def forward(self, x):
        x1 = self.modelA(x)
        x2 = self.modelB(x)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.forward(x), y)


dl = DataLoader(TensorDataset(torch.randn(1000, 10),
                              torch.randn(1000, 2)),
                batch_size=10)

modelA = MyModelA()
modelB = MyModelB()

# pretrain modelA and modelB
trainerA = pl.Trainer(gpus=0, max_epochs=5, progress_bar_refresh_rate=50)
trainerA.fit(modelA, dl)
trainerB = pl.Trainer(gpus=0, max_epochs=5, progress_bar_refresh_rate=50)
trainerB.fit(modelB, dl)

# modelA and modelB now contain pretrained weights
model = MyEnsemble(modelA, modelB)
trainer = pl.Trainer(gpus=0, max_epochs=5, progress_bar_refresh_rate=50)
trainer.fit(model, dl)
```

The code above works fine as is. However, I would like to save the hyperparameters of the ensemble module, and adding self.save_hyperparameters() to MyEnsemble's __init__ (the commented-out line above) raises an error. Hence my question: how can I combine two or more LightningModules into a single module and save its hyperparameters? Or is there an alternative way to do so? Thanks in advance!

EDIT: Code updated to show that both modelA and modelB are pretrained.
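One possible workaround, shown here only as a minimal sketch that is not part of the original post: exclude the submodule arguments from the saved hyperparameters, assuming the installed Lightning version supports the ignore argument of save_hyperparameters. The learning_rate argument below is just an illustrative extra hyperparameter, not something from the original code.

```python
class MyEnsemble(pl.LightningModule):
    def __init__(self, modelA, modelB, learning_rate=1e-3):
        super().__init__()
        # Skip the nn.Module arguments; only plain values such as
        # `learning_rate` end up in self.hparams.
        # (`ignore` is assumed to exist in the installed Lightning version.)
        self.save_hyperparameters(ignore=["modelA", "modelB"])
        self.modelA = modelA
        self.modelB = modelB
        self.modelA.freeze()
        self.modelB.freeze()
        self.classifier = torch.nn.Linear(4, 2)
```

Note that ignored arguments are not stored in the checkpoint, so when restoring such an ensemble with load_from_checkpoint the pretrained submodules still have to be passed in explicitly.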
-
Hi, allow me to modify your example a bit :)

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class MyModelA(pl.LightningModule):
    def __init__(self, hidden_size_a=10):
        super(MyModelA, self).__init__()
        self.fc1 = torch.nn.Linear(hidden_size_a, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.forward(x), y)


class MyModelB(pl.LightningModule):
    def __init__(self, hidden_size_b=10):
        super(MyModelB, self).__init__()
        self.fc1 = torch.nn.Linear(hidden_size_b, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.forward(x), y)


class MyEnsemble(pl.LightningModule):
    def __init__(self, hidden_size_a, hidden_size_b):
        super(MyEnsemble, self).__init__()
        self.save_hyperparameters()
        self.modelA = MyModelA(hidden_size_a)
        self.modelB = MyModelB(hidden_size_b)
        self.modelA.freeze()
        self.modelB.freeze()
        self.classifier = torch.nn.Linear(4, 2)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def forward(self, x):
        x1 = self.modelA(x)
        x2 = self.modelB(x)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.forward(x), y)


model = MyEnsemble(hidden_size_a=10, hidden_size_b=10)
dl = DataLoader(TensorDataset(torch.randn(1000, 10),
                              torch.randn(1000, 2)),
                batch_size=10)

print("my hyperparameters are:")
print(model.hparams)

trainer = pl.Trainer(gpus=0, max_epochs=5, progress_bar_refresh_rate=50)
trainer.fit(model, dl)
```

Is this more in the direction you were thinking? Note that the changes are in the __init__ of MyModelA and MyModelB, and mainly in instantiating the submodules inside the Ensemble module.
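One nice consequence of saving hidden_size_a and hidden_size_b this way is that the whole ensemble can be restored from a checkpoint without repeating the sizes. A minimal sketch (the checkpoint path below is only a placeholder):

```python
# Restore the ensemble from a saved Lightning checkpoint; the hidden sizes
# are read back from the stored hyperparameters, so they do not need to be
# passed again. "path/to/ensemble.ckpt" is a placeholder path.
restored = MyEnsemble.load_from_checkpoint("path/to/ensemble.ckpt")
print(restored.hparams)  # shows hidden_size_a and hidden_size_b
```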
-
Good day. After several rounds of trial and error, I have come up with three ways to save hyperparameters for a combined LightningModule. Since this would be a long post, I will reply with the solutions, their code, and their respective cons under this comment thread. TL;DR:
Let me know your thoughts. I also hope this discussion can continue, as I'm still not sure whether these solutions are correct or best practice. Other solutions are still welcome. Thanks in advance.
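For reference, one pattern in this spirit (a sketch only, not necessarily one of the three ways mentioned above; the checkpoint paths are hypothetical) is to store the pretrained checkpoint paths as the ensemble's hyperparameters and rebuild the frozen submodules inside __init__:

```python
class MyEnsembleFromCkpt(pl.LightningModule):
    # Sketch: reuses MyModelA / MyModelB from the posts above and assumes
    # they either saved their own hyperparameters or use their defaults.
    # The checkpoint paths are placeholders.
    def __init__(self, ckpt_a="modelA.ckpt", ckpt_b="modelB.ckpt"):
        super().__init__()
        self.save_hyperparameters()  # only the path strings are saved
        self.modelA = MyModelA.load_from_checkpoint(ckpt_a)
        self.modelB = MyModelB.load_from_checkpoint(ckpt_b)
        self.modelA.freeze()
        self.modelB.freeze()
        self.classifier = torch.nn.Linear(4, 2)

    def forward(self, x):
        x = torch.cat((self.modelA(x), self.modelB(x)), dim=1)
        return self.classifier(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

The obvious con of this pattern is that the submodule checkpoints must still exist at those paths whenever the ensemble is re-instantiated.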
-
I have finally come up with a final solution, which can be found here. Thank you to everyone who read and participated in this discussion.
-
How can I pass two training datasets and two validation sets to the model through the trainer, i.e.

```python
trainer.fit(model, train_dataloaders=[dl_train1, dl_train2], val_dataloaders=[dl_val1, dl_val2])
```

and how should forward be modified to take two inputs, e.g.

```python
def forward(self, x1, x2):
    m1 = self.modelA(x1)
    m2 = self.modelB(x2)
    x = torch.cat((m1, m2), dim=1)
    x = self.classifier(x)
    return x
```

Please help. Thanks in advance!
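A minimal sketch of how training_step could consume the two loaders, assuming a Lightning version where trainer.fit accepts a list of train_dataloaders and delivers one batch per loader to each training step (this is not from the original thread):

```python
class MyEnsemble(pl.LightningModule):
    # __init__, configure_optimizers and the two-input forward(self, x1, x2)
    # are assumed to be defined as sketched in the question above.

    def training_step(self, batch, batch_idx):
        # With trainer.fit(model, train_dataloaders=[dl_train1, dl_train2], ...),
        # `batch` arrives as a list with one (x, y) pair per training dataloader.
        (x1, y1), (x2, y2) = batch
        y_hat = self(x1, x2)
        # Which target the loss should use depends on the task; the first
        # loader's targets are used here purely for illustration.
        return F.mse_loss(y_hat, y1)
```

On the validation side, if I understand the Lightning behaviour correctly, passing val_dataloaders=[dl_val1, dl_val2] makes validation_step run separately for each loader (with a dataloader_idx argument), which does not map cleanly onto a two-input forward; building a single validation dataset that yields (x1, x2, y) tuples may be simpler.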