How to set different `strategy` for different parts of the model in a trainer? #17284

meneshail · 2023-04-05T15:20:21Z

I have a model with the following architecture.

from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()         # required before registering submodules
        self.model_1 = Model_1()   # needs to be frozen
        self.model_2 = Model_2()   # needs gradient checkpointing

    def forward(self, batch):
        x = self.model_1(batch)
        x = self.model_2(x)
        return x

I want to train this model with DDP. The tricky part is that the first part of the model needs to be frozen and receives no gradients, so I should use DDP with find_unused_parameters=True, whereas the second part uses gradient checkpointing, so I need DDP with find_unused_parameters=False.
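To make the two requirements concrete, here is a minimal single-process sketch (no DDP involved) of a frozen first part followed by a gradient-checkpointed second part. The `nn.Linear` and `nn.Sequential` layers are hypothetical stand-ins for `Model_1` and `Model_2`, and the use of `torch.utils.checkpoint.checkpoint` with `use_reentrant=False` is an assumption about how checkpointing is applied, not something stated in the post.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical stand-ins for Model_1 and Model_2.
        self.model_1 = nn.Linear(8, 8)
        self.model_2 = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
        # Freeze the first part so that no gradients are computed for it.
        self.model_1.requires_grad_(False)
        self.model_1.eval()

    def forward(self, batch):
        # The frozen part runs without building an autograd graph.
        with torch.no_grad():
            x = self.model_1(batch)
        # Recompute model_2's activations during backward instead of storing them.
        x = checkpoint(self.model_2, x, use_reentrant=False)
        return x


model = Model()
out = model(torch.randn(4, 8))
out.sum().backward()

# Collect gradients to confirm which part was trained.
frozen_grads = [p.grad for p in model.model_1.parameters()]
trained_grads = [p.grad for p in model.model_2.parameters()]
```

After the backward pass, the parameters of `model_1` have no gradients while those of `model_2` do, which is the behaviour each DDP wrapper would then have to cope with.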

Luckily, as discussed in this post (Finding the cause of RuntimeError: Expected to mark a variable ready only once), I can wrap the two parts in separate DDP instances with different settings so that each part works as expected.

model_1 = DDP(model_1, find_unused_parameters=True, device_ids=[rank])
model_2 = DDP(model_2, find_unused_parameters=False, device_ids=[rank])

However, since the model is further wrapped in a pl.LightningModule, I can normally set only one strategy and pass it to the pl.Trainer. Is there a way to adapt my code so that different parts of the model use different strategies while still being trained together?

Replies: 0 comments
