I am trying to fine-tune a GPT-2 model with 3.5 billion parameters on 8 × 1080 Ti GPUs with distributed training. After a lot of research I got it running with PyTorch Lightning + transformers + DeepSpeed, but the batch size can only be set to 1.
I saw a piece of code that simulates the effect of a larger batch size by accumulating gradients over several steps before each optimizer update, but it is implemented in plain PyTorch.
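For reference, the pattern I mean looks roughly like this (a generic sketch, not the exact snippet I found; the toy model, optimizer, dataloader, and the value of accumulation_steps are only stand-ins for the real GPT-2 setup):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the loop runs end to end; the real code uses the GPT-2 model and its dataloader.
model = nn.Linear(16, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=1)

accumulation_steps = 8  # per-step batch size 1 behaves like an effective batch of 8

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    loss = loss_fn(model(inputs), labels)
    # Scale the loss so the accumulated gradients average over the virtual batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one optimizer update per accumulation_steps micro-batches
        optimizer.zero_grad()
```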
I know that in PyTorch Lightning each step of training is run through the training_step function, but I could not find a concrete implementation of this function in module.py.
Is it possible to simulate a large batch size in this way in PyTorch Lightning as well?
Replies: 1 comment

You can do this simply through the |
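In PyTorch Lightning, gradient accumulation is built into the Trainer via the accumulate_grad_batches argument, so there is no need to reimplement the accumulation loop: training_step is a hook that each LightningModule overrides to return the loss (which is why no concrete implementation appears in module.py), and the Trainer accumulates gradients over that many batches before calling the optimizer. A minimal sketch, with a toy LightningModule and dataloader standing in for the real GPT-2 setup and the DeepSpeed strategy string shown only as one possible configuration:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class ToyModule(pl.LightningModule):
    """Stand-in for the LightningModule that wraps the GPT-2 model."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        # Lightning accumulates the gradients of this loss across
        # accumulate_grad_batches batches before calling optimizer.step().
        return nn.functional.cross_entropy(self.layer(inputs), labels)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)

train_loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=1)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,
    strategy="deepspeed_stage_3",   # or the DeepSpeed setup already in use
    accumulate_grad_batches=8,      # accumulate over 8 batches per optimizer step
    max_epochs=1,
)
trainer.fit(ToyModule(), train_loader)
```

With a per-GPU batch size of 1 and accumulate_grad_batches=8, each optimizer update sees an effective batch of 8 per GPU, or 64 globally across the 8 data-parallel GPUs.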