I am trying to fine-tune a GPT-2 model with 3.5 billion parameters on 8 × 1080 Ti GPUs with distributed training. After a lot of research I got it running with PyTorch Lightning + transformers + DeepSpeed, but the batch size can only be set to 1.
I saw a piece of code that simulates the effect of a larger batch size by accumulating gradients over several steps before each optimizer update, but it is implemented in plain PyTorch.
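For reference, the pattern I mean looks roughly like this (a generic sketch, not the exact snippet I found; the toy model, optimizer, dataloader, and the value of accumulation_steps are only stand-ins for the real GPT-2 setup):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the loop runs end to end; the real code uses the GPT-2 model and its dataloader.
model = nn.Linear(16, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=1)

accumulation_steps = 8  # per-step batch size 1 behaves like an effective batch of 8

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    loss = loss_fn(model(inputs), labels)
    # Scale the loss so the accumulated gradients average over the virtual batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one optimizer update per accumulation_steps micro-batches
        optimizer.zero_grad()
```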
I know that in PyTorch Lightning each step of training is run through the training_step function, but I could not find a concrete implementation of this function in module.py.
Is it possible to simulate a large batch size in this way in PyTorch Lightning as well?
Replies: 1 comment

You can do this simply through the |
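In PyTorch Lightning, gradient accumulation is built into the Trainer via the accumulate_grad_batches argument, so there is no need to reimplement the accumulation loop: training_step is a hook that each LightningModule overrides to return the loss (which is why no concrete implementation appears in module.py), and the Trainer accumulates gradients over that many batches before calling the optimizer. A minimal sketch, with a toy LightningModule and dataloader standing in for the real GPT-2 setup and the DeepSpeed strategy string shown only as one possible configuration:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class ToyModule(pl.LightningModule):
    """Stand-in for the LightningModule that wraps the GPT-2 model."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        # Lightning accumulates the gradients of this loss across
        # accumulate_grad_batches batches before calling optimizer.step().
        return nn.functional.cross_entropy(self.layer(inputs), labels)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)

train_loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=1)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,
    strategy="deepspeed_stage_3",   # or the DeepSpeed setup already in use
    accumulate_grad_batches=8,      # accumulate over 8 batches per optimizer step
    max_epochs=1,
)
trainer.fit(ToyModule(), train_loader)
```

With a per-GPU batch size of 1 and accumulate_grad_batches=8, each optimizer update sees an effective batch of 8 per GPU, or 64 globally across the 8 data-parallel GPUs.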