WIP: Compose TP + FSDP2 #3498
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
Managed to make a minimal repro of composable tp+fsdp2 working, though it requires nightly torch 🚀 |
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
|
pulse ping |
|
I see @S1ro1 ! |
|
Also superseded by #3682
Reopening this: TP + FSDP2 is now working @ 2a13375. Run `accelerate launch examples/fsdp2/fsdp2_tp.py --apply-tp --apply-fsdp` on 8 GPUs (runs TP2 FSDP4); it can be compared with `CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch examples/fsdp2/fsdp2_tp.py --apply-fsdp`, which runs vanilla FSDP2. This still needs to be rechecked for some hanging memory, though, as the memory issue does not seem to be fixed.

Will further refactor to enable easier composition with FSDP2 and other parallelisms (namely a refactor of #3604 to have a better API).
Will continuously rely on this transformers PR.
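For readers unfamiliar with the "TP2 FSDP4" layout on 8 GPUs, here is a rough pure-Python sketch of how ranks could be grouped. It assumes a row-major `(fsdp, tp)` mesh with TP as the innermost dimension, which is a common convention but is not confirmed by this PR; `mesh_coords` and `process_groups` are hypothetical helper names, not part of accelerate or torch.

```python
# Illustrative only: assumes a row-major (fsdp, tp) mesh with TP innermost.
# The layout actually used by the example script may differ.

def mesh_coords(rank, fsdp_size=4, tp_size=2):
    """Return (fsdp_index, tp_index) for a global rank."""
    if not 0 <= rank < fsdp_size * tp_size:
        raise ValueError(f"rank {rank} is outside the {fsdp_size}x{tp_size} mesh")
    return divmod(rank, tp_size)


def process_groups(fsdp_size=4, tp_size=2):
    """List the ranks that would communicate along each mesh dimension."""
    # Ranks sharing an fsdp_index form one TP group (one row of the mesh).
    tp_groups = [
        [f * tp_size + t for t in range(tp_size)] for f in range(fsdp_size)
    ]
    # Ranks sharing a tp_index form one FSDP group (one column of the mesh).
    fsdp_groups = [
        [f * tp_size + t for f in range(fsdp_size)] for t in range(tp_size)
    ]
    return tp_groups, fsdp_groups


if __name__ == "__main__":
    tp_groups, fsdp_groups = process_groups()
    print("TP groups:  ", tp_groups)
    print("FSDP groups:", fsdp_groups)
```

Under these assumptions each pair of neighbouring ranks shares a tensor-parallel group, while parameters are sharded by FSDP2 across the four ranks that hold the same TP slice.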