@S1ro1
Contributor

S1ro1 commented Apr 10, 2025

Reopening this: I got TP + FSDP2 working at 2a13375. Run accelerate launch examples/fsdp2/fsdp2_tp.py --apply-tp --apply-fsdp on 8 GPUs (this runs TP2 FSDP4); it can be compared against vanilla FSDP2 with CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch examples/fsdp2/fsdp2_tp.py --apply-fsdp. It still needs to be rechecked for some hanging memory, though, as the memory issue does not seem to be fixed yet.
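For readers unfamiliar with the "TP2 FSDP4" shorthand, here is a minimal sketch of how 8 ranks could be split into tensor-parallel and FSDP groups. It assumes a row-major 2D mesh with dimensions (fsdp, tp), so that TP is the innermost dimension, which is the layout `torch.distributed.device_mesh.init_device_mesh("cuda", (4, 2))` would produce. The function names are hypothetical and are not taken from examples/fsdp2/fsdp2_tp.py; the actual script may order its mesh differently.

```python
# Illustrative only: one possible rank layout for TP2 x FSDP4 on 8 GPUs.
# Assumption: row-major (fsdp, tp) mesh, i.e. TP is the innermost dimension.
# mesh_coords and build_groups are hypothetical helpers for this sketch.


def mesh_coords(rank: int, fsdp_size: int, tp_size: int) -> tuple[int, int]:
    """Return (fsdp_index, tp_index) of a rank in a row-major (fsdp, tp) mesh."""
    if not 0 <= rank < fsdp_size * tp_size:
        raise ValueError(f"rank {rank} is outside a {fsdp_size}x{tp_size} mesh")
    return divmod(rank, tp_size)


def build_groups(world_size: int, tp_size: int) -> tuple[list[list[int]], list[list[int]]]:
    """Return (tp_groups, fsdp_groups) as lists of global ranks."""
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    fsdp_size = world_size // tp_size
    # Ranks that share an fsdp index split each weight between them (TP).
    tp_groups = [list(range(i * tp_size, (i + 1) * tp_size)) for i in range(fsdp_size)]
    # Ranks that share a tp index shard parameters across each other (FSDP).
    fsdp_groups = [list(range(j, world_size, tp_size)) for j in range(tp_size)]
    return tp_groups, fsdp_groups


if __name__ == "__main__":
    tp_groups, fsdp_groups = build_groups(world_size=8, tp_size=2)
    print("TP groups:  ", tp_groups)    # [[0, 1], [2, 3], [4, 5], [6, 7]]
    print("FSDP groups:", fsdp_groups)  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Under this assumed layout, the 4-GPU FSDP-only comparison run corresponds to a single FSDP group of size 4 with no TP dimension.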

Will refactor further to make composition with FSDP2 and other parallelisms easier (namely, a refactor of #3604 to get a better API).

Will continue to rely on this transformers PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@S1ro1
Contributor Author

S1ro1 commented Apr 11, 2025

Managed to get a minimal repro of composable TP + FSDP2 working, though it requires nightly torch 🚀

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@kmehant
Contributor

kmehant commented May 12, 2025

pulse ping

@S1ro1
Contributor Author

S1ro1 commented May 12, 2025

@kmehant we have decided internally to keep as much logic as possible in transformers, so this is postponed until that is ~resolved. You can follow that here.

@kmehant
Contributor

kmehant commented May 12, 2025

I see, @S1ro1!

@github-actions
Contributor

github-actions bot commented Jun 6, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this Jun 14, 2025
SunMarc reopened this Jun 16, 2025
SunMarc added the wip (Work in progress) label Jun 16, 2025
@S1ro1
Contributor Author

S1ro1 commented Jun 20, 2025

This might get a bit messy: I started playing around and had success. At 2a13375 we have a working FSDP2 + TP example; I'm going to try to clean this up a bit so it doesn't include CP-related changes. This will also include the design for FSDP2 + {any}Parallelism, which will then be used in the CP PR. cc @SunMarc

S1ro1 changed the title from "WIP: Compose TP + DDP/FSDP2" to "WIP: Compose TP + FSDP2" Jun 20, 2025
@S1ro1
Contributor Author

S1ro1 commented Jul 22, 2025

Also superseded by #3682

S1ro1 closed this Jul 22, 2025