
Conversation

@kmehant
Contributor

@kmehant kmehant commented Feb 13, 2025

What does this PR do?

  1. This PR proposes adding FSDP2 as a separate distributed type in Accelerate, living alongside the FSDPv1 implementation.
  2. It also proposes a prepare_nd_device_mesh utility function that extends device-mesh creation to any combination of parallelisms; currently it supports any combination of TP and FSDP/HSDP.
  3. This PR is intended to supersede [RFC] Support FSDP2 #3231.
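As a rough illustration of what an n-d device-mesh helper along the lines of prepare_nd_device_mesh might compute (the helper name comes from this PR; the code below is a hypothetical pure-Python sketch of the rank layout, not Accelerate's or PyTorch's implementation):

```python
# Hypothetical sketch: arrange global ranks into an n-d mesh so that each
# parallelism dimension (e.g. FSDP x TP) gets its own axis. Conceptually
# this mirrors torch.distributed.device_mesh.init_device_mesh; the function
# below is illustrative only and does no actual process-group setup.
from math import prod

def build_nd_mesh(world_size, dim_names, dim_sizes):
    assert prod(dim_sizes) == world_size, "mesh must cover all ranks"
    ranks = list(range(world_size))
    # Reshape the flat rank list into nested lists, one level per dimension
    # (innermost dimension varies fastest).
    for size in reversed(dim_sizes[1:]):
        ranks = [ranks[i:i + size] for i in range(0, len(ranks), size)]
    return dict(zip(dim_names, dim_sizes)), ranks

dims, mesh = build_nd_mesh(8, ("fsdp", "tp"), (2, 4))
# Ranks 0-3 form the TP group of one FSDP shard group, ranks 4-7 the other.
print(mesh)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In real code, the same layout would come from torch.distributed.device_mesh.init_device_mesh with named mesh dimensions, which is what makes composing TP with FSDP/HSDP tractable.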

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@SunMarc @muellerzr
@kwen2501 from PyTorch

Signed-off-by: Mehant Kammakomati <[email protected]>
@kmehant kmehant mentioned this pull request Feb 13, 2025
@kmehant kmehant mentioned this pull request Feb 27, 2025
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@kmehant
Contributor Author

kmehant commented Mar 16, 2025

pulse

@SunMarc
Member

SunMarc commented Mar 24, 2025

cc @S1ro1

@S1ro1
Contributor

S1ro1 commented Mar 24, 2025

FSDP2 is close to done in #3394. After that, I'll look at supporting HSDP and including TP. This PR probably also runs into the increased memory usage issue with FSDP2 that comes from creating the optimizer on the full (non-sharded) model, discussed here

@kmehant let me know whether the pattern below works as expected in your PR; I suspect it shouldn't.

model = ...                               # full (unsharded) model
optimizer = ...(model.parameters(), ...)  # optimizer built on full parameters

model, optimizer = accelerator.prepare(model, optimizer)

This should result in higher memory usage, as the optimizer holds references to the original (full) model parameters.
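The effect can be sketched in plain Python (no torch; the Param, Optimizer, and shard names below are hypothetical stand-ins, not Accelerate or PyTorch APIs): an optimizer keeps references to whatever parameter objects it was constructed with, so state sized for the full parameters stays alive even after sharding replaces them.

```python
# Minimal sketch of why building the optimizer before sharding keeps
# full-size parameters alive: the optimizer stores references to the
# original parameter objects, not to the shards that replace them.

class Param:
    def __init__(self, numel):
        self.numel = numel

class Optimizer:
    def __init__(self, params):
        self.params = list(params)  # holds references to what it was given

def shard(params, world_size):
    # FSDP2-style sharding: each rank keeps a 1/world_size slice per param.
    return [Param(p.numel // world_size) for p in params]

full = [Param(1024) for _ in range(4)]
opt = Optimizer(full)                  # optimizer built on FULL params
sharded = shard(full, world_size=8)    # model params are now shard-sized

# The optimizer still references the full parameters, so per-rank optimizer
# state (e.g. Adam moments) would be allocated at full size.
print(sum(p.numel for p in opt.params))  # 4096 (full)
print(sum(p.numel for p in sharded))     # 512 (sharded)
```

With FSDP2, the usual way to avoid this is to prepare/shard the model first and only then construct the optimizer, so its state is allocated on shard-sized parameters.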

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Apr 27, 2025
@kmehant
Contributor Author

kmehant commented Apr 28, 2025

Being tracked at #3498 :)

