[FSDP2, Do not merge] Refactor #3585

Draft: wants to merge 9 commits into main

S1ro1 (Member) commented May 22, 2025

A fairly big PR. It started as enabling fp8 with torchao, but FSDP2 turned out to be very picky about the order of operations, so I went for a bigger refactor plus a few other changes. Going over them from biggest to smallest:

  1. Move the whole FSDP2 preparation into accelerator._prepare_fsdp2, as is already done for other backends such as DeepSpeed. This gives us more freedom in the future when composing FSDP2 with other features, and it has already paid off by enabling torchao fp8 support.

  2. FSDP2-specific compile: we now compile after the model converters (AC/FP8) but before FSDP2. This fixes compile issues and is how torchtitan does it.

  3. Make the optimiser parameter switch FSDP2-specific. It can now live in the method mentioned above, which makes FSDP2-specific handling simpler. We now canonicalise names for the old and new params (not 100% sure how far this reaches, but it proved to be OK with compile + AC + fp8).

  4. Closely related to 2, the order of operations is now slightly different: activation checkpointing is applied before compile, and compile before fully_shard.

  5. By accident, this PR also fixes FP8 with torchao, so those changes are included here (they should be minor).
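To make the name canonicalisation in point 3 more concrete, here is a minimal pure-Python sketch. It is not the code in this PR: canonicalize, swap_optimizer_params and FakeParam are hypothetical names made up for illustration, and the two wrapper segments (_orig_mod for compile, _checkpoint_wrapped_module for activation checkpointing) are assumed to be the ones that show up in parameter names after wrapping. The idea is only that old and new parameters are matched by a wrapper-free name before the optimiser's param groups are repointed.

```python
# Segments that wrappers insert into parameter names.
# ASSUMPTION: these two are used for illustration; the real set may differ.
WRAPPER_SEGMENTS = {"_orig_mod", "_checkpoint_wrapped_module"}


def canonicalize(name):
    """Drop wrapper-inserted segments so names are comparable across wrapping steps."""
    return ".".join(part for part in name.split(".") if part not in WRAPPER_SEGMENTS)


def swap_optimizer_params(param_groups, old_named_params, new_named_params):
    """Repoint each param group at the new parameter objects, matched by canonical name.

    Hypothetical helper: param_groups mimics the list-of-dicts layout of an
    optimizer's param_groups, but no real optimizer is involved here.
    """
    new_by_name = {canonicalize(n): p for n, p in new_named_params.items()}
    name_by_old_id = {id(p): canonicalize(n) for n, p in old_named_params.items()}
    for group in param_groups:
        group["params"] = [new_by_name[name_by_old_id[id(p)]] for p in group["params"]]
    return param_groups


class FakeParam:
    """Stand-in for a tensor parameter; only object identity matters in this sketch."""

    def __init__(self, label):
        self.label = label


# Names before wrapping (what the optimizer was built from).
old = {"layers.0.weight": FakeParam("old-w"), "layers.0.bias": FakeParam("old-b")}
# Names after AC + compile wrapping (and, in the real flow, sharding).
new = {
    "_orig_mod.layers.0._checkpoint_wrapped_module.weight": FakeParam("new-w"),
    "_orig_mod.layers.0._checkpoint_wrapped_module.bias": FakeParam("new-b"),
}

groups = [{"params": list(old.values()), "lr": 1e-3}]
swap_optimizer_params(groups, old, new)
print([p.label for p in groups[0]["params"]])  # ['new-w', 'new-b']
```

Matching by id of the old parameter and then by canonical name keeps the group layout and hyperparameters (lr above) untouched; only the parameter objects are replaced. A name that canonicalises to something missing on the new side raises a KeyError, which is probably the behaviour you want while the reach of the canonicalisation is still uncertain.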

I have been testing this extensively for about a week, so I'm quite confident, though you never know. If @winglian has any time to check that this doesn't break anything downstream, I would very much appreciate it.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@S1ro1 S1ro1 changed the title [FSDP2, Do not merge] FP8 adjustments [FSDP2, Do not merge] Refactor May 29, 2025