fsdp2 example unsharded_param.grad zero #1377

@Nju-Ben

Description

Context

  • PyTorch version: 2.6.0
  • Operating System and version: Linux

Your Environment

  • Installed using source? [yes/no]: yes
  • Are you planning to deploy it using a Docker container? [yes/no]: no
  • Is it a CPU or GPU environment?: GPU
  • Which example are you using: fsdp2/examples/distributed/FSDP2/train.py
  • Link to code or data to repro [if any]: none

Expected Behavior

(1) Normal, decreasing loss values (e.g. 1.95, 1.86, 1.73, ...).
(2) unsharded_param.grad of each module is nonzero after backward.

Current Behavior

(1) Abnormal loss values: -13857836160.0, -15615669120.0, -17379222400.0.
(2) unsharded_param.grad is zero in every layer's module when I use a logger to debug.
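
For what it's worth, a minimal sketch of how the gradients can be checked without relying on the internal unsharded_param (assuming the model was wrapped with torch.distributed.fsdp.fully_shard as in train.py, so its parameters are DTensors; model here stands for the sharded module):

```python
from torch.distributed.tensor import DTensor

# After loss.backward(), inspect the sharded gradients directly.
# With fully_shard (FSDP2), model.parameters() are DTensors and .grad
# holds the reduce-scattered gradient shard; the internal unsharded_param
# is typically resharded after backward, so its .grad may not reflect
# the real gradient state.
for name, param in model.named_parameters():
    grad = param.grad
    if grad is None:
        print(f"{name}: grad is None")
        continue
    # Take the local shard of the DTensor gradient before computing a norm.
    local = grad.to_local() if isinstance(grad, DTensor) else grad
    print(f"{name}: local grad norm = {local.norm().item():.4e}")
```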

Possible Solution

Steps to Reproduce

1. Run the example fsdp2/examples/distributed/FSDP2/train.py (a typical launch command is shown after this list).
2.
...
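
For reference, the FSDP2 example is normally launched with torchrun; the GPU count below is an assumption and should match your environment:

```bash
# Assumed single-node launch; adjust --nproc_per_node to your number of GPUs.
torchrun --nproc_per_node 2 train.py
```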

Failure Logs [if any]
