Open
Description
Thank you for sharing your expertise. Could you please explain why, in our DDPG implementation, the cumulative reward initially declines and then rises steadily during training, while the State of Charge (SOC) converges to negative values? This issue does not occur in the DQN implementations.