About DDPG

Thank you for sharing your expertise. Could you please explain why, in our DDPG implementation, the cumulative reward initially declines and then increases steadily during training, while the State of Charge (SOC) converges to negative values? This issue does not occur in DQN implementations.