
Bug: noisy-net layer #97

@mysl

Hi @Kismuz,
I was reading the paper "Noisy Networks for Exploration" and have a question about its usage in btgym. The paper says: "As A3C is an on-policy algorithm the gradients are unbiased when noise of the network is consistent for the whole roll-out. Consistency among action value functions is ensured by letting the noise be the same throughout each rollout".
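The consistency requirement quoted from the paper can be illustrated with a minimal NumPy sketch, assuming a dense layer with factorized Gaussian noise as described there. `NoisyLinear`, `resample_noise` and `collect_rollout` are hypothetical names for illustration, not btgym code. Noise is drawn once at the start of a rollout and then held fixed, so every step of that rollout sees the same perturbed weights.

```python
import numpy as np


class NoisyLinear:
    """Dense layer with factorized Gaussian noise (hypothetical sketch).

    Noise is only redrawn when resample_noise() is called explicitly,
    never as a side effect of a forward pass.
    """

    def __init__(self, in_dim, out_dim, sigma0=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_dim)
        # Learnable means, initialised uniformly in [-1/sqrt(p), 1/sqrt(p)].
        self.mu_w = self.rng.uniform(-bound, bound, size=(out_dim, in_dim))
        self.mu_b = self.rng.uniform(-bound, bound, size=out_dim)
        # Learnable noise scales, initialised to sigma0 / sqrt(p).
        self.sigma_w = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.sigma_b = np.full(out_dim, sigma0 / np.sqrt(in_dim))
        self.resample_noise()

    @staticmethod
    def _f(x):
        # Scaling function f(x) = sign(x) * sqrt(|x|) for factorized noise.
        return np.sign(x) * np.sqrt(np.abs(x))

    def resample_noise(self):
        # One Gaussian per input unit and per output unit; the weight noise
        # is their outer product, so only p + q samples are drawn.
        eps_in = self._f(self.rng.standard_normal(self.mu_w.shape[1]))
        eps_out = self._f(self.rng.standard_normal(self.mu_w.shape[0]))
        self.eps_w = np.outer(eps_out, eps_in)
        self.eps_b = eps_out

    def __call__(self, x):
        # Perturbed parameters: mu + sigma * eps (elementwise).
        w = self.mu_w + self.sigma_w * self.eps_w
        b = self.mu_b + self.sigma_b * self.eps_b
        return x @ w.T + b


def collect_rollout(layer, observations):
    """Draw noise once, then reuse the same sample for every step."""
    layer.resample_noise()
    return [layer(obs) for obs in observations]


if __name__ == "__main__":
    layer = NoisyLinear(in_dim=4, out_dim=2)
    obs = [np.ones(4) for _ in range(3)]
    # Identical inputs within one rollout give identical outputs,
    # because the noise sample does not change between steps.
    print(collect_rollout(layer, obs))
```

Making resampling an explicit call leaves the caller in control of when the noise changes, which is the property a split between acting and training code would need to coordinate.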

It looks to me like the current implementation in btgym cannot ensure that "the noise is the same throughout each rollout", because the training steps and the environment steps run in different threads and can be interleaved. Or am I missing something? Thanks!
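One way to make the noise sample independent of thread scheduling, offered as a hypothetical sketch and not as a description of what btgym does, is to make the noise a pure function of a seed and ship that seed with each rollout. The learner thread then recomputes the forward pass with the recorded seed instead of whatever noise happens to be current. `SeededNoisyLinear`, `runner` and `learner` are illustrative names, and independent (non-factorized) Gaussian noise is used for brevity.

```python
import queue
import threading

import numpy as np


class SeededNoisyLinear:
    """Hypothetical noisy dense layer whose noise depends only on a seed."""

    def __init__(self, in_dim, out_dim, sigma=0.1):
        init = np.random.default_rng(0)
        self.mu_w = init.normal(0.0, 1.0 / np.sqrt(in_dim), size=(out_dim, in_dim))
        self.sigma_w = np.full((out_dim, in_dim), sigma)

    def forward(self, x, noise_seed):
        # Same seed -> same noise sample, whichever thread calls this.
        eps_w = np.random.default_rng(noise_seed).standard_normal(self.mu_w.shape)
        return x @ (self.mu_w + self.sigma_w * eps_w).T


def runner(layer, rollouts, num_rollouts, steps, obs_dim):
    seeds = np.random.default_rng(123)
    for _ in range(num_rollouts):
        seed = int(seeds.integers(2**31))  # one noise sample per rollout
        obs = [np.full(obs_dim, float(t)) for t in range(steps)]
        outputs = [layer.forward(o, seed) for o in obs]
        # The seed travels with the data so the learner can reproduce the noise.
        rollouts.put({"seed": seed, "obs": obs, "outputs": outputs})
    rollouts.put(None)  # sentinel: no more rollouts


def learner(layer, rollouts, mismatches):
    while (r := rollouts.get()) is not None:
        # Replay with the recorded seed; a real learner would build its loss here.
        replay = [layer.forward(o, r["seed"]) for o in r["obs"]]
        same = all(np.allclose(a, b) for a, b in zip(replay, r["outputs"]))
        mismatches.append(not same)


if __name__ == "__main__":
    layer = SeededNoisyLinear(3, 2)
    q, mismatches = queue.Queue(), []
    t1 = threading.Thread(target=runner, args=(layer, q, 5, 4, 3))
    t2 = threading.Thread(target=learner, args=(layer, q, mismatches))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(mismatches)  # one flag per rollout; all False if noise was replayed
```

This sketch only pins the noise sample. In real asynchronous training the learner may still see updated `mu_w` and `sigma_w` values by the time it processes a rollout.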
