Bug: noisy-net layer

hi @Kismuz 
I was reading the paper "Noisy Networks for Exploration" and have a question about how it is used in btgym. The paper says: "As A3C is an on-policy algorithm the gradients are unbiased when noise of the network is consistent for the whole roll-out. Consistency among action value functions is ensured by letting the noise be the same throughout each rollout".

It looks to me that the current implementation in btgym can't ensure that "the noise is the same throughout each rollout", because the training steps and environment steps are executed in different threads and could be interleaved. Or am I missing anything? Thanks!
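
To make the concern concrete, here is a minimal sketch of what "same noise throughout each rollout" could look like. It is not btgym's code: `NoisyLinear`, `sample_noise`, and `collect_rollout` are hypothetical names, and the layer is a toy scalar-output version. The idea is that the noise is sampled once per rollout and passed in explicitly, so a concurrent training step cannot swap it out between environment steps.

```python
import random


class NoisyLinear:
    """Toy noisy linear layer (hypothetical): y = sum_i (mu_i + sigma_i * eps_i) * x_i."""

    def __init__(self, n_inputs, sigma=0.5, seed=0):
        self._rng = random.Random(seed)
        self.mu = [self._rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        self.sigma = [sigma] * n_inputs

    def sample_noise(self):
        # Draw a fresh noise vector; the caller decides how long to keep it.
        return [self._rng.gauss(0.0, 1.0) for _ in self.mu]

    def forward(self, x, eps):
        # Noise is an explicit argument rather than shared layer state,
        # so another thread cannot change it in the middle of a rollout.
        return sum(
            (m + s * e) * xi
            for m, s, e, xi in zip(self.mu, self.sigma, eps, x)
        )


def collect_rollout(layer, observations):
    """Evaluate the layer on every observation using a single noise sample."""
    eps = layer.sample_noise()  # sampled once per rollout
    outputs = [layer.forward(obs, eps) for obs in observations]
    # Return eps too, so the training step can reuse exactly the same noise.
    return eps, outputs


layer = NoisyLinear(3)
obs = [[1.0, 2.0, 3.0]] * 4
eps, outs = collect_rollout(layer, obs)
```

In this sketch the training step would compute its loss with the `eps` stored alongside the rollout, instead of whatever noise the shared layer happens to hold at that moment.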

Bug: noisy-net layer #97

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Bug: noisy-net layer #97

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions