This repository was archived by the owner on Jan 15, 2026. It is now read-only.

Behavioural Cloning

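  • Takes the dataset of (state, action) pairs generated by the expert policy and fits the learner policy to it with ordinary supervised learning
  • Problem: the learner policy may reach states that are not in the expert's dataset. Its actions there are unreliable, each mistake moves it further from the states the expert visited, and the errors compound over the episode
  • Solution: DAgger (Dataset Aggregation)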
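Behavioural cloning is just supervised learning on the expert's (state, action) pairs. The minimal sketch below assumes discrete states and actions and uses a hypothetical tabular learner (`behavioural_cloning` is an illustrative name, not a library function) that picks the majority expert action per state. It also makes the problem above concrete: a state the expert never visited has no learned action at all.

```python
from collections import Counter, defaultdict


def behavioural_cloning(expert_dataset):
    """Fit a tabular learner to expert (state, action) pairs by majority vote."""
    votes = defaultdict(Counter)
    for state, action in expert_dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}

    def learner(state, default=None):
        # States the expert never visited have no learned action.
        return table.get(state, default)

    return learner


# Hypothetical expert data: states on a 1-D line, actions in {-1, 0, +1}.
expert_dataset = [(0, 1), (1, 1), (2, 1), (3, 0), (3, 0), (3, 1)]
learner = behavioural_cloning(expert_dataset)
print(learner(3))  # 0 (majority expert action in state 3)
print(learner(7))  # None (state 7 never appears in the expert data)
```

A real learner (e.g. a neural network) would output *some* action in state 7, but nothing in the expert data constrains what that action is.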

DAgger

  • Initialize the dataset D with demonstrations from the expert policy

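  • Collect trajectories with the mixed "beta policy" π_i = β_i·π* + (1 − β_i)·π̂_i, where π* is the expert policy and π̂_i is the current learner policy

    We start from the expert dataset alone, so the first learner policy is trained only on expert data. β then decays toward zero over the iterations, so we gradually shift to sampling states with the learner policy itself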

  • Record the states visited by the beta policy, but label them with the actions the expert takes in those states; training on the learner's own states, labelled by the expert, is what counters the compounding error problem

  • Aggregate these pairs into the dataset D, retrain the learner policy on all of D, and repeat
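The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the 1-D line task with random "slip" noise, the tabular majority-vote learner with a deliberately wrong default action (+1) for unseen states, the schedule β_i = decay^i, and all names (`step`, `expert`, `fit`, `rollout`, `dagger`) are hypothetical. The beta policy is implemented as a per-step coin flip between the expert and the current learner.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy task: states 0..10 on a line; the goal is state 5.
N_STATES, GOAL, ACTIONS = 11, 5, (-1, 0, 1)


def step(state, action, rng, slip=0.2):
    """Apply `action`; with probability `slip` a random action is applied instead."""
    if rng.random() < slip:
        action = rng.choice(ACTIONS)
    return min(max(state + action, 0), N_STATES - 1)


def expert(state):
    """Hypothetical expert policy pi*: always step toward GOAL."""
    return (state < GOAL) - (state > GOAL)


def fit(dataset, default=1):
    """Train the learner: majority expert action per state; unseen states get `default`."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, default)


def rollout(policy, rng, start=0, horizon=20):
    """Run `policy` for `horizon` steps and return the visited states."""
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s), rng)
    return states


def dagger(n_iters=10, n_rollouts=5, decay=0.5, seed=0):
    rng = random.Random(seed)
    # Init: D contains expert demonstrations only.
    dataset = [(s, expert(s)) for _ in range(n_rollouts) for s in rollout(expert, rng)]
    learner = fit(dataset)
    beta = 1.0
    for _ in range(n_iters):
        beta *= decay  # beta_i = decay**i tends to zero

        def beta_policy(s):
            # Per-step mixture: expert with probability beta, else the current learner.
            return expert(s) if rng.random() < beta else learner(s)

        for _ in range(n_rollouts):
            # States come from the beta policy, action labels from the expert.
            dataset += [(s, expert(s)) for s in rollout(beta_policy, rng)]
        learner = fit(dataset)  # retrain on the aggregated dataset
    return learner, dataset
```

For example, `learner, dataset = dagger()` returns the final learner and the aggregated dataset D. Because every label in D comes from `expert`, the learner agrees with the expert on every state it has been trained on; the slip noise and the wrong default action are what can push rollouts into states the initial expert data never covered, and DAgger labels those states instead of leaving them unseen.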