This repository was archived by the owner on Jan 15, 2026. It is now read-only.

Delayed Q-learning

  • Delayed Q-learning is a model-free, off-policy reinforcement learning algorithm with probably approximately correct (PAC) guarantees.
  • Unlike standard Q-learning, it doesn’t update after every single step. Instead, it collects a batch of samples for a state-action pair and changes the estimate only when the batch average differs from the current value by more than a fixed threshold.
  • Averaging over a batch filters out noisy single-step updates, which makes learning more statistically robust and underlies the algorithm’s sample-efficiency guarantee.
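
The delayed update rule can be sketched roughly as follows. This is a minimal tabular sketch based on the commonly cited formulation by Strehl et al., not code from this repository. The class name `DelayedQLearner` and the parameter names `m` (batch size) and `eps1` (threshold) are assumptions made for illustration, and rewards are assumed to lie in [0, 1] so that 1 / (1 - gamma) is an optimistic initial value.

```python
class DelayedQLearner:
    """Tabular Delayed Q-learning sketch (hypothetical names, rewards in [0, 1])."""

    def __init__(self, n_states, n_actions, gamma, m, eps1):
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        q_max = 1.0 / (1.0 - gamma)  # optimistic initial value
        self.Q = [[q_max] * n_actions for _ in range(n_states)]
        self.U = [[0.0] * n_actions for _ in range(n_states)]        # accumulated targets
        self.count = [[0] * n_actions for _ in range(n_states)]      # samples in current batch
        self.last_attempt = [[0] * n_actions for _ in range(n_states)]
        self.learn = [[True] * n_actions for _ in range(n_states)]   # LEARN flags
        self.t = 0       # global step counter
        self.t_star = 0  # time of the most recent successful update

    def act(self, s):
        # Greedy action with respect to the current (optimistic) estimates.
        row = self.Q[s]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, s, a, r, s_next):
        self.t += 1
        if self.learn[s][a]:
            # Accumulate the one-step target instead of updating immediately.
            self.U[s][a] += r + self.gamma * max(self.Q[s_next])
            self.count[s][a] += 1
            if self.count[s][a] == self.m:
                avg = self.U[s][a] / self.m
                if self.Q[s][a] - avg >= 2 * self.eps1:
                    # Enough evidence: lower the estimate to the batch average plus a margin.
                    self.Q[s][a] = avg + self.eps1
                    self.t_star = self.t
                elif self.last_attempt[s][a] >= self.t_star:
                    # Failed attempt with no Q change since the previous attempt: stop learning here.
                    self.learn[s][a] = False
                self.last_attempt[s][a] = self.t
                self.U[s][a] = 0.0
                self.count[s][a] = 0
        elif self.last_attempt[s][a] < self.t_star:
            # Some Q-value changed since the last attempt, so learning may resume.
            self.learn[s][a] = True
```

With `m = 2`, for example, the first call to `update` for a state-action pair leaves `Q` untouched, and only the second call can change it.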

Opposition Learning

  • Opposition learning evaluates a guess together with its “opposite” in the search space, which can accelerate convergence toward the optimal solution.
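
As a rough illustration, one common definition takes the opposite of a point x in a box [lo, hi] to be lo + hi - x in each coordinate. The helper names `opposite` and `best_of_pair` below are hypothetical, and the sketch assumes a minimization problem with known bounds.

```python
def opposite(x, lo, hi):
    """Return the opposite point of x within the box [lo, hi], coordinate by coordinate."""
    return [l + h - xi for xi, l, h in zip(x, lo, hi)]


def best_of_pair(f, x, lo, hi):
    """Evaluate a guess and its opposite, and keep whichever has the lower cost f."""
    x_opp = opposite(x, lo, hi)
    return min((x, x_opp), key=f)
```

For a cost such as `f = lambda v: (v[0] - 8) ** 2` on the interval [0, 10], the guess `[1.0]` has opposite `[9.0]`, and `best_of_pair` keeps the opposite because it lies closer to the minimum.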

N-step Algorithm


Eligibility Traces

Watkins's Q(lambda)

Accumulating vs Replacing Eligibility Traces